VO data types
This review of the data types defined in the VO specifications was initially done for my own benefit,
to help me understand how the different methods for describing data types in the VO fitted together.
The review looks specifically at the relationships between types, attributes and columns,
particularly those with similar names in different standards and how they relate to each other.
This review is based on the following specifications:
Still to do are:
VODataService
The VODataService specification defines an XML schema for describing data collections and the services that access them.
The data types defined in VODataService are intended to be used to describe the data in VO data sets and the services and protocols used to access them.
DataType
Section 3.5 (Data Parameters)
of the VODataService specification
defines the DataType XML element.
DataType includes the following attributes:
DataType arraysize
Section 3.5 (Data Parameters)
of the VODataService specification
defines the arraysize attribute.
The specification text describes the arraysize attribute as follows:
- "The arraysize attribute indicates the parameter is an array of values of the named type."
- "Its value describes the shape of the array, and the delim attribute may be used to indicate the delimiter that should appear between elements of an array value."
- "The attribute's presence indicates that parameter holds an array values; the attribute's value indicates the length of the array along each dimension of the multi-dimensional array."
ArrayShape
Section 3.5 (Data Parameters)
of the VODataService specification
defines the ArrayShape restriction,
which sets the syntax for the arraysize attribute.
The specification text describes the ArrayShape as follows:
- "the VOTable arraysize format (vs:ArrayShape): LxMxN..., where each x-delimited positive integer is a length along a dimension of a multi-dimensional array. A single integer indicates a one dimensional array. Instead of an integer, the last length can be set to "*" which indicates a variable length."
Note - The reference to "VOTable arraysize format (vs:ArrayShape)" should probably be "vs:ArrayShape format".
The XML schema defines the ArrayShape string syntax as follows:
<!--
- this definition is taken from the VOTable arrayDEF type
-->
<xs:simpleType name="ArrayShape">
<xs:annotation>
<xs:documentation>
An expression of a the shape of a multi-dimensional array
of the form LxNxM... where each value between gives the
integer length of the array along a dimension. An
asterisk (*) as the last dimension of the shape indicates
that the length of the last axis is variable or
undetermined.
</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:token">
<xs:pattern value="([0-9]+x)*[0-9]*[*]?"/>
</xs:restriction>
</xs:simpleType>
As the comment in the XML schema suggests, the ArrayShape string syntax
is similar to, but not explicitly related to, the arrayDEF string format
defined in the VOTable specification.
The ArrayShape string syntax is used in several places in the
VODataService XML schema to define the content of
arraysize attributes on elements derived from
DataType, including VOTableType and
TAPType.
The ArrayShape string syntax is not used directly in any of the other VO specifications.
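To see what the ArrayShape pattern actually accepts, the following minimal sketch applies the regular expression from the schema to a few candidate values. The function name and the candidate list are illustrative assumptions; the pattern itself is copied from the schema fragment above. XML Schema patterns are implicitly anchored, which is approximated here with re.fullmatch.

```python
import re

# Pattern copied from the vs:ArrayShape simpleType in the VODataService schema.
# XML Schema patterns must match the whole value, hence fullmatch() below.
ARRAY_SHAPE = re.compile(r"([0-9]+x)*[0-9]*[*]?")


def is_array_shape(value: str) -> bool:
    """Return True if the value satisfies the ArrayShape pattern (hypothetical helper)."""
    return ARRAY_SHAPE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["3", "2x3", "2x*", "*", "", "2x", "2,*", "*x2"]:
        print(repr(candidate), is_array_shape(candidate))
```

Note that the pattern is looser than the prose description: every part of it is optional, so the empty string and a value with a trailing "x" such as "2x" both match, and "0" is accepted even though the text asks for positive integers.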
DataType delim
Section 3.5 (Data Parameters)
of the VODataService specification
defines the delim attribute.
The specification text describes the delim attribute as follows:
- "the string that is used to delimit element of an array value when arraysize is not "1""
The XML schema defines the default value of the delim attribute as a single white space.
<xs:attribute name="delim" type="xs:string" default=" ">
The specification text and comments in the XML schema encourage applications to allow optional
spaces before and after the delimiter (e.g. "1, 5" when delim=",").
However, this is not explicitly encoded in the XML schema itself.
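Since the tolerance for optional spaces is only described in prose, a client has to implement it itself. The sketch below shows one possible way to do that; the function name split_array_value is a hypothetical helper, not part of any VO library, and the treatment of a white space delimiter as "any run of white space" is an assumption.

```python
def split_array_value(value: str, delim: str = " ") -> list[str]:
    """Split a serialized array value on delim, tolerating spaces around it.

    Hypothetical helper: the default mirrors the schema default of a single
    white space for the delim attribute.
    """
    if delim.strip() == "":
        # White space delimiter: split on runs of white space (an assumption).
        return value.split()
    # Any other delimiter: split on it, then drop the optional surrounding spaces.
    return [item.strip() for item in value.split(delim)]


if __name__ == "__main__":
    print(split_array_value("1, 5", delim=","))
    print(split_array_value("1 2 4 8 16"))
```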
The delim attribute is not referred to by any of the other VO specifications.
All the other VO specifications use white space as the delimiter,
either explicitly defined in the specification text, or by implication
in the examples.
The definitions for arrays and complex numbers in the VOTable specification explicitly declare white space as the delimiter.
- The VOTable TABLEDATA serialization for arrays of numeric values explicitly uses white space as the delimiter.
- The VOTable TABLEDATA serialization for floatComplex and doubleComplex explicitly uses white space as the delimiter.
Although none of the data types defined in the DALI specification explicitly declare a delimiter,
all of the examples in the text use white space.
- The example for Interval uses white space as the delimiter.
- The example for Point uses white space as the delimiter.
- The example for Circle uses white space as the delimiter.
- The example for Polygon uses white space as the delimiter.
DataType extendedType
Section 3.5 (Data Parameters)
of the VODataService specification
defines the extendedType attribute.
The specification text describes the extendedType attribute as follows:
- "The data value represented by this type can be interpreted as of a custom type identified by the value of this attribute. "
- "The name implies a particular expected format for the data value that can be parsed into a value in memory."
- " If an application does not recognize this extendedType, it should attempt to handle value assuming the type given by the element's value. "string" (or its equivalent) is a recommended default type."
- " This element may make use of the extendedSchema attribute and/or any arbitrary (qualified) attribute to refine the identification of the type. "
Taken as a whole, the body of standards suggests that the extendedType attribute
is functionally equivalent to the xtype attribute defined in the VOTable specification.
However, as far as we can tell, this is not explicitly stated anywhere, and there is no mapping defined between the
(extendedType | extendedSchema ) attribute pair defined in VODataService
and the (xtype with prefix) attribute defined in the VOTable specification.
The VODataService specification does not provide an example of how the extendedType attribute could be used.
The extendedType attribute is not referred to in any of the other VO specifications.
DataType extendedSchema
Section 3.5 (Data Parameters)
of the VODataService specification
defines the extendedSchema attribute.
The specification text describes the extendedSchema attribute as follows:
- "An identifier for the schema that the value given by the extended attribute is drawn from."
The specification does not provide an example of how the extendedSchema attribute would be used.
The extendedSchema attribute is not used in the VODataService specification.
The extendedSchema attribute is not used in any of the other VO specifications.
TableDataType
Section 3.5.3 (Table Column Data Types)
of the VODataService specification
defines the TableDataType XML element.
TableDataType extends the DataType element.
The XML schema describes TableDataType as:
- "an abstract parent for a class of data types that can be used to specify the data type of a table column."
VOTableType
Section 3.5.3 (Table Column Data Types)
of the VODataService specification
defines the VOTableType XML element.
The VOTableType XML element extends the DataType element.
The VOTableType XML element inherits the following attributes from DataType:
VOTableType defines the following set of allowed values:
- boolean
- bit
- unsignedByte
- short
- int
- long
- char
- unicodeChar
- float
- double
- floatComplex
- doubleComplex
The specification text describes VOTableType as follows :
- "data types that correspond to the parameter and column types defined in the VOTable schema"
The XML schema comments describe VOTableType as follows :
- "a data type supported explicitly by the VOTable format".
The definition of VOTableType does not provide any further details about the sizes,
ranges or content of the data types.
It is left to the reader to refer to the VOTable specification for details about the data types.
The definition of VOTableType states that string values of arbitrary length are
represented by a data type of char with arraysize="*" .
In order to support strings with Unicode characters, it may be clearer to state explicitly that ASCII
strings should be represented by a data type of char with arraysize="*" and Unicode strings
by a data type of unicodeChar with arraysize="*" .
Note - the bibliography reference to the VOTable specification explicitly refers to
version 1.2 (20091130) of
the specification, which has since been superseded by
version 1.3 (20130920).
TAPDataType
The TAPDataType element is not explicitly described in the text of the VODataService specification.
The VODataService XML schema describes TAPDataType as follows:
- "an abstract parent for the specific data types supported by the Table Access Protocol"
The XML schema for the TAPDataType element defines the following attribute:
Note - the TAPDataType element name reflects the historical situation where the data types were originally defined in the TAP specification. The data type definitions have since been moved to the ADQL specification, but for backward compatibility the XML element name has not been changed.
TAPType
Section 3.5.3 (Table Column Data Types)
of the VODataService specification
defines the TAPType XML element.
The TAPType element inherits the following attributes from DataType:
The TAPType inherits the following attribute from TAPDataType:
- size: the length of a variable-length value
TAPType defines the following set of allowed values:
- BOOLEAN
- SMALLINT
- INTEGER
- BIGINT
- REAL
- DOUBLE
- TIMESTAMP
- CHAR
- VARCHAR
- BINARY
- VARBINARY
- POINT
- REGION
- CLOB
- BLOB
The specification text describes TAPType as follows :
- "data types that correspond column types defined in the Table Access Protocol (v1.0) [TAP]"
The explicit reference to version 1.0 of the TAP specification is out of date.
The TAP specification no longer contains the definition of these data types.
The TAPType element name reflects the historical situation where the data types were originally
defined in the TAP specification. The data type definitions have since been moved
to the ADQL specification, but for compatibility reasons, the XML element name has not been changed.
The definition of TAPType does not provide any further details about the sizes,
ranges or content of the data types.
It is left to the reader to refer to the TAP
(now ADQL) specification for details about the data types.
The text at the end of the section refers to a mapping between TAP_SCHEMA types
and VOTable types in the TAP specification.
- "Note that the TAP standard [TAP] defines an explicit mapping between TAP_SCHEMA types and VOTable types."
This mapping is no longer part of the TAP specification.
The definition of TAPType states that string values should be represented
by a data type of VARCHAR, but the text does not say whether this should use a
size or an arraysize attribute.
TAPType size
Section 3.5.3 (Table Column Data Types)
of the VODataService specification
defines the size attribute.
Technically, size is an attribute of the abstract TAPDataType parent element,
which is then inherited by the TAPType element.
The text describing the TAPType element describes the size attribute as follows:
- "The length of the variable-length data type."
- "In the context of TAP, this attribute is only meaning when the data type is CHAR or BINARY; see discussion below."
This restriction implies that CHAR and BINARY values are not arrays of values and have an inherent 'size' property, which is distinct from the 'arraysize' property.
In the discussion that follows, the VODataService specification cites the following two examples as equivalent:
<dataType xsi:type="vs:VOTableType" arraysize="*"> char </dataType>
<dataType xsi:type="vs:TAPType"> VARCHAR </dataType>
A third example describes a fixed length string, using the size rather than the arraysize attribute
<dataType xsi:type="vs:TAPType" size="8" > CHAR </dataType>
However, the VODataService specification does not explicitly explain the difference between
the following examples:
<dataType xsi:type="vs:TAPType" size="8" > CHAR </dataType>
<dataType xsi:type="vs:TAPType" arraysize="8" > CHAR </dataType>
The comments in the XML schema for TAPDataType describe the size attribute as follows:
- "This corresponds to the size Column attribute in the TAP_SCHEMA and can be used with data types that are defined with a length (CHAR, BINARY)."
This establishes a link between the TAPDataType size attribute and the
size column in TAP_SCHEMA .
In the current version of the TAP specification the corresponding size column is described as:
- "retained for backwards compatibility to TAP-1.0"
The original text in version 1.0 of the TAP specification describes the size column as follows :
- "The “size” gives the length of variable length datatypes, for example varchar(256);"
The TAP specification does not link the size column back to TAPDataType in the VODataService specification.
VODataService Table
... TBD
VODataService TableSet
... TBD
VOTable
The VOTable specification defines an XML based serialization format for exchanging tabular data in the VO.
VOTable data types
Section 2.1 (Primitives)
of the VOTable specification
defines a core set of primitive data types.
The following table describes the types, their semantic meaning, the corresponding FITS data type and the size in bytes:
| Datatype | Meaning | FITS | Bytes |
| boolean | Logical | L | 1 |
| bit | Bit | X | * |
| unsignedByte | Byte (0 to 255) | B | 1 |
| short | Short Integer | I | 2 |
| int | Integer | J | 4 |
| long | Long integer | K | 8 |
| char | ASCII Character | A | 1 |
| unicodeChar | Unicode Character | | 2 |
| float | Floating point | E | 4 |
| double | Double | D | 8 |
| floatComplex | Float Complex | C | 8 |
| doubleComplex | Double Complex | M | 16 |
VOTable serialization
Section 6 (Definitions of Primitive Datatypes)
of the VOTable specification describes the BINARY , BINARY2 and TABLEDATA serializations of each of the primitive data types.
VOTable boolean
Case insensitive long form:
- TRUE FALSE
- True False
- true false
Case insensitive short form:
Numeric, one or zero:
This results in a slightly different definition of how to serialize boolean values
compared to the DALI definition, which uses a definition from the
W3C XML schema specification.
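The difference between the two definitions can be illustrated with a pair of small parsers. This is a sketch under stated assumptions: the short form is taken to be a case insensitive T or F (the list is missing from this page), null markers are ignored, and both function names are hypothetical. The xs:boolean lexical space used for the DALI side is "true", "false", "1" and "0".

```python
# Assumed VOTable TABLEDATA forms: long form and short form are case
# insensitive, plus the numeric one or zero. Null handling is left out.
VOTABLE_TRUE = {"true", "t", "1"}
VOTABLE_FALSE = {"false", "f", "0"}

# Lexical space of xs:boolean (case sensitive).
XSD_BOOLEAN = {"true": True, "false": False, "1": True, "0": False}


def parse_votable_boolean(text: str) -> bool:
    """Parse a boolean using the (assumed) VOTable TABLEDATA rules."""
    key = text.strip().lower()
    if key in VOTABLE_TRUE:
        return True
    if key in VOTABLE_FALSE:
        return False
    raise ValueError(f"not a VOTable boolean: {text!r}")


def parse_xsd_boolean(text: str) -> bool:
    """Parse a boolean using the W3C XML Schema xs:boolean rules."""
    key = text.strip()
    if key in XSD_BOOLEAN:
        return XSD_BOOLEAN[key]
    raise ValueError(f"not an xs:boolean: {text!r}")


if __name__ == "__main__":
    for sample in ["true", "TRUE", "T", "1", "false", "F", "0"]:
        try:
            xsd = parse_xsd_boolean(sample)
        except ValueError:
            xsd = "rejected"
        print(sample, parse_votable_boolean(sample), xsd)
```

Under these assumptions, values such as "TRUE" or "T" are valid in a VOTable cell but would be rejected by a parser that follows the XML Schema definition strictly.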
VOTable bit
Array of bits, padded to fit into bytes.
VOTable unsignedByte
8 bit (unsigned) integers, 0 to 255.
VOTable short
16 bit signed integers, -32768 to 32767.
VOTable int
32 bit signed integers, -2147483648 to 2147483647.
VOTable long
64 bit signed integers, -9223372036854775808 to 9223372036854775807.
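The ranges quoted above follow directly from the bit widths, assuming two's complement representation for the signed types. A minimal sketch of the arithmetic (the helper names are illustrative):

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Inclusive range of a two's complement signed integer of the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_range(bits: int) -> tuple[int, int]:
    """Inclusive range of an unsigned integer of the given width."""
    return 0, 2 ** bits - 1


if __name__ == "__main__":
    print("unsignedByte", unsigned_range(8))
    print("short", signed_range(16))
    print("int", signed_range(32))
    print("long", signed_range(64))
```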
VOTable float
ANSI/IEEE-754 32-bit floating point numbers.
VOTable double
ANSI/IEEE-754 64-bit double precision floating point numbers.
VOTable char
ASCII (7-bit) characters.
VOTable unicodeChar
The description for the BINARY serialization of unicodeChar defines it as Unicode (UCS-2) fixed width 2-byte characters.
- "Each Unicode character is represented in the BINARY/BINARY2 serialization by two bytes, using the big-endian UCS-2 encoding (ISO-10646-UCS-2)"
The UCS-2 character set includes all of the characters in the Basic Multilingual Plane (BMP),
which contains characters for almost all modern languages.
The description for the TABLEDATA serialization includes an example showing how a unicodeChar that is outside the ASCII character
set can be represented in an XML document by using a numeric character reference (NCR).
- "The representation of a Unicode character in the TABLEDATA serialization follows the XML specifications, and e.g. the Cyrillic uppercase ``Ya'' can be written &#x042F; in UTF-8."
The reference to UTF-8 in this description may be misleading,
because a UTF-8 document can contain the multi-byte Cyrillic uppercase ``Ya'' character, Я, as-is, without
needing to use a numeric character reference.
The reason for using numeric character references is if the document character set is not able to represent the ``Ya'' character, Я.
As a result, declaring a UTF-8 encoding for a VOTable document containing TABLEDATA data may be problematic,
<?xml version="1.0" encoding="utf-8"?>
as this would mean the VOTable document would be able to contain complex multibyte characters
that are beyond the range of the UCS-2 fixed-width character set.
It may be better to specify the character encoding for VOTable documents as UCS-2 ,
<?xml version="1.0" encoding="ucs-2"?>
This would make the TABLEDATA serialization equivalent to the BINARY serialization,
and require numeric character references for all characters outside the UCS-2 two byte
fixed size.
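The distinction between characters that fit in two bytes and those that do not can be checked with a short sketch. Python has no dedicated UCS-2 codec, so UTF-16 big-endian is used here as a stand-in: it is byte-for-byte identical to UCS-2 for characters in the Basic Multilingual Plane and uses four bytes (a surrogate pair) for everything else. The helper names and the choice of U+20000 as the non-BMP sample are assumptions made for illustration.

```python
def fits_in_ucs2(char: str) -> bool:
    """True if a single character lies in the Basic Multilingual Plane."""
    code = ord(char)
    # Surrogate code points are reserved and are not characters themselves.
    return code <= 0xFFFF and not 0xD800 <= code <= 0xDFFF


def numeric_character_reference(char: str) -> str:
    """Hexadecimal XML numeric character reference for a single character."""
    return f"&#x{ord(char):04X};"


if __name__ == "__main__":
    ya = "\u042f"            # Cyrillic uppercase Ya
    outside = "\U00020000"   # a CJK ideograph outside the BMP
    for char in (ya, outside):
        print(
            numeric_character_reference(char),
            fits_in_ucs2(char),
            len(char.encode("utf-16-be")),  # bytes needed in UTF-16BE
            len(char.encode("utf-8")),      # bytes needed in UTF-8
        )
```

The Ya character needs two bytes in both encodings, while the second character needs four bytes in either, so it cannot be stored in a single fixed-width unicodeChar cell as described for the BINARY serialization.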
Note - A paragraph is needed to link this section and the section describing the different MIME types
and how they would affect the serialization of unicodeChar strings.
Note - since 2005 it is no longer possible to encode all of the mandatory components defined in the official
GB 18030-2005 character set of the People's Republic of China
in a fixed width 2 byte character set. Support for the GB 18030-2005 character set is officially
required for all software products sold in the PRC.
VOTable floatComplex
The description for the BINARY serialization of floatComplex defines it as a pair of 32-bit, single precision, floating point numbers.
- "a sequence of pairs of 32-bit single precision floating point numbers in big-endian order"
The description for the TABLEDATA serialization of floatComplex defines it as a pair of floating point numbers separated by white space.
- "two representations of a Single Precision Floating Point numbers separated by whitespace, representing the real and imaginary part respectively"
Note that this effectively fixes the delimiter for the TABLEDATA serialization to white space, regardless of the delim attribute
set in the VODataService description of the source data table.
VOTable doubleComplex
The description for the BINARY serialization of doubleComplex defines it as a pair of 64-bit, double precision, floating point numbers.
- "a sequence of pairs of 64-bit double precision floating point numbers in big-endian order"
The description for the TABLEDATA serialization of doubleComplex defines it as a pair of floating point numbers separated by white space.
- "two representations of a Double Precision Floating Point numbers separated by whitespace, representing the real and imaginary part respectively"
Note that this effectively fixes the delimiter for the TABLEDATA serialization to white space, regardless of the delim attribute
set in the VODataService description of the source data table.
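The two serializations of a complex value described above can be sketched as follows. The function names are hypothetical; the TABLEDATA parser splits on white space as the quoted text requires, and the BINARY parser reads a pair of big-endian 32-bit floats using the standard struct module.

```python
import struct


def parse_tabledata_complex(cell: str) -> complex:
    """Parse a TABLEDATA cell holding the real and imaginary parts."""
    real, imag = cell.split()  # white space is the only delimiter considered
    return complex(float(real), float(imag))


def parse_binary_float_complex(data: bytes) -> complex:
    """Parse one floatComplex value from 8 bytes of big-endian BINARY data."""
    real, imag = struct.unpack(">ff", data)
    return complex(real, imag)


if __name__ == "__main__":
    print(parse_tabledata_complex("1.5 -2.0"))
    print(parse_binary_float_complex(struct.pack(">ff", 1.5, -2.0)))
```

For doubleComplex the same approach would apply with the ">dd" format and 16 bytes of input.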
VOTable xtype
Section 4.3 (Extended Datatype) of the
VOTable specification describes the xtype attribute as bridging the gap between
the FITS based primitive VOTableTypes and the data types
used to express TAP ADQL
database queries and their results.
The VOTable specification does not define a definitive list of standard xtype values.
Section 3.3 (Literal Values)
of the DALI specification suggests that services should use a prefix for non-standard xtype values.
The DALI specification does define a number of types, including POINT, CIRCLE and POLYGON. However it does not declare a specific list of standard xtype values which do not need prefixes.
The VOTable specification does not explicitly state that the
VOTable xtype attribute
is related to the
TAP_SCHEMA xtype column
used in TAP_SCHEMA metadata tables,
or the
VODataService extendedType attribute
that is used in the
TAP /tables
VOSI response.
VOTable timestamp
The VOTable specification cites an example of using the xtype attribute to describe a timestamp value.
- "a UTC date/time string following the ISO-8601 standard (YYYY-MM-DDThh:mm:ss followed by a decimal point and fractions of seconds)"
The VOTable specification does not link to the DALI specification,
which has a more detailed description of how timestamp values should be
represented using xtype .
VOTable arrays
Section 2.2 of the
VOTable specification uses a number of examples to show how a
combination of datatype and arraysize attributes can be used to describe
arrays of values in the metadata for a FIELD.
Section 5.1 of the
VOTable specification describes the TABLEDATA serialization of arrays as follows:
- "If a cell contains an array of numbers or a complex number, it should be encoded as multiple numbers separated by whitespace. However in the case of character and Unicode strings (declared in the corresponding FIELD as an array of char or unicodeChar datatype), no separator should exist."
It uses the following example to illustrate the difference between arrays of numbers and arrays of characters:
<TABLE>
<FIELD name="aString" datatype="char" arraysize="10"/>
<FIELD name="aShort" datatype="short"/>
<FIELD name="varInts" datatype="int" arraysize="*"/>
<FIELD name="Floats" datatype="float" arraysize="3"/>
<DATA><TABLEDATA>
<TR> <TD>Apple</TD> <TD/> <TD>1 2 4 8 16</TD> <TD>1.62 4.56 3.44</TD> </TR>
<TR> <TD>Orange</TD> <TD>15</TD> <TD>23 -11 9</TD> <TD>2.33 4.66 9.53</TD> </TR>
</TABLEDATA></DATA>
</TABLE>
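The rule that numeric arrays are split on white space while character arrays are kept whole can be sketched with the standard library XML parser. The fragment below is the example from the specification (with a space between the attributes of the last FIELD so that it parses as XML); the decode_cell and decode_table helpers are hypothetical and only handle the data types that appear in this example.

```python
import xml.etree.ElementTree as ET

VOTABLE_FRAGMENT = """
<TABLE>
  <FIELD name="aString" datatype="char" arraysize="10"/>
  <FIELD name="aShort" datatype="short"/>
  <FIELD name="varInts" datatype="int" arraysize="*"/>
  <FIELD name="Floats" datatype="float" arraysize="3"/>
  <DATA><TABLEDATA>
    <TR><TD>Apple</TD><TD/><TD>1 2 4 8 16</TD><TD>1.62 4.56 3.44</TD></TR>
    <TR><TD>Orange</TD><TD>15</TD><TD>23 -11 9</TD><TD>2.33 4.66 9.53</TD></TR>
  </TABLEDATA></DATA>
</TABLE>
"""

# Converters for the numeric types used in this sketch only.
CONVERTERS = {"short": int, "int": int, "long": int, "float": float, "double": float}


def decode_cell(field, text):
    """Decode one TD cell according to its FIELD metadata."""
    if text is None or text.strip() == "":
        return None  # empty cell
    datatype = field.get("datatype")
    if datatype in ("char", "unicodeChar"):
        return text  # character arrays have no separator
    convert = CONVERTERS[datatype]
    if field.get("arraysize") is None:
        return convert(text)  # scalar value
    return [convert(item) for item in text.split()]  # white space separated array


def decode_table(xml_text):
    """Return the rows of the table as lists of decoded cell values."""
    table = ET.fromstring(xml_text)
    fields = table.findall("FIELD")
    return [
        [decode_cell(field, td.text) for field, td in zip(fields, tr.findall("TD"))]
        for tr in table.iter("TR")
    ]


if __name__ == "__main__":
    for row in decode_table(VOTABLE_FRAGMENT):
        print(row)
```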
VOTable delim
The VOTable specification does not include anything to describe the delimiter for arrays of values.
VOTable Field
... TBD
VOTable arraysize
The text of the VOTable specification does not explicitly define the
arraysize attribute.
The XML specification for the arraysize attribute
does not apply a restriction to the content of the attribute.
<xs:complexType name="Field">
....
<xs:attribute name="arraysize" type="xs:string"/>
....
</xs:complexType>
The VOTable specification does not link the VOTable
arraysize attribute with the DataType
arraysize attribute defined in the VODataService
specification.
VOTable arrayDEF
The VOTable XML schema defines the arrayDEF syntax restriction as follows:
<xs:simpleType name="arrayDEF">
<xs:restriction base="xs:token">
<xs:pattern value="([0-9]+x)*[0-9]*[*]?(s\W)?"/>
</xs:restriction>
</xs:simpleType>
However, the arrayDEF syntax restriction is not used in the definition
of the arraysize attribute:
<xs:complexType name="Field">
....
<xs:attribute name="arraysize" type="xs:string"/>
....
</xs:complexType>
This means that the content of the VOTable arraysize
attribute is unrestricted, and may contain any string.
In contrast, the VODataService specification does restrict the
content of the DataType arraysize
attribute, with the ArrayShape restriction.
This means it is possible to create a value for arraysize
that is valid in VOTable but is not valid in
the VOSI /tables
response, which uses the ArrayShape syntax restriction
defined in the VODataService schema.
The only reference to the arrayDEF syntax restriction
in the other VO specifications is a comment in the definition of the
ArrayShape in the VODataService schema.
The text of the VOTable specification does not link the
arrayDEF string syntax with the ArrayShape string
syntax defined in the VODataService schema.
The arrayDEF string syntax is not used anywhere else in VOTable XML schema.
The arrayDEF string syntax is not used in any of the other VO specifications.
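The gap between the three levels of restriction (unrestricted xs:string, arrayDEF and ArrayShape) can be made concrete by testing a few values against both patterns. This is a sketch with assumptions: the helper name and the sample values are illustrative, and Python's \W class is close to, but not identical to, the \W class defined by XML Schema (for example, they disagree on the underscore).

```python
import re

# Patterns copied from the VODataService and VOTable schemas respectively.
ARRAY_SHAPE = re.compile(r"([0-9]+x)*[0-9]*[*]?")
# Note: Python's \W only approximates the XML Schema \W character class.
ARRAY_DEF = re.compile(r"([0-9]+x)*[0-9]*[*]?(s\W)?")


def accepted_by(value: str) -> dict[str, bool]:
    """Report which of the two syntax restrictions accept the value."""
    return {
        "arrayDEF": ARRAY_DEF.fullmatch(value) is not None,
        "ArrayShape": ARRAY_SHAPE.fullmatch(value) is not None,
    }


if __name__ == "__main__":
    for value in ["10x*", "10s,", "any string"]:
        print(repr(value), accepted_by(value))
```

A value such as "10x*" passes both patterns, "10s," passes arrayDEF only, and an arbitrary string passes neither, although all three are valid content for the unrestricted VOTable arraysize attribute.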
Arrays of Variable-Length Strings
Appendix A.3 (Arrays of Variable-Length Strings)
refers to the Substring Array convention, described in an appendix of the FITS specification.
The text in this specification suggests that a similar convention could be used
in VOTable.
- "A convention similar to the FITS one could be introduced in VOTable in the arraysize attribute ..."
However, the text does not go beyond suggesting this as a possibility, and does not
declare whether this is recommended practice or not.
Provision for this extension is included in the regular expression
for the arrayDEF syntax restriction.
<xs:simpleType name="arrayDEF">
<xs:restriction base="xs:token">
<xs:pattern value="([0-9]+x)*[0-9]*[*]?(s\W)?"/>
</xs:restriction>
</xs:simpleType>
However, because VOTable arrayDEF and
VODataService ArrayShape
are not directly linked, adding support for this to VOTable
makes it possible to create a value for arraysize
that is valid in VOTable but is not valid in
the VOSI /tables
response, which uses the ArrayShape syntax restriction
defined in the VODataService schema.
MIME type
Section 8 (MIME type)
describes the format of MIME types that can be used to describe a VOTable document.
The text includes the following set of examples:
- text/xml
- text/xml; charset="iso-8859-1"
- application/x-votable+xml
- application/x-votable+xml; serialization=tabledata
- application/x-votable+xml; serialization=TABLEDATA; charset=iso-8859-1
The text in this section states that if the optional charset parameter is not supplied,
then a default of US-ASCII is assumed.
This may conflict with the discussion of the unicodeChar data type,
which assumes that the default charset is UTF-8 .
The specification does not go into details about how changing the MIME type charset
could change the way that strings of unicodeChar are serialized.
DALI
The DALI specification defines the base web service interface common to all Data Access Layer (DAL) services.
DALI tables
Section 2.6 (VOSI-tables)
of the DALI specification refers to the VOSI-tables web resource,
defined by the Grid and Web Services working group.
DALI types
Section 3.3 (Literal values)
of the DALI specification specifies how a number of data types should be expressed
by DALI services.
Note that although the following data types and the rules for representing them may be used for literal values
in parameters passed to DALI services,
these definitions are also referred to by other VO specifications to describe how to represent data values in
VOTable results, VOSI /tables responses
and TAP_SCHEMA metadata tables.
This may mean that 'Literal values' might not be the most appropriate title for this section.
DALI Numbers
Section 3.3.1 (Numbers)
refers to the VOTable specification.
- "Integer and real numbers must be represented in a manner consistent with the specification for numbers in VOTable"
However, it is not clear which VOTable serializations, BINARY , BINARY2 or TABLEDATA
apply in which situations.
DALI Boolean
Section 3.3.2 (Boolean)
refers to part 2 of the W3C XML schema specification.
This results in a slightly different definition of how boolean values should be expressed compared to the
definition given in the VOTable specification.
DALI Timestamp
Section 3.3.3 (Timestamp)
defines three representations for date and time values as follows:
- For astronomical values the time and date format follows the convention established for FITS, mandating UTC, but omitting the time zone in the representation,
YYYY-MM-DD['T'hh:mm:ss[.SSS]] .
- For civil values, relating to events at locations on the Earth, the format includes an optional 'Z' to explicitly specify the UTC time zone,
YYYY-MM-DD['T'hh:mm:ss[.SSS]['Z']] .
- Julian Date (JD) or Modified Julian Date (MJD) values should follow the rules for double precision numbers.
The specification also describes how to represent timestamp
values in the metadata for a VOTable FIELD using timestamp for
the xtype attribute.
<FIELD datatype="char" arraysize="*" xtype="timestamp">
Note - the specification text includes the following:
- "Julian Date (JD) or Modified Julian Date (MJD), these follow the rules for double precision numbers above"
However, apart from the reference to the VOTable specification in the section on numeric values
there are no explicit rules for double precision numbers in the preceding text.
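The two string formats above can be expressed as a single regular expression with an optional trailing 'Z'. This is a sketch only: the function name and the allow_z flag are assumptions, the number of fractional digits is left open (the format string shows three), and no calendar validation is attempted.

```python
import re

# YYYY-MM-DD['T'hh:mm:ss[.SSS]['Z']]; group 3 captures the optional 'Z'.
DALI_TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z)?)?"
)


def is_dali_timestamp(text: str, allow_z: bool = True) -> bool:
    """Check a string against the timestamp formats quoted above.

    allow_z=False corresponds to the astronomical form, which omits the
    time zone marker; allow_z=True corresponds to the civil form.
    """
    match = DALI_TIMESTAMP.fullmatch(text)
    if match is None:
        return False
    return allow_z or match.group(3) is None


if __name__ == "__main__":
    for sample in [
        "2017-08-16",
        "2017-08-16T17:12:54.621",
        "2017-08-16T17:12:54Z",
        "2017-08-16 17:12:54",
    ]:
        print(sample, is_dali_timestamp(sample), is_dali_timestamp(sample, allow_z=False))
```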
DALI Interval
Section 3.3.4 (Interval)
defines how to represent numeric intervals as pairs of numeric values.
The specification describes how to represent numeric intervals
in the metadata for a VOTable FIELD,
by setting arraysize to 2 and using interval for the xtype attribute.
<FIELD datatype="short" arraysize="2" xtype="interval">
<FIELD datatype="int" arraysize="2" xtype="interval">
<FIELD datatype="long" arraysize="2" xtype="interval">
....
<FIELD datatype="float" arraysize="2" xtype="interval">
<FIELD datatype="double" arraysize="2" xtype="interval">
All of the examples shown in the specification text use space as the delimiter.
However, the specification does not explicitly define a delimiter,
nor does it refer to the delim attribute
defined in VODataService.
DALI Time Interval (proposed)
PROPOSED
The current version (1.1)
of the DALI specification
does not describe how to represent time intervals.
Given that we have already used timestamp and interval to cover the separate cases,
we will need to define a new xtype value to describe an interval
of timestamps.
We propose timestamp-interval as the
xtype value to represent intervals of timestamps.
Given that timestamp is serialized as an array of characters,
and following the pattern used for the geometric types Point,
Circle, and Polygon, an Interval of
Timestamps should probably be represented as a space delimited
sequence of two Timestamps.
<FIELD datatype="char" arraysize="*" xtype="timestamp-interval">
For example:
1970-01-01T00:00:00.000Z 2017-08-16T17:12:54.621Z
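A client could decode the proposed representation by splitting on white space and parsing each half as a timestamp. The sketch below assumes the civil form with fractional seconds and a trailing 'Z', as in the example value; the function name and the fixed format string are illustrative and not part of any specification.

```python
from datetime import datetime, timezone

# Assumed layout of each half of the interval, matching the example value.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_timestamp_interval(cell: str) -> tuple[datetime, datetime]:
    """Parse a space delimited pair of UTC timestamps into datetime objects."""
    start_text, end_text = cell.split()
    start = datetime.strptime(start_text, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    end = datetime.strptime(end_text, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    if end < start:
        raise ValueError("interval end precedes interval start")
    return start, end


if __name__ == "__main__":
    start, end = parse_timestamp_interval(
        "1970-01-01T00:00:00.000Z 2017-08-16T17:12:54.621Z"
    )
    print(start.isoformat(), end.isoformat())
```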
DALI Point
Section 3.3.5 (Point)
of the DALI specification
defines how to represent a geometric point as an array of two floating point numbers.
<FIELD ... datatype="float" arraysize="2" xtype="point">
....
<FIELD ... datatype="double" arraysize="2" xtype="point">
The specification text states that the usual representation is to use longitude and latitude values in spherical coordinates.
The text also implies that other coordinate systems can be used:
- "although they are usually longitude and latitude values in spherical coordinates this is specified in the coordinate metadata and not in the values"
However, the text does not show how to specify a different coordinate system.
The example shown in the specification text uses space as the delimiter:
12.3 45.6
The specification does not explicitly define a delimiter,
nor does it refer to the delim attribute
defined in VODataService.
DALI Circle
Section 3.3.6 (Circle)
of the DALI specification
defines how to represent a circle as an array of three floating point numbers.
<FIELD ... datatype="float" arraysize="3" xtype="circle">
...
<FIELD ... datatype="double" arraysize="3" xtype="circle">
The specification text implies that the usual representation is to use longitude, latitude and radius in spherical coordinates.
However, the text does not show how to specify a different coordinate system.
The example shown in the specification text uses space as the delimiter:
12.3 45.6 0.5
The specification does not explicitly define a delimiter,
nor does it refer to the delim attribute
defined in VODataService.
Note - the text in the current version (1.1) repeats the same range restriction twice.
- "For circles in a spherical coordinate system ... longitude values must fall within [0,360], latitude values within [-90,90], and radius values in (0,180]"
- "In spherical coordinates, all longitude values must fall within [0,360] and all latitude values within [-90,90]"
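The range restrictions quoted above translate into a simple validation step. The sketch below assumes spherical coordinates and a white space delimited cell value, as in the example; the function name is hypothetical.

```python
def validate_circle(cell: str) -> tuple[float, float, float]:
    """Parse 'longitude latitude radius' and check the quoted ranges."""
    longitude, latitude, radius = (float(item) for item in cell.split())
    if not 0.0 <= longitude <= 360.0:
        raise ValueError("longitude must fall within [0,360]")
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude must fall within [-90,90]")
    if not 0.0 < radius <= 180.0:
        raise ValueError("radius must fall within (0,180]")
    return longitude, latitude, radius


if __name__ == "__main__":
    print(validate_circle("12.3 45.6 0.5"))
```

The half-open interval for the radius means that a radius of exactly zero is rejected while 180 is accepted.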
DALI Polygon
Section 3.3.7 (Polygon)
of the DALI specification
defines how to represent a polygon as an array of floating point numbers.
<FIELD ... datatype="float" arraysize="*" xtype="polygon">
....
<FIELD ... datatype="double" arraysize="*" xtype="polygon">
Note that the polygon is modelled as a one-dimensional array of numbers,
rather than a two-dimensional array of pairs of numbers.
<FIELD ... datatype="float" arraysize="2x*" xtype="polygon">
The text of the specification implies that the usual representation is to use spherical coordinates.
However, the text does not show how to specify a different coordinate system.
The example shown in the specification text uses space as the delimiter:
10.0 10.0 10.2 10.0 10.2 10.2 10.0 10.2
However, the specification does not explicitly define a delimiter,
nor does it refer to the delim attribute
defined in VODataService.
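Because the polygon is a flat array, a client has to pair the values up itself. A minimal Python sketch of that step, again assuming whitespace as the delimiter; the function name `parse_polygon` is hypothetical.

```python
def parse_polygon(text):
    """Turn a flat 'x1 y1 x2 y2 ...' string into a list of (x, y) vertices."""
    # Assumption: whitespace-delimited values, as in the example in the spec text.
    values = [float(v) for v in text.split()]
    if len(values) % 2 != 0:
        raise ValueError("odd number of coordinate values")
    # Even positions are first coordinates, odd positions are second coordinates.
    return list(zip(values[0::2], values[1::2]))
```

Applied to the example value from the specification, this yields four vertices, the first of which is `(10.0, 10.0)`.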
DALI RESPONSEFORMAT
Section 3.4.3 (RESPONSEFORMAT) of the
DALI specification
describes the RESPONSEFORMAT parameter that enables a client to request different
MIME types and response formats from a DALI service.
... TBD
Note - need to cross reference this with charset from the VOTable MIME types.
VOSI
The VOSI specification defines a number of
web service interface methods and resources that are
common to all of the VO services.
VOSI /tables
Section 3.3 (Table metadata)
of the VOSI specification defines a resource that describes the content of the database tables
accessible from a VO service.
The VOSI Tables XML schema imports the Table and TableSet
elements from the VODataService specification.
The VOSI Tables XML schema inherits both the VOTableType and TAPType
elements from the VODataService specification, along with their attributes.
The VOTableType and TAPType elements inherit the following attributes
from DataType:
The TAPType element inherits the following attribute
from TAPDataType:
Note - the examples given in the text of the VOSI specification
use vs:TAP rather than vs:TAPType for the xsi:type attribute.
<column>
<name>cfhtlsID </name>
<dataType xsi:type="vs:TAP" size="30">adql:VARCHAR</dataType>
</column>
Note - the examples given in the text of the VOSI specification use
a prefix for the adql:VARCHAR data type, which is not a valid value for
a TAPType data type.
TAP
The TAP specification describes the Table Access Protocol web service.
TAP /tables
Section 2.5 (/tables)
of the TAP specification
describes the /tables resource.
The text in the current version (WD-TAP-1.1-20170707)
of the specification refers to the VODataService specification
for details of the /tables resource content.
However, the content, and more importantly, the behaviour, of the
VOSI /tables resource
is described in
Section 3.3 (Table metadata)
of the VOSI specification,
which in turn imports the Table and TableSet
XML elements from the VODataService specification.
The text of the specification recommends using VOTableType rather than TAPType,
but it does not explicitly exclude the use of TAPType.
- "The use of VOTableType (rather than TAPType) in the VOSI-tables output is recommended because the values map directly"
- "TAPType may be used when VOTableType does not provide a suitable alternative"
This is in contrast to the text of the VOSI specification, which includes a number of examples
using a TAP adql:VARCHAR data type:
<column>
<name>cfhtlsID </name>
<dataType xsi:type="vs:TAP" size="30">adql:VARCHAR</dataType>
</column>
In addition, the text of the VODataService specification
cites the following two examples as equivalent:
<dataType xsi:type="vs:VOTableType" arraysize="*"> char </dataType>
....
<dataType xsi:type="vs:TAPType"> VARCHAR </dataType>
and a third example describes a fixed length string, using size
rather than arraysize :
<dataType xsi:type="vs:TAPType" size="8" > CHAR </dataType>
This conflicts with both the recommendation to use VOTableType
rather than TAPType in TAP /tables and the advisory
regarding the planned obsolescence of the TAP_SCHEMA
size column in future versions of the
TAP specification.
TAP_SCHEMA
Section 4 (TAP_SCHEMA)
of the TAP specification
describes the TAP_SCHEMA metadata tables.
The specification text states that TAP_SCHEMA and VOSI /tables should be equivalent.
- "The VOSI tables resource provides the same metadata as the TAP_SCHEMA but in a rigorously controlled format"
- "the information in the TAP_SCHEMA is equivalent to that defined by the VODataService"
However, as shown below, there are a number of inconsistencies between the various specifications which
mean that this is not always the case.
TAP_SCHEMA.schema
Section 4.1 (Schema)
of the TAP specification
describes the content and structure of the
TAP_SCHEMA.schema metadata table.
TAP_SCHEMA.tables
Section 4.2 (Tables)
of the TAP specification
describes the content and structure of the
TAP_SCHEMA.tables metadata table.
Note - the definition of TAP_SCHEMA.tables does not state that values in the schema_name column
must match the corresponding schema_name column in the TAP_SCHEMA.schema table.
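Since the specification does not require the match, a client may want to check for it. A minimal Python sketch, assuming the rows of the two metadata tables have already been fetched as dictionaries keyed by column name; the function name `dangling_schema_names` and the row layout are hypothetical.

```python
def dangling_schema_names(schema_rows, table_rows):
    """Return the table_name of every table whose schema_name has no schema row."""
    # Assumption: each row is a dict keyed by the TAP_SCHEMA column names.
    known = {row["schema_name"] for row in schema_rows}
    return [row["table_name"] for row in table_rows
            if row["schema_name"] not in known]
```

An empty result means every table refers to a schema that is actually listed.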
TAP_SCHEMA.columns
Section 4.3 (Columns)
of the TAP specification
describes the content and structure of the
TAP_SCHEMA.columns metadata table.
| column name | datatype | arraysize | xtype | not-null |
| table_name | char | * | null | true |
| column_name | char | * | null | true |
| datatype | char | * | null | true |
| arraysize | char | * | null | false |
| xtype | char | * | null | false |
| "size" | int | 1 | null | false |
| description | char | * | null | false |
| utype | char | * | null | false |
| unit | char | * | null | false |
| ucd | char | * | null | false |
| indexed | boolean | 1 | null | true |
| principal | boolean | 1 | null | true |
| std | boolean | 1 | null | true |
| column_index | int | 1 | null | false |
The specification text explains that
TAP_SCHEMA uses a combination of
datatype, arraysize and xtype
to describe the type of a database column.
TAP_SCHEMA.columns.datatype
The text of the specification restricts TAP_SCHEMA.columns.datatype
to use the data types defined in the VOTable specification.
- "The allowed values for datatype ... are specified in VOTable"
Section 2.1 (Primitives)
of the VOTable specification defines the following data types:
| Datatype | Meaning | FITS | Bytes |
| boolean | Logical | L | 1 |
| bit | Bit | X | * |
| unsignedByte | Byte (0 to 255) | B | 1 |
| short | Short Integer | I | 2 |
| int | Integer | J | 4 |
| long | Long integer | K | 8 |
| char | ASCII Character | A | 1 |
| unicodeChar | Unicode Character | | 2 |
| float | Floating point | E | 4 |
| double | Double | D | 8 |
| floatComplex | Float Complex | C | 8 |
| doubleComplex | Double Complex | M | 16 |
In contrast, the /tables resource defined in
Section 2.5 (/tables)
of the TAP specification
is based on the VOSI /tables resource,
which in turn, uses the Table and TableSet
data type elements from the VODataService specification.
The VOTableType defines a similar set of data types to VOTable,
albeit defined in a separate list in a separate specification:
- boolean
- bit
- unsignedByte
- short
- int
- long
- char
- unicodeChar
- float
- double
- floatComplex
- doubleComplex
However, the VOSI specification also allows the TAPType data type
to be used to describe database columns.
The TAPType data type defines a different set of data types, based on the data type
stored inside the database, rather than an external serialization of the data:
- BOOLEAN
- SMALLINT
- INTEGER
- BIGINT
- REAL
- DOUBLE
- TIMESTAMP
- CHAR
- VARCHAR
- BINARY
- VARBINARY
- POINT
- REGION
- CLOB
- BLOB
This second set of data types means that the following example,
based on examples given in both the VOSI and the
VODataService specifications,
would be valid in the /tables response,
but the same data type would not be valid in the corresponding
TAP_SCHEMA datatype.
<column>
<name>cfhtlsID </name>
<dataType xsi:type="vs:TAPType" size="30">VARCHAR</dataType>
</column>
TBD - link to the actual data in live TAP services.
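To make the mismatch concrete, the two vocabularies listed above can be compared directly. A minimal Python sketch, assuming the type names are compared case-sensitively, as XML enumeration values are; the function name `where_valid` is hypothetical.

```python
# Type names copied from the two lists above.
VOTABLE_TYPES = {
    "boolean", "bit", "unsignedByte", "short", "int", "long", "char",
    "unicodeChar", "float", "double", "floatComplex", "doubleComplex",
}
TAP_TYPES = {
    "BOOLEAN", "SMALLINT", "INTEGER", "BIGINT", "REAL", "DOUBLE", "TIMESTAMP",
    "CHAR", "VARCHAR", "BINARY", "VARBINARY", "POINT", "REGION", "CLOB", "BLOB",
}

def where_valid(name):
    """Return (valid as a VOTable datatype, valid as a TAPType) for a type name."""
    return (name in VOTABLE_TYPES, name in TAP_TYPES)
```

With these sets, `VARCHAR` is accepted only as a TAPType, `char` only as a VOTable data type, and the prefixed form `adql:VARCHAR` by neither.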
TAP_SCHEMA.columns.arraysize
The text of the specification describes the arraysize
column as "the length of variable length datatypes".
- "The arraysize column gives the length of variable length datatypes using the VOTable array shape syntax."
This does not explicitly state whether this is the number of elements in the array, or the size (in bytes) of the array.
The example given is for an array of characters, where the size in bytes is equal to the number of elements.
- "a database column of type varchar(256) would be described with datatype 'char' and arraysize '256*'"
Based on examples given in some of the other VO specifications it is possible to infer that
this is the number of elements and not the size in bytes.
However, the specification could make this clearer by explicitly stating that it is
the "number of elements in the array".
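Under the element-count reading, an arraysize string can be split into its dimensions as follows. This is a minimal Python sketch, assuming the LxMxN... syntax with an optional trailing "*"; the function name `parse_arraysize` is hypothetical, and the sketch does not handle the substring suffix allowed by the VOTable schema.

```python
def parse_arraysize(text):
    """Split an arraysize string into (dimensions, variable).

    dimensions is a list of element counts; the last entry is None when
    the final dimension has no stated length (for example '*').
    """
    variable = text.endswith("*")
    body = text[:-1] if variable else text
    parts = body.split("x")
    # Every dimension before the last must be a plain integer.
    dims = [int(p) for p in parts[:-1]]
    last = parts[-1]
    dims.append(int(last) if last else None)
    return dims, variable
```

For the varchar(256) example, `parse_arraysize("256*")` gives one dimension of at most 256 elements, flagged as variable length.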
The specification text explicitly refers to the VOTable specification for a definition
of the arraysize syntax.
- "... the syntax for arraysize are specified in VOTable (Ochsenbein and Williams et al., 2013)"
The XML schema for the VOTable specification defines
the arrayDEF syntax restriction, which includes
support for the FITS Substring Array
convention:
<xs:simpleType name="arrayDEF">
<xs:restriction base="xs:token">
<xs:pattern value="([0-9]+x)*[0-9]*[*]?(s\W)?"/>
</xs:restriction>
</xs:simpleType>
However, the arrayDEF restriction is not actually
used in the definition of the VOTable Field
arraysize attribute.
<xs:complexType name="Field">
....
<xs:attribute name="arraysize" type="xs:string"/>
....
</xs:complexType>
The VODataService ArrayShape
syntax restriction, used in both the
VOTableType and the TAPType
elements defined in the VODataService XML schema,
does not include support for the FITS
Substring Array convention:
<xs:simpleType name="ArrayShape">
<xs:restriction base="xs:token">
<xs:pattern value="([0-9]+x)*[0-9]*[*]?"/>
</xs:restriction>
</xs:simpleType>
<xs:complexType name="VOTableType">
<xs:simpleContent>
<xs:restriction base="vs:TableDataType">
....
<xs:attribute name="arraysize" type="vs:ArrayShape" default="1"/>
....
</xs:restriction>
</xs:simpleContent>
</xs:complexType>
<xs:complexType name="TAPType">
....
<xs:simpleContent>
<xs:restriction base="vs:TAPDataType">
....
<xs:attribute name="arraysize" type="vs:ArrayShape" default="1"/>
....
</xs:restriction>
</xs:simpleContent>
</xs:complexType>
This means that whether you use the 'no restriction' definition of the
VOTable arraysize attribute, or the VOTable arrayDEF restriction,
it is possible to construct a string that would be valid in TAP_SCHEMA arraysize
but would not be valid in the VODataService ArrayShape restriction
used in the corresponding TAP /tables VOSI response.
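The difference can be demonstrated by running a candidate string through both patterns. A minimal Python sketch, assuming that Python's `\W` is an acceptable stand-in for the XSD `\W` class (the two classes are not identical); the test value "10s," and the function name `accepted_by` are hypothetical.

```python
import re

# Patterns copied from the two schema fragments quoted above.
# Assumption: Python's \W approximates the XSD \W character class.
ARRAY_DEF = re.compile(r"([0-9]+x)*[0-9]*[*]?(s\W)?")   # VOTable arrayDEF
ARRAY_SHAPE = re.compile(r"([0-9]+x)*[0-9]*[*]?")       # VODataService ArrayShape

def accepted_by(value):
    """Return (matches arrayDEF, matches ArrayShape) for a candidate string."""
    # XSD patterns must match the whole value, hence fullmatch.
    return (ARRAY_DEF.fullmatch(value) is not None,
            ARRAY_SHAPE.fullmatch(value) is not None)
```

A value with a substring suffix such as "10s," passes arrayDEF but fails ArrayShape, whereas "256*" passes both.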
TAP_SCHEMA.columns.xtype
The text of the specification refers to the types defined in the DALI specification.
- "Values for xtype are not restricted per se but implementors should use standard values such as those defined in DALI ... before inventing new xtype(s)."
However, the specification does not state that the
TAP_SCHEMA xtype column is related to the
VOTable xtype attribute used in VOTable results,
or the VODataService extendedType attribute
that is used in the TAP /tables VOSI response.
TAP_SCHEMA.columns.size
The text of the specification states that the size
column is kept for backwards compatibility and will be removed in the next major
version of the TAP specification.