VO data types
This review of the data types defined in the VO specifications was initially done for my own benefit,
to help me understand how the different methods for describing data types in the VO fitted together.
The review looks specifically at the relationships between types, attributes and columns,
particularly those with similar names in different standards and how they relate to each other.
This review is based on the following specifications:
Still to do are:
VODataService
The
VODataService specification defines an XML schema for describing data collections and the services that access them.
The data types defined in
VODataService are intended to be used to describe the data in VO data sets and the services and protocols used to access them.
DataType
Section 3.5 (Data Parameters)
of the
VODataService specification
defines the
DataType XML element.
DataType includes the following attributes:
DataType =arraysize
Section 3.5 (Data Parameters)
of the
VODataService specification
defines the
arraysize
attribute.
The specification text describes the
arraysize
attribute as follows:
- "The arraysize attribute indicates the parameter is an array of values of the named type."
- "Its value describes the shape of the array, and the delim attribute may be used to indicate the delimiter that should appear between elements of an array value."
- "The attribute's presence indicates that parameter holds an array values; the attribute's value indicates the length of the array along each dimension of the multi-dimensional array."
ArrayShape
Section 3.5 (Data Parameters)
of the
VODataService specification
defines the
ArrayShape restriction,
which sets the syntax for the
arraysize
attribute.
The specification text describes the
ArrayShape as follows:
- "the VOTable arraysize format (vs:ArrayShape): LxMxN..., where each x-delimited positive integer is a length along a dimension of a multi-dimensional array. A single integer indicates a one dimensional array. Instead of an integer, the last length can be set to "*" which indicates a variable length."
Note - The reference to
"VOTable arraysize format (vs:ArrayShape)" is wrong:
ArrayShape is defined in the
VODataService specification, not in the
VOTable specification.
The XML schema defines the
ArrayShape string syntax as follows:
<!--
- this definition is taken from the VOTable arrayDEF type
-->
<xs:simpleType name="ArrayShape">
<xs:annotation>
<xs:documentation>
An expression of a the shape of a multi-dimensional array
of the form LxNxM... where each value between gives the
integer length of the array along a dimension. An
asterisk (*) as the last dimension of the shape indicates
that the length of the last axis is variable or
undetermined.
</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:token">
<xs:pattern value="([0-9]+x)*[0-9]*[*]?"/>
</xs:restriction>
</xs:simpleType>
As the comment in the XML schema suggests, the
ArrayShape string syntax
is similar to, but not explicitly related to, the
arrayDEF string format
defined in the
VOTable specification.
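As a rough illustration of what the ArrayShape pattern accepts, the following Python sketch applies the same regular expression to a few candidate strings. The helper name is_array_shape is an assumption made for this example, and Python's re module is used as a stand-in for an XML Schema validator, which should be adequate here because the pattern only uses constructs that both share.

```python
import re

# Pattern copied from the vs:ArrayShape definition. XML Schema patterns
# must match the whole value, so fullmatch() is used rather than search().
ARRAY_SHAPE = re.compile(r"([0-9]+x)*[0-9]*[*]?")


def is_array_shape(value: str) -> bool:
    """Return True if value matches the ArrayShape pattern (hypothetical helper)."""
    return ARRAY_SHAPE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["3", "2x3", "10x*", "*", "", "2x", "2,3", "*x3"]:
        print(repr(candidate), is_array_shape(candidate))
```

Every part of the pattern is optional, so an empty string and a value with a trailing x such as "2x" also match, which is more permissive than the prose description suggests.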
The
ArrayShape string syntax is used in several places in the
VODataService XML schema to define the content of
arraysize
attributes on elements derived from
DataType, including
VOTableType and
TAPType.
The
ArrayShape string syntax is not used directly in any of the other VO specifications.
DataType =delim
Section 3.5 (Data Parameters)
of the
VODataService specification
defines the
delim
attribute.
The specification text describes the
delim
attribute as follows:
- "the string that is used to delimit element of an array value when arraysize is not "1""
The XML schema defines the default value of the
delim
attribute as a single white space.
<xs:attribute name="delim" type="xs:string" default=" ">
The specification text and comments in the XML schema encourage applications to allow optional
spaces before and after the delimiter (e.g. "1, 5" when delim=",").
However, this is not explicitly encoded in the XML schema itself.
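One way an application might follow that recommendation is sketched below in Python. The function name split_array_value is hypothetical, and treating a white space delimiter as "any run of white space" is an assumption of this sketch rather than something the schema states.

```python
import re
from typing import List


def split_array_value(value: str, delim: str = " ") -> List[str]:
    """Split a serialized array value into its elements (hypothetical helper)."""
    if delim.strip() == "":
        # White space delimiter (the schema default): split on runs of white space.
        return value.split()
    # Any other delimiter: tolerate optional white space on either side of it.
    pattern = r"\s*" + re.escape(delim) + r"\s*"
    return re.split(pattern, value.strip())


if __name__ == "__main__":
    print(split_array_value("1, 5", delim=","))
    print(split_array_value("1 2  3"))
```

With delim="," the value "1, 5" splits into the same two elements as "1,5", which is the behaviour the specification text encourages.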
The
delim
attribute is not referred to by any of the other VO specifications.
All the other VO specifications use white space as the delimiter,
either explicitly defined in the specification text, or by implication
in the examples.
The definitions for arrays and complex numbers in the
VOTable specification explicitly declare white space as the delimiter.
- The VOTable
TABLEDATA
serialization for arrays of numeric values explicitly uses white space as the delimiter.
- The VOTable
TABLEDATA
serialization for floatComplex
and doubleComplex
explicitly uses white space as the delimiter.
Although none of the data types defined in the
DALI specification explicitly declare a delimiter,
all of the examples in the text use white space.
- The example for Interval uses white space as the delimiter.
- The example for Point uses white space as the delimiter.
- The example for Circle uses white space as the delimiter.
- The example for Polygon uses white space as the delimiter.
DataType =extendedType
Section 3.5 (Data Parameters)
of the
VODataService specification
defines the
extendedType
attribute.
The specification text describes the
extendedType
attribute as follows:
- "The data value represented by this type can be interpreted as of a custom type identified by the value of this attribute. "
- "The name implies a particular expected format for the data value that can be parsed into a value in memory."
- " If an application does not recognize this extendedType, it should attempt to handle value assuming the type given by the element's value. "string" (or its equivalent) is a recommended default type."
- " This element may make use of the extendedSchema attribute and/or any arbitrary (qualified) attribute to refine the identification of the type. "
Looking at the body of standards as a whole suggests that the
extendedType
attribute
is functionally equivalent to the
xtype attribute defined in the
VOTable specification.
However, as far as we can tell, this is not explicitly stated anywhere, and there is no mapping defined between the
(
extendedType
|
extendedSchema
) attribute pair defined in
VODataService
and the (
xtype with prefix) attribute defined in the
VOTable specification.
The
VODataService specification does not provide an example of how the
extendedType
attribute could be used.
The
extendedType
attribute is not referred to in any of the other VO specifications.
DataType =extendedSchema
Section 3.5 (Data Parameters)
of the
VODataService specification
defines the
extendedSchema
attribute.
The specification text describes the
extendedSchema
attribute as follows:
- "An identifier for the schema that the value given by the extended attribute is drawn from."
The specification does not provide an example of how the
extendedSchema
attribute would be used.
The
extendedSchema
attribute is not used in the
VODataService specification.
The
extendedSchema
attribute is not used in any of the other VO specifications.
TableDataType
Section 3.5.3 (Table Column Data Types)
of the
VODataService specification
defines the
TableDataType XML element.
TableDataType extends the
DataType element.
The XML schema describes
TableDataType as:
- "an abstract parent for a class of data types that can be used to specify the data type of a table column."
VOTableType
Section 3.5.3 (Table Column Data Types)
of the
VODataService specification
defines the
VOTableType XML element.
The
VOTableType XML element extends the
DataType element.
The
VOTableType XML element inherits the following attributes from
DataType:
VOTableType defines the following set of allowed values:
-
boolean
-
bit
-
unsignedByte
-
short
-
int
-
long
-
char
-
unicodeChar
-
float
-
double
-
floatComplex
-
doubleComplex
The specification text describes
VOTableType as follows:
- "data types that correspond to the parameter and column types defined in the VOTable schema"
The XML schema comments describe
VOTableType as follows:
- "a data type supported explicitly by the VOTable format".
The definition of
VOTableType does not provide any further details about the sizes,
ranges or content of the data types.
It is left to the reader to refer to the
VOTable specification for details about the data types.
The definition of
VOTableType states that string values of arbitrary length are
represented by a data type of
char
with
arraysize="*"
.
To support strings with Unicode characters, it may be clearer to state explicitly that
ASCII
strings should be represented by a data type of
char
with
arraysize="*"
and
Unicode strings
should be represented by a data type of
unicodeChar
with
arraysize="*"
.
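The suggested rule could be sketched as follows in Python. The function name votable_string_type is hypothetical, and the choice to test each value for 7-bit ASCII content is an assumption about how a service might apply the rule; a real service would more likely decide per column than per value.

```python
from typing import Tuple


def votable_string_type(value: str) -> Tuple[str, str]:
    """Pick a (datatype, arraysize) pair for a string value (hypothetical helper)."""
    # str.isascii() is True when every character is in the 7-bit ASCII range.
    if value.isascii():
        return ("char", "*")
    return ("unicodeChar", "*")


if __name__ == "__main__":
    print(votable_string_type("Orange"))
    print(votable_string_type("\u042f"))  # Cyrillic uppercase Ya
```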
Note - the bibliography reference to the
VOTable specification explicitly refers to
version 1.2 (20091130) of
the specification, which has since been superseded by
version 1.3 (20130920).
TAPDataType
The
TAPDataType element is not explicitly described in the text of the
VODataService specification.
The
VODataService XML schema describes
TAPDataType as follows:
- "an abstract parent for the specific data types supported by the Table Access Protocol"
The XML schema for the
TAPDataType element defines the following attribute:
Note - the
TAPDataType element name reflects the historical situation where the data types were originally defined in the
TAP specification. The data type definitions have since been moved to the
ADQL specification, but for backward compatibility the XML element name has not been changed.
TAPType
Section 3.5.3 (Table Column Data Types)
of the
VODataService specification
defines the
TAPType XML element.
The
TAPType element inherits the following attributes from
DataType:
The
TAPType inherits the following attribute from
TAPDataType:
-
size
the length of a variable-length value
TAPType defines the following set of allowed values:
-
BOOLEAN
-
SMALLINT
-
INTEGER
-
BIGINT
-
REAL
-
DOUBLE
-
TIMESTAMP
-
CHAR
-
VARCHAR
-
BINARY
-
VARBINARY
-
POINT
-
REGION
-
CLOB
-
BLOB
The specification text describes
TAPType as follows:
- "data types that correspond column types defined in the Table Access Protocol (v1.0) [TAP]"
The explicit reference to version 1.0 of the
TAP specification is out of date.
The
TAP specification no longer contains the definition of these data types.
The
TAPType element name reflects the historical situation where the data types were originally
defined in the
TAP specification. The data type definitions have since been moved
to the
ADQL specification, but for compatibility reasons, the XML element name has not been changed.
The definition of
TAPType does not provide any further details about the sizes,
ranges or content of the data types.
It is left to the reader to refer to the
TAP
(now
ADQL) specification for details about the data types.
The text at the end of the section refers to a mapping between
TAP_SCHEMA
types
and
VOTable types in the
TAP specification.
- "Note that the TAP standard [TAP] defines an explicit mapping between TAP_SCHEMA types and VOTable types."
This mapping is no longer part of the
TAP specification.
The definition of
TAPType states that string values should be represented
by a data type of
VARCHAR
, but the text does not say whether this should use a
size
or an
arraysize
attribute.
TAPType =size
Section 3.5.3 (Table Column Data Types)
of the
VODataService specification
defines the
size
attribute.
Technically,
size
is an attribute of the abstract
TAPDataType parent element,
which is then inherited by the
TAPType element.
The text describing the
TAPType element describes the
size
attribute as follows:
- "The length of the variable-length data type."
- "In the context of TAP, this attribute is only meaning when the data type is CHAR or BINARY; see discussion below."
This restriction implies that
CHAR
and
BINARY
values are not arrays of values and have an inherent
'size' property, which is distinct from the
'arraysize' property.
In the discussion that follows, the
VODataService specification cites the following two examples as equivalent:
<dataType xsi:type="vs:VOTableType" arraysize="*"> char </dataType>
<dataType xsi:type="vs:TAPType"> VARCHAR </dataType>
A third example describes a fixed length string, using the
size
rather than the
arraysize
attribute
<dataType xsi:type="vs:TAPType" size="8" > CHAR </dataType>
However, the
VODataService specification does not explicitly explain the difference between
the following examples:
<dataType xsi:type="vs:TAPType" size="8" > CHAR </dataType>
<dataType xsi:type="vs:TAPType" arraysize="8" > CHAR </dataType>
The comments in the XML schema for
TAPDataType describe the
size
attribute as follows:
- "This corresponds to the size Column attribute in the TAP_SCHEMA and can be used with data types that are defined with a length (CHAR, BINARY)."
This establishes a link between the
TAPDataType size
attribute and the
size
column in
TAP_SCHEMA
.
In the current version of the
TAP specification the corresponding
size
column is described as:
- "retained for backwards compatibility to TAP-1.0"
The original text in version 1.0 of the
TAP specification describes the
size
column as follows:
- "The “size” gives the length of variable length datatypes, for example varchar(256);"
The
TAP specification does not link the
size
column back to
TAPDataType in the
VODataService specification.
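Returning to the two CHAR declarations compared above, the following Python sketch shows that a parser sees size and arraysize as two unrelated attributes, so any equivalence between them has to come from the specification text rather than from the XML. The helper name describe is hypothetical, and the xsi namespace declaration is added to each fragment only so that it parses on its own.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The two declarations from the text, with the xsi namespace declared inline.
EXAMPLES = [
    '<dataType xmlns:xsi="%s" xsi:type="vs:TAPType" size="8"> CHAR </dataType>' % XSI,
    '<dataType xmlns:xsi="%s" xsi:type="vs:TAPType" arraysize="8"> CHAR </dataType>' % XSI,
]


def describe(xml_text: str) -> dict:
    """Summarise one dataType element (hypothetical helper)."""
    element = ET.fromstring(xml_text)
    return {
        "type": element.get("{%s}type" % XSI),
        "name": element.text.strip(),
        "size": element.get("size"),
        "arraysize": element.get("arraysize"),
    }


if __name__ == "__main__":
    for example in EXAMPLES:
        print(describe(example))
```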
VODataService Table
... TBD
VODataService TableSet
... TBD
VOTable
The
VOTable specification defines an XML based serialization format for exchanging tabular data in the VO.
VOTable data types
Section 2.1 (Primitives)
of the
VOTable specification
defines a core set of primitive data types.
The following table describes the types, their semantic meaning, the corresponding FITS data type and the size in bytes:
Datatype | Meaning | FITS | Bytes |
boolean | Logical | L | 1 |
bit | Bit | X | * |
unsignedByte | Byte (0 to 255) | B | 1 |
short | Short Integer | I | 2 |
int | Integer | J | 4 |
long | Long integer | K | 8 |
char | ASCII Character | A | 1 |
unicodeChar | Unicode Character | | 2 |
float | Floating point | E | 4 |
double | Double | D | 8 |
floatComplex | Float Complex | C | 8 |
doubleComplex | Double Complex | M | 16 |
VOTable serialization
Section 6 (Definitions of Primitive Datatypes)
of the
VOTable specification describes the
BINARY
,
BINARY2
and
TABLEDATA
serializations of each of the primitive data types.
VOTable =boolean
Case insensitive long form:
- TRUE FALSE
- True False
- true false
Case insensitive short form:
Numeric, one or zero:
This results in a slightly different definition of how to serialize boolean values
compared to the
DALI definition, which uses a definition from the
W3C XML schema specification.
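The practical effect of the difference can be sketched in Python as below. The xs:boolean lexical forms (true, false, 1, 0, case sensitive) are those of W3C XML Schema Part 2. The VOTable set is an assumption reconstructed from the three groups listed above (long form, short form, numeric), with T and F assumed for the short form; the function names are hypothetical.

```python
# Lexical forms of xs:boolean in W3C XML Schema Part 2 (case sensitive).
XSD_TRUE, XSD_FALSE = {"true", "1"}, {"false", "0"}

# Assumed VOTable TABLEDATA forms: long form, short form and numeric,
# with letters compared case-insensitively.
VOTABLE_TRUE, VOTABLE_FALSE = {"true", "t", "1"}, {"false", "f", "0"}


def is_xsd_boolean(text: str) -> bool:
    """True if text is a valid xs:boolean literal."""
    return text in XSD_TRUE or text in XSD_FALSE


def is_votable_boolean(text: str) -> bool:
    """True if text is one of the assumed VOTable boolean forms."""
    lowered = text.lower()
    return lowered in VOTABLE_TRUE or lowered in VOTABLE_FALSE


if __name__ == "__main__":
    for literal in ["true", "TRUE", "T", "1", "yes"]:
        print(literal, is_votable_boolean(literal), is_xsd_boolean(literal))
```

Under these assumptions a value such as TRUE or T is acceptable in a VOTable cell but is not a valid xs:boolean literal.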
VOTable =bit
Array of bits, padded to fit into bytes.
VOTable =unsignedByte
8 bit (unsigned) integers, 0 to 255.
VOTable =short
16 bit signed integers, -32768 to 32767.
VOTable =int
32 bit signed integers, -2147483648 to 2147483647.
VOTable =long
64 bit signed integers, -9223372036854775808 to 9223372036854775807.
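The integer ranges quoted above follow directly from the byte sizes in the table of primitives, assuming two's complement signed values, which the quoted limits imply. A small Python sketch (the function names are hypothetical) reproduces them:

```python
from typing import Tuple


def signed_range(bits: int) -> Tuple[int, int]:
    """Return (min, max) for a two's complement signed integer of the given width."""
    return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)


def unsigned_range(bits: int) -> Tuple[int, int]:
    """Return (min, max) for an unsigned integer of the given width."""
    return (0, 2 ** bits - 1)


if __name__ == "__main__":
    print("unsignedByte", unsigned_range(8))
    for name, bits in [("short", 16), ("int", 32), ("long", 64)]:
        print(name, signed_range(bits))
```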
VOTable =float
ANSI/IEEE-754 32-bit floating point numbers.
VOTable =double
ANSI/IEEE-754 64-bit double precision floating point numbers.
VOTable =char
ASCII (7-bit) characters.
VOTable =unicodeChar
The description for the
BINARY
serialization of
unicodeChar
defines it as
Unicode (UCS-2) fixed width 2-byte characters.
- "Each Unicode character is represented in the BINARY/BINARY2 serialization by two bytes, using the big-endian UCS-2 encoding (ISO-10646-UCS-2)"
The UCS-2 character set includes all of the characters in the
Basic Multilingual Plane (BMP),
which contains characters for
almost all modern languages.
The description for the
TABLEDATA
serialization includes an example showing how a
unicodeChar
that is outside the ASCII character
set can be represented in an XML document by using a
numeric character reference (NCR).
- "The representation of a Unicode character in the
TABLEDATA
serialization follows the XML specifications, and e.g. the Cyrillic uppercase ``Ya'' can be written &#x042F; in UTF-8."
The reference to
UTF-8 in this description may be misleading,
because a UTF-8 document can contain the multi-byte Cyrillic uppercase "Ya" character, Я, as-is, without
needing to use a numeric character reference.
A numeric character reference is only needed when the document character set is not able to represent the "Ya" character, Я.
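The distinction can be checked with a short Python sketch. The helper names are hypothetical; the encodings and the xmlcharrefreplace error handler are standard Python. Note that Python emits the decimal form of the character reference, which is equivalent to the hexadecimal form U+042F.

```python
YA = "\u042f"  # Cyrillic uppercase Ya


def fits_in_ucs2(text: str) -> bool:
    """True if every character is in the Basic Multilingual Plane (hypothetical helper)."""
    return all(ord(char) <= 0xFFFF for char in text)


def as_character_references(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(YA.encode("utf-8"))           # the character stored as-is in a UTF-8 document
    print(YA.encode("utf-16-be"))       # identical to big-endian UCS-2 for BMP characters
    print(as_character_references(YA))  # what an ASCII-only document would need
    print(fits_in_ucs2("\U0001F600"))   # a character outside the BMP
```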
As a result, declaring a UTF-8 encoding for a
VOTable document containing
TABLEDATA
data may be problematic,
<?xml version="1.0" encoding="utf-8"?>
as this would mean the
VOTable document would be able to contain complex multibyte characters
that are beyond the range of the UCS-2 fixed-width character set.
It may be better to specify the character encoding for VOTable documents as
UCS-2
,
<?xml version="1.0" encoding="ucs-2"?>
This would make the
TABLEDATA
serialization equivalent to the
BINARY
serialization,
and require numeric character references for all characters outside the
UCS-2
two byte
fixed size.
Note - There is a paragraph needed to link this section and the section describing the different
MIME types
and how they would affect the serialization of
unicodeChar
strings.
Note - since 2005 it is no longer possible to encode all of the mandatory components defined in the official
GB 18030-2005 character set of the People's Republic of China
in a fixed width 2 byte character set. Support for the GB 18030-2005 character set is officially
required for all software products sold in the PRC.
VOTable =floatComplex
The description for the
BINARY
serialization of
floatComplex
defines it as a pair of 32-bit, single precision, floating point numbers.
- "a sequence of pairs of 32-bit single precision floating point numbers in big-endian order"
The description for the
TABLEDATA
serialization of
floatComplex
defines it as a pair of floating point numbers separated by white space.
- "two representations of a Single Precision Floating Point numbers separated by whitespace, representing the real and imaginary part respectively"
Note that this effectively fixes the delimiter for the
TABLEDATA
serialization to white space, regardless of the
delim
attribute
set in the
VODataService description of the source data table.
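The two serializations of a complex value described above can be sketched in Python with the standard struct module. The function names are hypothetical, and the sketch handles a single complex value rather than a sequence of pairs.

```python
import struct


def parse_complex_tabledata(cell: str) -> complex:
    """Parse 'real imaginary' separated by white space (hypothetical helper)."""
    real, imaginary = cell.split()
    return complex(float(real), float(imaginary))


def pack_float_complex(value: complex) -> bytes:
    """Big-endian pair of 32-bit floats, as described for BINARY floatComplex."""
    return struct.pack(">ff", value.real, value.imag)


def pack_double_complex(value: complex) -> bytes:
    """Big-endian pair of 64-bit floats, as described for BINARY doubleComplex."""
    return struct.pack(">dd", value.real, value.imag)


if __name__ == "__main__":
    value = parse_complex_tabledata("1.5 -2.0")
    print(value)
    print(pack_float_complex(value).hex())
    print(len(pack_float_complex(value)), len(pack_double_complex(value)))
```

The packed lengths, 8 and 16 bytes, agree with the Bytes column in the table of primitives.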
VOTable =doubleComplex
The description for the
BINARY
serialization of
doubleComplex
defines it as a pair of 64-bit, double precision, floating point numbers.
- "a sequence of pairs of 64-bit double precision floating point numbers in big-endian order"
The description for the
TABLEDATA
serialization of
doubleComplex
defines it as a pair of floating point numbers separated by white space.
- "two representations of a Double Precision Floating Point numbers separated by whitespace, representing the real and imaginary part respectively"
Note that this effectively fixes the delimiter for the
TABLEDATA
serialization to white space, regardless of the
delim
attribute
set in the
VODataService description of the source data table.
VOTable =xtype
Section
4.3 (Extended Datatype) of the
VOTable specification describes the
xtype
attribute as bridging the gap between
the FITS based primitive
VOTableTypes and the data types
used to express
TAP ADQL
database queries and their results.
The
VOTable specification does not define a definitive list of standard xtype values.
Section 3.3 (Literal Values)
of the
DALI specification suggests that services should use a prefix for non-standard xtype values.
The
DALI specification does define a number of types, including POINT, CIRCLE and POLYGON. However it does not declare a specific list of standard xtype values which do not need prefixes.
The
VOTable specification does not explicitly state that the
VOTable xtype
attribute
is related to the
TAP_SCHEMA xtype
column
used in
TAP_SCHEMA metadata tables,
or the
VODataService extendedType
attribute
that is used in the
TAP /tables
VOSI response.
VOTable =timestamp
The
VOTable specification cites an example of using the
xtype
attribute to describe a
timestamp
value.
- "a UTC date/time string following the ISO-8601 standard (YYYY-MM-DDThh:mm:ss followed by a decimal point and fractions of seconds)"
The
VOTable specification does not link to the
DALI specification,
which has a more detailed description of how
timestamp
values should be
represented using
xtype
.
VOTable arrays
Section 2.2 of the
VOTable specification uses a number of examples to show how a
combination of
datatype
and
arraysize
attributes can be used to describe
arrays of values in the metadata for a FIELD.
Section 5.1 of the
VOTable specification describes the
TABLEDATA
serialization of arrays as follows:
- "If a cell contains an array of numbers or a complex number, it should be encoded as multiple numbers separated by whitespace. However in the case of character and Unicode strings (declared in the corresponding FIELD as an array of char or unicodeChar datatype), no separator should exist."
It uses the following example to illustrate the difference between arrays of numbers and arrays of characters:
<TABLE>
<FIELD name="aString" datatype="char" arraysize="10"/>
<FIELD name="aShort" datatype="short"/>
<FIELD name="varInts" datatype="int" arraysize="*"/>
<FIELD name="Floats" datatype="float" arraysize="3"/>
<DATA><TABLEDATA>
<TR> <TD>Apple</TD> <TD/> <TD>1 2 4 8 16</TD> <TD>1.62 4.56 3.44</TD> </TR>
<TR> <TD>Orange</TD> <TD>15</TD> <TD>23 -11 9</TD> <TD>2.33 4.66 9.53</TD> </TR>
</TABLEDATA></DATA>
</TABLE>
VOTable =delim
The
VOTable specification does not include anything to describe the delimiter for arrays of values.
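In practice the TABLEDATA rule for arrays quoted earlier acts as a fixed delimiter: white space between numeric elements and no separator within character arrays. A minimal Python sketch of a cell decoder under that rule follows. The function name parse_cell and the NUMERIC lookup table are hypothetical, only a few numeric types are covered, and treating an empty scalar cell as a missing value is an assumption of the sketch.

```python
# Converters for a few numeric VOTable datatypes (illustrative subset).
NUMERIC = {"unsignedByte": int, "short": int, "int": int, "long": int,
           "float": float, "double": float}


def parse_cell(datatype, arraysize, text):
    """Decode one TABLEDATA cell (hypothetical helper)."""
    if datatype in ("char", "unicodeChar"):
        # Character arrays are strings: no separator between elements.
        return text
    convert = NUMERIC[datatype]
    if arraysize is None:
        # Scalar value: an empty cell is treated here as a missing value.
        return convert(text) if text.strip() else None
    # Numeric arrays: elements are separated by white space.
    return [convert(item) for item in text.split()]


if __name__ == "__main__":
    print(parse_cell("char", "10", "Apple"))
    print(parse_cell("short", None, ""))
    print(parse_cell("int", "*", "1 2 4 8 16"))
    print(parse_cell("float", "3", "1.62 4.56 3.44"))
```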
VOTable Field
... TBD
VOTable =arraysize
The text of the
VOTable specification does not explicitly define the
arraysize
attribute.
The XML schema definition of the
arraysize
attribute
does not apply a restriction to the content of the attribute.
<xs:complexType name="Field">
....
<xs:attribute name="arraysize" type="xs:string"/>
....
</xs:complexType>
The
VOTable specification does not link the
VOTable
arraysize
attribute with the
DataType
arraysize
attribute defined in the
VODataService
specification.
VOTable =arrayDEF
The
VOTable XML schema defines the
arrayDEF syntax restriction as follows:
<xs:simpleType name="arrayDEF">
<xs:restriction base="xs:token">
<xs:pattern value="([0-9]+x)*[0-9]*[*]?(s\W)?"/>
</xs:restriction>
</xs:simpleType>
However, the
arrayDEF syntax restriction is
not used in the definition
of the
arraysize
attribute:
<xs:complexType name="Field">
....
<xs:attribute name="arraysize" type="xs:string"/>
....
</xs:complexType>
This means that the content of the
VOTable arraysize
attribute is unrestricted, and may contain any string.
In contrast, the
VODataService schema does restrict the
content of the
DataType arraysize
attribute, with the
ArrayShape restriction.
This means it is possible to create a value for
arraysize
that is valid in
VOTable but is not valid in
the
VOSI /tables
response, which uses the
ArrayShape syntax restriction
defined in the
VODataService schema.
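The mismatch can be demonstrated with a short Python sketch that applies both patterns to the same values. The function name accepted_by is hypothetical, and Python's re module is only an approximation of XML Schema regular expressions; in particular \W is assumed to behave alike in both for the ASCII punctuation used here.

```python
import re

# Pattern from the VODataService ArrayShape definition.
ARRAY_SHAPE = re.compile(r"([0-9]+x)*[0-9]*[*]?")
# Pattern from the VOTable arrayDEF definition (not applied to arraysize).
ARRAY_DEF = re.compile(r"([0-9]+x)*[0-9]*[*]?(s\W)?")


def accepted_by(value: str) -> dict:
    """Report which definitions accept a candidate arraysize value."""
    return {
        # The VOTable attribute is a plain xs:string, so anything is accepted.
        "VOTable arraysize": True,
        "VOTable arrayDEF": ARRAY_DEF.fullmatch(value) is not None,
        "VODataService ArrayShape": ARRAY_SHAPE.fullmatch(value) is not None,
    }


if __name__ == "__main__":
    for candidate in ["2x3", "10s,", "three"]:
        print(repr(candidate), accepted_by(candidate))
```

A value such as "three" is accepted only by the unrestricted VOTable attribute, while "10s," would pass arrayDEF but not ArrayShape.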
The only reference to the
arrayDEF syntax restriction
in the other VO specifications is a comment in the definition of the
ArrayShape in the
VODataService schema.
The text of the
VOTable specification does not link the
arrayDEF string syntax with the
ArrayShape string
syntax defined in the
VODataService schema.
The
arrayDEF string syntax is not used anywhere else in
VOTable XML schema.
The
arrayDEF string syntax is not used in any of the other VO specifications.
Arrays of Variable-Length Strings
Appendix A.3 (Arrays of Variable-Length Strings)
refers to the Substring Array convention, described in an appendix of the FITS specification.
The text in this specification suggests that a similar convention could be used
in
VOTable.
- "A convention similar to the FITS one could be introduced in VOTable in the arraysize attribute ..."
However, the text does not go beyond suggesting this as a possibility, and does not
declare whether this is recommended practice or not.
Provision for this extension is included in the regular expression
for the
arrayDEF syntax restriction.
<xs:simpleType name="arrayDEF">
<xs:restriction base="xs:token">
<xs:pattern value="([0-9]+x)*[0-9]*[*]?(s\W)?"/>
</xs:restriction>
</xs:simpleType>
However, because
VOTable arrayDEF and
VODataService ArrayShape
are not directly linked, adding support for this to
VOTable
makes it possible to create a value for
arraysize
that is valid in
VOTable but is not valid in
the
VOSI /tables
response, which uses the
ArrayShape syntax restriction
defined in the
VODataService schema.
MIME type
Section 8 (MIME type)
describes the format of MIME types that can be used to describe a
VOTable document.
The text includes the following set of examples:
-
text/xml
-
text/xml; charset="iso-8859-1"
-
application/x-votable+xml
-
application/x-votable+xml; serialization=tabledata
-
application/x-votable+xml; serialization=TABLEDATA; charset=iso-8859-1
The text in this section states that if the optional
charset
parameter is not supplied,
then a default of
US-ASCII
is assumed.
This may conflict with the discussion of the
unicodeChar
data type,
which assumes that the default
charset
is
UTF-8
.
The specification does not go into details about how changing the MIME type
charset
could change the way that strings of
unicodeChar
are serialized.
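A client that wants to apply the stated default could extract the parameters along the following lines. This Python sketch uses the standard email.message.Message class to parse the header value; the function name votable_mime_parameters is hypothetical, and the US-ASCII fallback simply follows the specification text quoted above.

```python
from email.message import Message


def votable_mime_parameters(content_type: str) -> dict:
    """Split a MIME type into type, charset and serialization (hypothetical helper)."""
    message = Message()
    message["Content-Type"] = content_type
    return {
        "type": message.get_content_type(),
        # US-ASCII is the default stated in the specification text.
        "charset": message.get_param("charset", "US-ASCII"),
        "serialization": message.get_param("serialization"),
    }


if __name__ == "__main__":
    for example in [
        "text/xml",
        'text/xml; charset="iso-8859-1"',
        "application/x-votable+xml; serialization=TABLEDATA; charset=iso-8859-1",
    ]:
        print(votable_mime_parameters(example))
```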
DALI
The
DALI specification defines the base web service interface common to all Data Access Layer (DAL) services.
DALI tables
Section 2.6 (VOSI-tables)
of the
DALI specification refers to the
VOSI-tables web resource,
defined by the Grid and Web Services working group.
DALI types
Section 3.3 (Literal values)
of the
DALI specification specifies how a number of data types should be expressed
by
DALI services.
Note that although the following data types and the rules for representing them may be used for literal values
in parameters passed to
DALI services,
these definitions are also referred to by other VO specifications to describe how to represent data values in
VOTable results,
VOSI /tables
responses
and
TAP_SCHEMA metadata tables.
This may mean that
'Literal values' might not be the most appropriate title for this section.
DALI Numbers
Section 3.3.1 (Numbers)
refers to the
VOTable specification.
- "Integer and real numbers must be represented in a manner consistent with the specification for numbers in VOTable"
However, it is not clear which
VOTable serializations,
BINARY
,
BINARY2
or
TABLEDATA
apply in which situations.
DALI Boolean
Section 3.3.2 (Boolean)
refers to part 2 of the
W3C XML schema specification.
This results in a slightly different definition of how boolean values should be expressed compared to the
definition given in the
VOTable specification.
DALI Timestamp
Section 3.3.3 (Timestamp)
defines three representations for date and time values as follows:
- For astronomical values the time and date format follows the convention established for FITS, mandating UTC, but omitting the timezone in the representation,
YYYY-MM-DD['T'hh:mm:ss[.SSS]]
.
- For civil values, relating to events at locations on the Earth, the format includes an optional 'Z' to explicitly specify the UTC time zone,
YYYY-MM-DD['T'hh:mm:ss[.SSS]['Z']]
.
- Julian Date (JD) or Modified Julian Date (MJD) values should follow the rules for double precision numbers.
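The two string formats above can be expressed as a single regular expression, sketched here in Python. The function name is_dali_timestamp and the allow_z flag are hypothetical, the sketch accepts any number of fractional digits rather than exactly three, and it checks only the shape of the string, not whether the field values form a real date.

```python
import re

# YYYY-MM-DD['T'hh:mm:ss[.SSS]['Z']]: the time part, fraction and 'Z' are optional.
TIMESTAMP = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"
    r"(?:T(\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?(Z)?)?"
)


def is_dali_timestamp(text: str, allow_z: bool = True) -> bool:
    """Check the shape of a timestamp string (hypothetical helper).

    Set allow_z=False for astronomical values, where the 'Z' is omitted.
    """
    match = TIMESTAMP.fullmatch(text)
    if match is None:
        return False
    return allow_z or match.group(8) is None


if __name__ == "__main__":
    for candidate in ["2017-08-16", "2017-08-16T17:12:54.621", "2017-08-16T17:12:54.621Z"]:
        print(candidate, is_dali_timestamp(candidate), is_dali_timestamp(candidate, allow_z=False))
```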
The specification also describes how to represent
timestamp
values in the metadata for a
VOTable FIELD using
timestamp
for
the
xtype
attribute.
<FIELD datatype="char" arraysize="*" xtype="timestamp">
Note - the specification text includes the following:
- "Julian Date (JD) or Modified Julian Date (MJD), these follow the rules for double precision numbers above"
However, apart from the reference to the
VOTable specification in the section on
numeric values
there are no explicit rules for double precision numbers in the preceding text.
DALI Interval
Section 3.3.4 (Interval)
defines how to represent numeric intervals as pairs of numeric values.
The specification describes how to represent numeric
intervals
in the metadata for a
VOTable FIELD,
by setting
arraysize
to 2 and using
interval
for the
xtype
attribute.
<FIELD datatype="short" arraysize="2" xtype="interval">
<FIELD datatype="int" arraysize="2" xtype="interval">
<FIELD datatype="long" arraysize="2" xtype="interval">
....
<FIELD datatype="float" arraysize="2" xtype="interval">
<FIELD datatype="double" arraysize="2" xtype="interval">
All of the examples shown in the specification text use space as the delimiter.
However, the specification does not explicitly define a delimiter,
nor does it refer to the
delim
attribute
defined in
VODataService.
DALI Time Interval (proposed)
PROPOSED
The
current version (1.1)
of the
DALI specification does not describe how to represent time intervals.
Given that we have already used
timestamp
and
interval
to cover the separate cases,
we will need to define a new
xtype
value to describe an interval
of timestamps.
We propose
time-interval
as the
xtype
value to represent an interval of timestamps.
Given that
timestamp is serialized as an array of characters,
and that the geometric types,
Point,
Circle, and
Polygon, are represented
as arrays of floating point numbers,
then, following a similar pattern, an
Interval of
Timestamps could be represented as a space delimited
sequence of two
Timestamps:
<FIELD datatype="char" arraysize="2x*" xtype="time-interval">
1970-01-01T00:00:00.000Z 2017-08-16T17:12:54.621Z
If the accuracy of the timestamps is known, then the array size of the
Timestamps can be fixed:
<FIELD datatype="char" arraysize="2x24" xtype="time-interval">
1970-01-01T00:00:00.000Z 2017-08-16T17:12:54.621Z
DALI Point
Section 3.3.5 (Point)
of the
DALI specification
defines how to represent a geometric point as an array of two floating point numbers.
<FIELD ... datatype="float" arraysize="2" xtype="point">
....
<FIELD ... datatype="double" arraysize="2" xtype="point">
The specification text states that the
usual representation is to use longitude and latitude values in spherical coordinates.
The text also implies that other coordinate systems can be used:
- "although they are usually longitude and latitude values in spherical coordinates this is specified in the coordinate metadata and not in the values"
However, the text does not show how to specify a different coordinate system.
The example shown in the specification text uses space as the delimiter:
12.3 45.6
The specification does not explicitly define a delimiter,
nor does it refer to the
delim
attribute
defined in
VODataService.
DALI Circle
Section 3.3.6 (Circle)
of the
DALI specification
defines how to represent a circle as an array of three floating point numbers.
<FIELD ... datatype="float" arraysize="3" xtype="circle">
...
<FIELD ... datatype="double" arraysize="3" xtype="circle">
The specification text implies that the
usual representation is to use longitude, latitude and radius in spherical coordinates.
However, the text does not show how to specify a different coordinate system.
The example shown in the specification text uses space as the delimiter:
12.3 45.6 0.5
The specification does not explicitly define a delimiter,
nor does it refer to the
delim
attribute
defined in
VODataService.
Note - the text in the
current version (1.1) repeats the same range restriction twice.
- "For circles in a spherical coordinate system ... longitude values must fall within [0,360], latitude values within [-90,90], and radius values in (0,180]"
- "In spherical coordinates, all longitude values must fall within [0,360] and all latitude values within [-90,90]"
DALI Polygon
Section 3.3.7 (Polygon)
of the
DALI specification
defines how to represent a polygon as an array of floating point numbers.
<FIELD ... datatype="float" arraysize="*" xtype="polygon">
....
<FIELD ... datatype="double" arraysize="*" xtype="polygon">
Note that the polygon is modelled as a one-dimensional array of numbers,
rather than a two-dimensional array of pairs of numbers, which would be declared as:
<FIELD ... datatype="float" arraysize="2x*" xtype="polygon">
The text of the specification implies that the
usual representation is to use spherical coordinates.
However, the text does not show how to specify a different coordinate system.
The example shown in the specification text uses space as the delimiter:
10.0 10.0 10.2 10.0 10.2 10.2 10.0 10.2
However, the specification does not explicitly define a delimiter,
nor does it refer to the
delim
attribute
defined in
VODataService.
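Because the polygon is a flat array, a client has to pair the values up itself. A minimal sketch, assuming whitespace-delimited values and alternating first and second coordinates (the function name polygon_vertices is hypothetical):

```python
def polygon_vertices(text):
    """Group a flat polygon value into (first, second) coordinate pairs.

    Assumption: whitespace-delimited values, alternating between the
    two coordinates of each vertex.
    """
    values = [float(v) for v in text.split()]
    if len(values) % 2 != 0:
        raise ValueError("polygon needs an even number of values")
    # Pair even-indexed values with the odd-indexed values that follow them.
    return list(zip(values[0::2], values[1::2]))


# Hypothetical usage with the example value from the specification text.
vertices = polygon_vertices("10.0 10.0 10.2 10.0 10.2 10.2 10.0 10.2")
```

The even-length check is the only structural validation the flat representation allows; a two-dimensional array shape would have expressed the pairing in the metadata instead.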
DALI RESPONSEFORMAT
Section 3.4.3 (RESPONSEFORMAT) of the
DALI specification
describes the
RESPONSEFORMAT
parameter that enables a client to request different
MIME types and response formats from a
DALI service.
... TBD
Note - need to cross reference this with
charset
from the VOTable MIME types.
VOSI
The
VOSI specification defines a number of
web service interface methods and resources that are
common to all of the VO services.
VOSI =/tables
Section 3.3 (Table metadata)
of the
VOSI specification describes a resource that describes the content of database tables
accessible from a VO service.
The
VOSI Tables XML schema imports the
Table and
TableSet
elements from the
VODataService specification.
The
VOSI Tables XML schema inherits both the
VOTableType and
TAPType
elements from the
VODataService specification, along with their attributes.
The
VOTableType and
TAPType elements inherit the following attributes
from
DataType:
The
TAPType element inherits the following attribute
from
TAPDataType:
Note - the examples given in the text of the
VOSI specification
use
vs:TAP
rather than
vs:TAPType
for the
xsi:type
attribute.
<column>
<name>cfhtlsID </name>
<dataType xsi:type="vs:TAP" size="30">adql:VARCHAR</dataType>
</column>
Note - the examples given in the text of the
VOSI specification use
a prefix for the
adql:VARCHAR
data type, which is not a valid value for
a
TAPType data type.
TAP
The
TAP specification describes the Table Access Protocol web service.
TAP =/tables
Section 2.5 (/tables)
of the
TAP specification
describes the
/tables
resource.
The text in the
current version (WD-TAP-1.1-20170707)
of the specification refers to the
VODataService specification
for details of the
/tables
resource content.
However, the content, and more importantly, the behaviour, of the
VOSI /tables
resource
is described in
Section 3.3 (Table metadata)
of the
VOSI specification.
This in turn imports the
Table and
TableSet
XML elements from the
VODataService specification. So should the primary reference be to the
VOSI /tables
resource rather than
VODataService?
The text of the specification
recommends using
VOTableType rather than
TAPType,
but it does not explicitly exclude the use of
TAPType.
- "The use of VOTableType (rather than TAPType) in the VOSI-tables output is recommended because the values map directly"
- "TAPType may be used when VOTableType does not provide a suitable alternative"
This is in contrast to the text of the
VOSI specification, which includes a number of examples
using a
TAP adql:VARCHAR
data type:
<column>
<name>cfhtlsID </name>
<dataType xsi:type="vs:TAP" size="30">adql:VARCHAR</dataType>
</column>
In addition, the text of the
VODataService specification
cites the following two examples as equivalent:
<dataType xsi:type="vs:VOTableType" arraysize="*"> char </dataType>
....
<dataType xsi:type="vs:TAPType"> VARCHAR </dataType>
and a third example describes a fixed length string, using
size
rather than
arraysize
:
<dataType xsi:type="vs:TAPType" size="8" > CHAR </dataType>
This conflicts with both the recommendation to use
VOTableType
rather than
TAPType in
TAP /tables
and the advisory
regarding the planned obsolescence of the
TAP_SCHEMA
size
column in future versions of the
TAP specification.
TAP_SCHEMA
Section 4 (TAP_SCHEMA)
of the
TAP specification
describes the
TAP_SCHEMA
metadata tables.
The specification text states that TAP_SCHEMA and VOSI /tables should be equivalent.
- "The VOSI tables resource provides the same metadata as the TAP_SCHEMA but in a rigorously controlled format"
- "the information in the TAP_SCHEMA is equivalent to that defined by the VODataService"
However, as shown below, there are a number of inconsistencies between the various specifications which
mean that this is not always the case.
TAP_SCHEMA.schema
Section 4.1 (Schema)
of the
TAP specification
describes the content and structure of the
TAP_SCHEMA.schema
metadata table.
TAP_SCHEMA.tables
Section 4.2 (Tables)
of the
TAP specification
describes the content and structure of the
TAP_SCHEMA.tables
metadata table.
Note - the definition of
TAP_SCHEMA.tables
does not state that values in the
schema_name
column
must match the corresponding
schema_name
column in the
TAP_SCHEMA.schema
table.
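A service implementor could check this relationship themselves. The following is a minimal sketch, assuming the rows of both metadata tables have already been loaded as dictionaries keyed by column name; the function name orphan_schema_names and the sample rows are hypothetical.

```python
def orphan_schema_names(schema_rows, table_rows):
    """List schema_name values used by tables but missing from the schema table."""
    known = {row["schema_name"] for row in schema_rows}
    used = {row["schema_name"] for row in table_rows}
    return sorted(used - known)


# Hypothetical sample rows, for illustration only.
schema_rows = [{"schema_name": "TAP_SCHEMA"}, {"schema_name": "survey"}]
table_rows = [
    {"schema_name": "survey", "table_name": "survey.sources"},
    {"schema_name": "archive", "table_name": "archive.images"},
]
missing = orphan_schema_names(schema_rows, table_rows)
```

An empty result would mean every table refers to a schema that is actually listed.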
TAP_SCHEMA.columns
Section 4.3 (Columns)
of the
TAP specification
describes the content and structure of the
TAP_SCHEMA.columns
metadata table.
column name | datatype | arraysize | xtype | not-null
table_name | char | * | null | true
column_name | char | * | null | true
datatype | char | * | null | true
arraysize | char | * | null | false
xtype | char | * | null | false
"size" | int | 1 | null | false
description | char | * | null | false
utype | char | * | null | false
unit | char | * | null | false
ucd | char | * | null | false
indexed | boolean | 1 | null | true
principal | boolean | 1 | null | true
std | boolean | 1 | null | true
column_index | int | 1 | null | false
The specification text explains that
TAP_SCHEMA uses a combination of
datatype
,
arraysize
and
xtype
to describe the type of a database column.
TAP_SCHEMA.columns.datatype
The text of the specification restricts
TAP_SCHEMA.columns.datatype
to use the data types defined in the
VOTable specification.
- "The allowed values for datatype ... are specified in VOTable"
Section 2.1 (Primitives)
of the
VOTable specification defines the following data types:
Datatype | Meaning | FITS | Bytes
boolean | Logical | L | 1
bit | Bit | X | *
unsignedByte | Byte (0 to 255) | B | 1
short | Short Integer | I | 2
int | Integer | J | 4
long | Long integer | K | 8
char | ASCII Character | A | 1
unicodeChar | Unicode Character | | 2
float | Floating point | E | 4
double | Double | D | 8
floatComplex | Float Complex | C | 8
doubleComplex | Double Complex | M | 16
In contrast, the
/tables
resource defined in
Section 2.5 (/tables)
of the
TAP specification
is based on the
VOSI /tables
resource,
which, in turn, uses the
Table and
TableSet
data type elements from the
VODataService specification.
The
VOTableType defines a similar set of data types to
VOTable,
albeit defined in a separate list in a separate specification:
- boolean
- bit
- unsignedByte
- short
- int
- long
- char
- unicodeChar
- float
- double
- floatComplex
- doubleComplex
However, the
VOSI specification also allows the
TAPType data type
to be used to describe database columns.
The
TAPType data type defines a different set of data types, based on the data type
stored inside the database, rather than an external serialization of the data:
- BOOLEAN
- SMALLINT
- INTEGER
- BIGINT
- REAL
- DOUBLE
- TIMESTAMP
- CHAR
- VARCHAR
- BINARY
- VARBINARY
- POINT
- REGION
- CLOB
- BLOB
This second set of data types means that the following example,
based on examples given in both the
VOSI and the
VODataService specifications,
would be valid in the
/tables
response,
but the same data type would not be valid in the corresponding
TAP_SCHEMA datatype
.
<column>
<name>cfhtlsID </name>
<dataType xsi:type="vs:TAPType" size="30">VARCHAR</dataType>
</column>
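The mismatch between the two lists can be sketched as a set membership test. This is only an illustration: it assumes an exact, case-sensitive comparison against the names listed in this review, and the function name where_valid is hypothetical.

```python
# Names copied from the VOTable / VOTableType list in this review.
VOTABLE_DATATYPES = {
    "boolean", "bit", "unsignedByte", "short", "int", "long",
    "char", "unicodeChar", "float", "double",
    "floatComplex", "doubleComplex",
}

# Names copied from the TAPType list in this review.
TAPTYPE_DATATYPES = {
    "BOOLEAN", "SMALLINT", "INTEGER", "BIGINT", "REAL", "DOUBLE",
    "TIMESTAMP", "CHAR", "VARCHAR", "BINARY", "VARBINARY",
    "POINT", "REGION", "CLOB", "BLOB",
}


def where_valid(name):
    """Report which of the two lists contains the given type name.

    Assumption: names are compared exactly, including case.
    """
    return {
        "votable": name in VOTABLE_DATATYPES,
        "taptype": name in TAPTYPE_DATATYPES,
    }
```

With these lists, VARCHAR appears only in the TAPType set, and no name appears in both sets under an exact comparison.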
TBD - link to the actual data in live TAP services.
TAP_SCHEMA.columns.arraysize
The text of the specification describes the
arraysize
column as
"the length of variable length datatypes".
- "The arraysize column gives the length of variable length datatypes using the VOTable array shape syntax."
This does not explicitly state whether this is the number of elements in the array, or the size (in bytes) of the array.
The example given is for an array of characters, where the size in bytes is equal to the number of elements.
- "a database column of type varchar(256) would be described with datatype 'char' and arraysize '256*'"
Based on examples given in some of the other VO specifications it is possible to infer that
this is the number of elements and not the size in bytes.
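Reading arraysize as a count of elements, the varchar(256) example can be sketched as a small mapping from a database string type to a datatype and arraysize pair. The varchar case follows the quoted example; the handling of fixed-length char(n) is an assumption, and the function name describe_string_column is hypothetical.

```python
import re


def describe_string_column(sql_type):
    """Map char(n) or varchar(n) to a (datatype, arraysize) pair.

    The varchar(n) -> 'n*' mapping follows the example in the TAP text.
    The char(n) -> 'n' mapping is an assumption made for this sketch.
    """
    match = re.fullmatch(r"(var)?char\((\d+)\)", sql_type.strip().lower())
    if match is None:
        raise ValueError(f"unsupported type: {sql_type}")
    variable, length = match.groups()
    # The length is a number of characters (elements), not a byte count.
    arraysize = f"{length}*" if variable else length
    return "char", arraysize
```

For a unicodeChar column the same arraysize would describe twice as many bytes, which is where the distinction between elements and bytes starts to matter.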
However, the specification could make this clearer by explicitly stating that it is
the
"number of elements in the array".
The specification text explicitly refers to the
VOTable specification for a definition
of the
arraysize
syntax.
- "... the syntax for arraysize are specified in VOTable (Ochsenbein and Williams et al., 2013)"
The XML schema for the
VOTable specification defines
the
arrayDEF syntax restriction which includes
support for the FITS
Substring Array
convention:
<xs:simpleType name="arrayDEF">
<xs:restriction base="xs:token">
<xs:pattern value="([0-9]+x)*[0-9]*[*]?(s\W)?"/>
</xs:restriction>
</xs:simpleType>
However, the
arrayDEF restriction is not actually
used in the definition of the
VOTable Field
arraysize
attribute.
<xs:complexType name="Field">
....
<xs:attribute name="arraysize" type="xs:string"/>
....
</xs:complexType>
The
VODataService ArrayShape
syntax restriction, used in both the
VOTableType and the
TAPType
elements defined in the
VODataService XML schema
does not include support for the FITS
Substring Array convention:
<xs:simpleType name="ArrayShape">
<xs:restriction base="xs:token">
<xs:pattern value="([0-9]+x)*[0-9]*[*]?"/>
</xs:restriction>
</xs:simpleType>
<xs:complexType name="VOTableType">
<xs:simpleContent>
<xs:restriction base="vs:TableDataType">
....
<xs:attribute name="arraysize" type="vs:ArrayShape" default="1"/>
....
</xs:restriction>
</xs:simpleContent>
</xs:complexType>
<xs:complexType name="TAPType">
....
<xs:simpleContent>
<xs:restriction base="vs:TAPDataType">
....
<xs:attribute name="arraysize" type="vs:ArrayShape" default="1"/>
....
</xs:restriction>
</xs:simpleContent>
</xs:complexType>
This means that whether you use the 'no restriction' definition of the
VOTable arraysize
attribute,
or the
VOTable arrayDEF restriction,
it is possible to construct a string that
would be valid in
TAP_SCHEMA arraysize
but
would not be valid in the
VODataService ArrayShape
restriction used in the corresponding
TAP /tables
VOSI response.
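The difference between the two patterns can be sketched by testing candidate strings against both. This is an approximation: Python's re module is used in place of an XML Schema regular expression engine, and the two differ in how they define \W for some characters, so the test value uses a comma, which is a non-word character in both. The value "10s," is a hypothetical string chosen only because it matches the pattern.

```python
import re

# Patterns copied from the two XML schemas quoted above.
ARRAY_DEF = re.compile(r"([0-9]+x)*[0-9]*[*]?(s\W)?")   # VOTable arrayDEF
ARRAY_SHAPE = re.compile(r"([0-9]+x)*[0-9]*[*]?")       # VODataService ArrayShape


def compare(value):
    """Return (matches arrayDEF, matches ArrayShape) for a candidate string.

    XML Schema patterns must match the whole value, so fullmatch is used.
    """
    return (
        ARRAY_DEF.fullmatch(value) is not None,
        ARRAY_SHAPE.fullmatch(value) is not None,
    )
```

Under these assumptions, compare("256*") and compare("2x*") are accepted by both patterns, while compare("10s,") is accepted by arrayDEF only.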
TAP_SCHEMA.columns.xtype
The text of the specification refers to the types defined in the
DALI specification.
- "Values for xtype are not restricted per se but implementors should use standard values such as those defined in DALI ... before inventing new xtype(s)."
However, the specification does not state that the
TAP_SCHEMA xtype
column
is related to the
VOTable xtype
attribute
used in
VOTable results,
or the
VODataService extendedType
attribute
that is used in the
TAP /tables
VOSI response.
TAP_SCHEMA.columns.size
The text of the specification states that the
size
column is kept for backwards compatibility and will be removed in the next major
version of the
TAP specification.
ADQL
The
ADQL specification describes the Astronomy Query Language.
ADQL data types