VO data types

This review of the data types defined in the VO specifications was initially done for my own benefit, to help me understand how the different methods for describing data types in the VO fitted together.

The review looks specifically at the relationships between types, attributes and columns, particularly those with similar names in different standards and how they relate to each other.

This review is based on the following specifications:

Still to do are:

  • TapRegEx
  • ADQL
  • ObsCore



VODataService

The VODataService specification defines an XML schema for describing data collections and the services that access them.

The data types defined in VODataService are intended to be used to describe the data in VO data sets and the services and protocols used to access them.

DataType

Section 3.5 (Data Parameters) of the VODataService specification defines the DataType XML element.

DataType includes the following attributes:

  • arraysize
  • delim
  • extendedType
  • extendedSchema

DataType =arraysize

Section 3.5 (Data Parameters) of the VODataService specification defines the arraysize attribute.

The specification text describes the arraysize attribute as follows:

  • "The arraysize attribute indicates the parameter is an array of values of the named type."
  • "Its value describes the shape of the array, and the delim attribute may be used to indicate the delimiter that should appear between elements of an array value."
  • "The attribute's presence indicates that parameter holds an array values; the attribute's value indicates the length of the array along each dimension of the multi-dimensional array."

ArrayShape

Section 3.5 (Data Parameters) of the VODataService specification defines the ArrayShape restriction, which sets the syntax for the arraysize attribute.

The specification text describes the ArrayShape as follows:

  • "the VOTable arraysize format (vs:ArrayShape): LxMxN..., where each x-delimited positive integer is a length along a dimension of a multi-dimensional array. A single integer indicates a one dimensional array. Instead of an integer, the last length can be set to "*" which indicates a variable length."

Note - The reference to "VOTable arraysize format (vs:ArrayShape)" is wrong: ArrayShape is defined in the VODataService specification, not in the VOTable specification.

The XML schema defines the ArrayShape string syntax as follows:

    <!--
      -  this definition is taken from the VOTable arrayDEF type
      -->
    <xs:simpleType  name="ArrayShape">
      <xs:annotation>
        <xs:documentation>
          An expression of a the shape of a multi-dimensional array
          of the form LxNxM... where each value between gives the
          integer length of the array along a dimension.  An
          asterisk (*) as the last dimension of the shape indicates 
          that the length of the last axis is variable or
          undetermined. 
        </xs:documentation>
      </xs:annotation>

      <xs:restriction base="xs:token">
        <xs:pattern  value="([0-9]+x)*[0-9]*[*]?"/>
      </xs:restriction>
    </xs:simpleType>

As the comment in the XML schema suggests, the ArrayShape string syntax is similar to, but not explicitly related to, the arrayDEF string format defined in the VOTable specification.
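
The ArrayShape pattern quoted above can be tried out directly. The following is a minimal Python sketch, not part of any specification; the helper name is_array_shape is hypothetical. XML Schema patterns are implicitly anchored, so the sketch uses fullmatch. It suggests that the pattern is more permissive than the prose description, because it also accepts an empty string and a trailing "x".

```python
import re

# Pattern copied from the vs:ArrayShape definition in the VODataService schema.
# XML Schema patterns must match the whole value, hence fullmatch() below.
ARRAY_SHAPE = re.compile(r"([0-9]+x)*[0-9]*[*]?")


def is_array_shape(value: str) -> bool:
    """Return True if the value is accepted by the ArrayShape pattern."""
    return ARRAY_SHAPE.fullmatch(value) is not None


# A few candidate arraysize values (hypothetical examples).
for candidate in ["3", "*", "64x64", "10x*", "16*", "", "3x", "2,*", "x3"]:
    print(repr(candidate), is_array_shape(candidate))
```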

The ArrayShape string syntax is used in several places in the VODataService XML schema to define the content of arraysize attributes on elements derived from DataType, including VOTableType and TAPType.

The ArrayShape string syntax is not used directly in any of the other VO specifications.

DataType =delim

Section 3.5 (Data Parameters) of the VODataService specification defines the delim attribute.

The specification text describes the delim attribute as follows:

  • "the string that is used to delimit element of an array value when arraysize is not "1""

The XML schema defines the default value of the delim attribute as a single space:

    <xs:attribute name="delim" type="xs:string" default=" ">

The specification text and the comments in the XML schema encourage applications to allow optional spaces before and after the delimiter (e.g. "1, 5" when delim=","). However, this is not explicitly encoded in the XML schema itself.
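
One way a client might honour that advice is sketched below in Python. The function name split_array_value is hypothetical, and treating any run of white space as a single separator for the default delimiter is an assumption of this sketch rather than something the schema states.

```python
def split_array_value(text: str, delim: str = " ") -> list[str]:
    """Split a serialized array value into its elements.

    Hypothetical helper: optional spaces around the delimiter are tolerated,
    as the VODataService text encourages.
    """
    if delim.strip() == "":
        # Whitespace delimiter (the schema default): assume that any run of
        # white space separates two elements.
        return text.split()
    # Explicit delimiter: split on it, then drop the optional padding.
    return [element.strip() for element in text.split(delim)]


print(split_array_value("1 2  3"))           # default delim=" "
print(split_array_value("1, 5", delim=","))  # the example from the text
```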

The delim attribute is not referred to by any of the other VO specifications.

All the other VO specifications use white space as the delimiter, either explicitly defined in the specification text, or by implication in the examples.

The definitions for arrays and complex numbers in the VOTable specification explicitly declare white space as the delimiter.

  • The VOTable TABLEDATA serialization for arrays of numeric values explicitly uses white space as the delimiter.
  • The VOTable TABLEDATA serialization for floatComplex and doubleComplex explicitly uses white space as the delimiter.

Although none of the data types defined in the DALI specification explicitly declares a delimiter, all of the examples in the text use white space.

  • The example for Interval uses white space as the delimiter.
  • The example for Point uses white space as the delimiter.
  • The example for Circle uses white space as the delimiter.
  • The example for Polygon uses white space as the delimiter.

DataType =extendedType

Section 3.5 (Data Parameters) of the VODataService specification defines the extendedType attribute.

The specification text describes the extendedType attribute as follows:

  • "The data value represented by this type can be interpreted as of a custom type identified by the value of this attribute. "
  • "The name implies a particular expected format for the data value that can be parsed into a value in memory."
  • " If an application does not recognize this extendedType, it should attempt to handle value assuming the type given by the element's value. "string" (or its equivalent) is a recommended default type."
  • " This element may make use of the extendedSchema attribute and/or any arbitrary (qualified) attribute to refine the identification of the type. "

Looking at the body of standards as a whole suggests that the extendedType attribute is functionally equivalent to the xtype attribute defined in the VOTable specification.

However, as far as we can tell, this is not explicitly stated anywhere, and there is no mapping defined between the (extendedType | extendedSchema) attribute pair defined in VODataService and the (xtype with prefix) attribute defined in the VOTable specification.

The VODataService specification does not provide an example of how the extendedType attribute could be used.

The extendedType attribute is not referred to in any of the other VO specifications.

DataType =extendedSchema

Section 3.5 (Data Parameters) of the VODataService specification defines the extendedSchema attribute.

The specification text describes the extendedSchema attribute as follows:

  • "An identifier for the schema that the value given by the extended attribute is drawn from."

The specification does not provide an example of how the extendedSchema attribute would be used.

The extendedSchema attribute is not used anywhere else in the VODataService specification.

The extendedSchema attribute is not used in any of the other VO specifications.

TableDataType

Section 3.5.3 (Table Column Data Types) of the VODataService specification defines the TableDataType XML element.

TableDataType extends the DataType element.

The XML schema describes TableDataType as:

  • "an abstract parent for a class of data types that can be used to specify the data type of a table column."

VOTableType

Section 3.5.3 (Table Column Data Types) of the VODataService specification defines the VOTableType XML element.

The VOTableType XML element extends the DataType element.

The VOTableType XML element inherits the following attributes from DataType:

  • arraysize
  • delim
  • extendedType
  • extendedSchema

VOTableType defines the following set of allowed values:

  • boolean
  • bit
  • unsignedByte
  • short
  • int
  • long
  • char
  • unicodeChar
  • float
  • double
  • floatComplex
  • doubleComplex

The specification text describes VOTableType as follows:

  • "data types that correspond to the parameter and column types defined in the VOTable schema"

The XML schema comments describe VOTableType as follows:

  • "a data type supported explicitly by the VOTable format".

The definition of VOTableType does not provide any further details about the sizes, ranges or content of the data types. It is left to the reader to refer to the VOTable specification for details about the data types.

The definition of VOTableType states that string values of arbitrary length are represented by a data type of char with arraysize="*". To support strings containing Unicode characters, it may be clearer to state explicitly that ASCII strings should be represented by a data type of char with arraysize="*" and that Unicode strings should be represented by a data type of unicodeChar with arraysize="*".

Note - the bibliography reference to the VOTable specification explicitly refers to version 1.2 (20091130) of the specification, which has since been superseded by version 1.3 (20130920).

TAPDataType

The TAPDataType element is not explicitly described in the text of the VODataService specification.

The VODataService XML schema describes TAPDataType as follows:

  • "an abstract parent for the specific data types supported by the Table Access Protocol"

The XML schema for the TAPDataType element defines the following attribute:

  • size

Note - the TAPDataType element name reflects the historical situation where the data types were originally defined in the TAP specification. The data type definitions have since been moved to the ADQL specification, but for backward compatibility the XML element name has not been changed.

TAPType

Section 3.5.3 (Table Column Data Types) of the VODataService specification defines the TAPType XML element.

The TAPType element inherits the following attributes from DataType:

  • arraysize
  • delim
  • extendedType
  • extendedSchema

The TAPType element inherits the following attribute from TAPDataType:

  • size: the length of a variable-length value

TAPType defines the following set of allowed values:

  • BOOLEAN
  • SMALLINT
  • INTEGER
  • BIGINT
  • REAL
  • DOUBLE
  • TIMESTAMP
  • CHAR
  • VARCHAR
  • BINARY
  • VARBINARY
  • POINT
  • REGION
  • CLOB
  • BLOB

The specification text describes TAPType as follows:

  • "data types that correspond column types defined in the Table Access Protocol (v1.0) [TAP]"

The explicit reference to version 1.0 of the TAP specification is out of date.

The TAP specification no longer contains the definition of these data types.

The TAPType element name reflects the historical situation where the data types were originally defined in the TAP specification. The data type definitions have since been moved to the ADQL specification, but for compatibility reasons, the XML element name has not been changed.

The definition of TAPType does not provide any further details about the sizes, ranges or content of the data types. It is left to the reader to refer to the TAP (now ADQL) specification for details about the data types.

The text at the end of the section refers to a mapping between TAP_SCHEMA types and VOTable types in the TAP specification.

  • "Note that the TAP standard [TAP] defines an explicit mapping between TAP_SCHEMA types and VOTable types."

This mapping is no longer part of the TAP specification.

The definition of TAPType states that string values should be represented by a data type of VARCHAR, but the text does not say whether this should use a size or an arraysize attribute.

TAPType =size

Section 3.5.3 (Table Column Data Types) of the VODataService specification defines the size attribute.

Technically, size is an attribute of the abstract TAPDataType parent element, which is then inherited by the TAPType element.

The text describing the TAPType element describes the size attribute as follows:

  • "The length of the variable-length data type."
  • "In the context of TAP, this attribute is only meaning when the data type is CHAR or BINARY; see discussion below."

This restriction implies that CHAR and BINARY values are not arrays of values and have an inherent 'size' property, which is distinct from the 'arraysize' property.

In the discussion that follows, the VODataService specification cites the following two examples as equivalent:

 
    <dataType xsi:type="vs:VOTableType" arraysize="*"> char </dataType>

    <dataType xsi:type="vs:TAPType"> VARCHAR </dataType>

A third example describes a fixed length string, using the size attribute rather than the arraysize attribute:

 
    <dataType xsi:type="vs:TAPType" size="8" > CHAR </dataType>

However, the VODataService specification does not explicitly explain the difference between the following examples:

 
    <dataType xsi:type="vs:TAPType" size="8" > CHAR </dataType>

    <dataType xsi:type="vs:TAPType" arraysize="8" > CHAR </dataType>

The comments in the XML schema for TAPDataType describe the size attribute as follows:

  • "This corresponds to the size Column attribute in the TAP_SCHEMA and can be used with data types that are defined with a length (CHAR, BINARY)."

This establishes a link between the TAPDataType size attribute and the size column in TAP_SCHEMA.

In the current version of the TAP specification the corresponding size column is described as:

  • "retained for backwards compatibility to TAP-1.0"

The original text in version 1.0 of the TAP specification describes the size column as follows:

  • "The “size” gives the length of variable length datatypes, for example varchar(256);"

The TAP specification does not link the size column back to TAPDataType in the VODataService specification.
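
To make the ambiguity between size and arraysize concrete, the following Python sketch parses the two TAPType fragments discussed above and shows what a client would see. The column wrapper element and the describe helper are assumptions added only so that the fragments form a well-formed document; they are not taken from any specification.

```python
import xml.etree.ElementTree as ET

# The <column> wrapper is an assumption, added so that the two fragments
# form a well-formed document with the xsi prefix declared.
DOCUMENT = """
<column xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dataType xsi:type="vs:TAPType" size="8"> CHAR </dataType>
  <dataType xsi:type="vs:TAPType" arraysize="8"> CHAR </dataType>
</column>
"""


def describe(element: ET.Element) -> dict:
    """Collect the type name and both length-related attributes."""
    return {
        "name": element.text.strip(),
        "size": element.get("size"),
        "arraysize": element.get("arraysize"),
    }


descriptions = [describe(e) for e in ET.fromstring(DOCUMENT).findall("dataType")]
for description in descriptions:
    print(description)
```

Both fragments parse without complaint, so a client receives either a size or an arraysize for the same CHAR type and has to decide for itself whether they mean the same thing.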

VODataService Table

... TBD

VODataService TableSet

... TBD


VOTable

The VOTable specification defines an XML based serialization format for exchanging tabular data in the VO.

VOTable data types

Section 2.1 (Primitives) of the VOTable specification defines a core set of primitive data types.

The following table describes the types, their semantic meaning, the corresponding FITS data type and the size in bytes:

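| Datatype      | Meaning           | FITS | Bytes |
|---------------|-------------------|------|-------|
| boolean       | Logical           | L    | 1     |
| bit           | Bit               | X    | *     |
| unsignedByte  | Byte (0 to 255)   | B    | 1     |
| short         | Short Integer     | I    | 2     |
| int           | Integer           | J    | 4     |
| long          | Long integer      | K    | 8     |
| char          | ASCII Character   | A    | 1     |
| unicodeChar   | Unicode Character |      | 2     |
| float         | Floating point    | E    | 4     |
| double        | Double            | D    | 8     |
| floatComplex  | Float Complex     | C    | 8     |
| doubleComplex | Double Complex    | M    | 16    |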

VOTable serialization

Section 6 (Definitions of Primitive Datatypes) of the VOTable specification describes the BINARY, BINARY2 and TABLEDATA serializations of each of the primitive data types.

VOTable =boolean

Case insensitive long form:

  • TRUE FALSE
  • True False
  • true false

Case insensitive short form:

  • T F
  • t f

Numeric, one or zero:

  • 1 0

This results in a slightly different definition of how to serialize boolean values compared to the DALI definition, which uses a definition from the W3C XML schema specification.
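
The difference can be illustrated with two small parsers, sketched here in Python. The function names are hypothetical. The VOTable parser accepts only the forms listed above, and the XML Schema parser accepts the four lexical values true, false, 1 and 0; handling of null or empty values is deliberately left out of this sketch.

```python
# Forms listed above for the VOTable TABLEDATA serialization, compared
# case-insensitively.
VOTABLE_TRUE = {"true", "t", "1"}
VOTABLE_FALSE = {"false", "f", "0"}

# Lexical values of the W3C XML Schema boolean type (case-sensitive).
XSD_BOOLEAN = {"true": True, "1": True, "false": False, "0": False}


def parse_votable_boolean(text: str) -> bool:
    """Hypothetical parser for the VOTable forms listed above."""
    token = text.strip().lower()
    if token in VOTABLE_TRUE:
        return True
    if token in VOTABLE_FALSE:
        return False
    raise ValueError(f"not a VOTable boolean: {text!r}")


def parse_xsd_boolean(text: str) -> bool:
    """Hypothetical parser for the XML Schema boolean lexical forms."""
    token = text.strip()
    if token not in XSD_BOOLEAN:
        raise ValueError(f"not an XML Schema boolean: {text!r}")
    return XSD_BOOLEAN[token]


print(parse_votable_boolean("T"))     # accepted by the VOTable rules
print(parse_xsd_boolean("true"))      # accepted by both sets of rules
```

A value such as "T" or "TRUE" is accepted by the first parser but rejected by the second, which is the mismatch a service has to handle when it applies both sets of rules.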

VOTable =bit

Array of bits, padded to fit into bytes.

VOTable =unsignedByte

8 bit (unsigned) integers, 0 to 255.

VOTable =short

16 bit signed integers, -32768 to 32767.

VOTable =int

32 bit signed integers, -2147483648 to 2147483647.

VOTable =long

64 bit signed integers, -9223372036854775808 to 9223372036854775807.
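
The ranges quoted for unsignedByte, short, int and long follow directly from the bit widths. As a quick check, here is a short Python sketch (the helper names are hypothetical) that derives them using two's complement arithmetic:

```python
def unsigned_range(bits: int) -> tuple[int, int]:
    """Smallest and largest value of an unsigned integer with this many bits."""
    return 0, 2 ** bits - 1


def signed_range(bits: int) -> tuple[int, int]:
    """Smallest and largest value of a two's complement signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


print("unsignedByte", unsigned_range(8))
print("short", signed_range(16))
print("int", signed_range(32))
print("long", signed_range(64))
```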

VOTable =float

ANSI/IEEE-754 32-bit floating point numbers.

VOTable =double

ANSI/IEEE-754 64-bit double precision floating point numbers.

VOTable =char

ASCII (7-bit) characters.

VOTable =unicodeChar

The description for the BINARY serialization of unicodeChar defines it as Unicode (UCS-2) fixed width 2-byte characters.

  • "Each Unicode character is represented in the BINARY/BINARY2 serialization by two bytes, using the big-endian UCS-2 encoding (ISO-10646-UCS-2)"

The UCS-2 character set includes all of the characters in the Basic Multilingual Plane (BMP), which contains characters for almost all modern languages.

The description for the TABLEDATA serialization includes an example showing how a unicodeChar that is outside the ASCII character set can be represented in an XML document by using a numeric character reference (NCR).

  • "The representation of a Unicode character in the TABLEDATA serialization follows the XML specifications, and e.g. the Cyrillic uppercase ``Ya'' can be written &#x042F; in UTF-8."

The reference to UTF-8 in this description may be misleading, because a UTF-8 document can contain the multi-byte Cyrillic uppercase "Ya" character, Я, as-is, without needing to use a numeric character reference.

The reason for using numeric character references is if the document character set is not able to represent the "Ya" character, Я.

As a result, declaring a UTF-8 encoding for a VOTable document containing TABLEDATA data may be problematic,

    <?xml version="1.0" encoding="utf-8"?>

as this would mean the VOTable document would be able to contain complex multibyte characters that are beyond the range of the UCS-2 fixed-width character set.

It may be better to specify the character encoding for VOTable documents as UCS-2,

    <?xml version="1.0" encoding="ucs-2"?>

This would make the TABLEDATA serialization equivalent to the BINARY serialization, and require numeric character references for all characters outside the UCS-2 two byte fixed size.
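
The byte-level differences between these encodings can be seen with a short Python sketch. Python has no codec named UCS-2, so the sketch uses utf-16-be, which produces the same two bytes as big-endian UCS-2 for any character in the Basic Multilingual Plane. The second character is an arbitrary example from outside the BMP, chosen only for illustration.

```python
ya = "\u042f"  # Cyrillic capital letter Ya

utf8_bytes = ya.encode("utf-8")        # two bytes, no character reference needed
ucs2_bytes = ya.encode("utf-16-be")    # same bytes as big-endian UCS-2 for a BMP character
ascii_ncr = ya.encode("ascii", errors="xmlcharrefreplace")  # numeric character reference

print(utf8_bytes, ucs2_bytes, ascii_ncr)

# A character outside the Basic Multilingual Plane (arbitrary example).
outside_bmp = "\U0001F600"
surrogate_pair = outside_bmp.encode("utf-16-be")

# Four bytes: a UTF-16 surrogate pair, which does not fit the fixed
# two-byte UCS-2 representation used by the BINARY serialization.
print(len(surrogate_pair))
```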

Note - There is a paragraph needed to link this section and the section describing the different MIME types and how they would affect the serialization of unicodeChar strings.

Note - since 2005 it is no longer possible to encode all of the mandatory components defined in the official GB 18030-2005 character set of the People's Republic of China in a fixed width 2 byte character set. Support for the GB 18030-2005 character set is officially required for all software products sold in the PRC.

VOTable =floatComplex

The description for the BINARY serialization of floatComplex defines it as a pair of 32-bit, single precision, floating point numbers.

  • "a sequence of pairs of 32-bit single precision floating point numbers in big-endian order"

The description for the TABLEDATA serialization of floatComplex defines it as a pair of floating point numbers separated by white space.

  • "two representations of a Single Precision Floating Point numbers separated by whitespace, representing the real and imaginary part respectively"

Note that this effectively fixes the delimiter for the TABLEDATA serialization to white space, regardless of the delim attribute set in the VODataService description of the source data table.
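
The two floatComplex serializations can be sketched in Python as follows. The helper names and the sample value are hypothetical; the sketch assumes a single complex value rather than an array of them.

```python
import struct


def parse_complex_tabledata(text: str) -> complex:
    """Parse the TABLEDATA form: real and imaginary parts separated by white space."""
    real, imaginary = text.split()
    return complex(float(real), float(imaginary))


def pack_float_complex(value: complex) -> bytes:
    """Encode the BINARY form: a pair of big-endian 32-bit floats."""
    return struct.pack(">ff", value.real, value.imag)


z = parse_complex_tabledata("1.5 -2.0")
encoded = pack_float_complex(z)
print(z, encoded.hex(), len(encoded))
```

Using ">dd" in place of ">ff" would give the 16 byte doubleComplex form.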

VOTable =doubleComplex

The description for the BINARY serialization of doubleComplex defines it as a pair of 64-bit, double precision, floating point numbers.

  • "a sequence of pairs of 64-bit double precision floating point numbers in big-endian order"

The description for the TABLEDATA serialization of doubleComplex defines it as a pair of floating point numbers separated by white space.

  • "two representations of a Double Precision Floating Point numbers separated by whitespace, representing the real and imaginary part respectively"

Note that this effectively fixes the delimiter for the TABLEDATA serialization to white space, regardless of the delim attribute set in the VODataService description of the source data table.

VOTable =xtype

Section 4.3 (Extended Datatype) of the VOTable specification describes the xtype attribute as bridging the gap between the FITS based primitive VOTableTypes and the data types used to express TAP ADQL database queries and their results.

The VOTable specification does not define a definitive list of standard xtype values.

Section 3.3 (Literal Values) of the DALI specification suggests that services should use a prefix for non-standard xtype values. The DALI specification does define a number of types, including POINT, CIRCLE and POLYGON. However it does not declare a specific list of standard xtype values which do not need prefixes.

The VOTable specification does not explicitly state that the VOTable xtype attribute is related to the TAP_SCHEMA xtype column used in TAP_SCHEMA metadata tables, or the VODataService extendedType attribute that is used in the TAP /tables VOSI response.

VOTable =timestamp

The VOTable specification cites an example of using the xtype attribute to describe a timestamp value.

  • "a UTC date/time string following the ISO-8601 standard (YYYY-MM-DDThh:mm:ss followed by a decimal point and fractions of seconds)"

The VOTable specification does not link to the DALI specification, which has a more detailed description of how timestamp values should be represented using xtype.

VOTable arrays

Section 2.2 of the VOTable specification uses a number of examples to show how a combination of datatype and arraysize attributes can be used to describe arrays of values in the metadata for a FIELD.

Section 5.1 of the VOTable specification describes the TABLEDATA serialization of arrays as follows:

  • "If a cell contains an array of numbers or a complex number, it should be encoded as multiple numbers separated by whitespace. However in the case of character and Unicode strings (declared in the corresponding FIELD as an array of char or unicodeChar datatype), no separator should exist."

It uses the following example to illustrate the difference between arrays of numbers and arrays of characters:

    <TABLE>
      <FIELD name="aString" datatype="char" arraysize="10"/>
      <FIELD name="aShort"  datatype="short"/>
      <FIELD name="varInts" datatype="int"  arraysize="*"/>
      <FIELD name="Floats"  datatype="float" arraysize="3"/>
      <DATA><TABLEDATA>
        <TR> <TD>Apple</TD>  <TD/>       <TD>1 2 4 8 16</TD> <TD>1.62 4.56 3.44</TD> </TR>
        <TR> <TD>Orange</TD> <TD>15</TD> <TD>23 -11 9</TD>   <TD>2.33 4.66 9.53</TD> </TR>
      </TABLEDATA></DATA>
    </TABLE>

VOTable =delim

The VOTable specification does not include anything to describe the delimiter for arrays of values.

VOTable Field

... TBD

VOTable =arraysize

The text of the VOTable specification does not explicitly define the arraysize attribute.

The XML specification for the arraysize attribute does not apply a restriction to the content of the attribute.

    <xs:complexType name="Field">
      ....
      <xs:attribute name="arraysize" type="xs:string"/>
      ....
    </xs:complexType>

The VOTable specification does not link the VOTable arraysize attribute with the DataType arraysize attribute defined in the VODataService specification.

VOTable =arrayDEF

The VOTable XML schema defines the arrayDEF syntax restriction as follows:

    <xs:simpleType  name="arrayDEF">
      <xs:restriction base="xs:token">
        <xs:pattern  value="([0-9]+x)*[0-9]*[*]?(s\W)?"/>
      </xs:restriction>
    </xs:simpleType>

However, the arrayDEF syntax restriction is not used in the definition of the arraysize attribute:

    <xs:complexType name="Field">
      ....
      <xs:attribute name="arraysize" type="xs:string"/>
      ....
    </xs:complexType>

This means that the content of the VOTable arraysize attribute is unrestricted, and may contain any string.

In contrast, the VODataService specification does restrict the content of the DataType arraysize attribute, using the ArrayShape restriction.

This means it is possible to create a value for arraysize that is valid in VOTable but is not valid in the VOSI /tables response, which uses the ArrayShape syntax restriction defined in the VODataService schema.
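
The gap between the three definitions can be demonstrated with a short Python sketch. The compare helper and the sample values are hypothetical. Python's \W is close to, but not identical to, the XML Schema definition of \W, so this is an approximation that holds for the values used here.

```python
import re

# Patterns copied from the VOTable and VODataService schemas.
ARRAY_DEF = re.compile(r"([0-9]+x)*[0-9]*[*]?(s\W)?")
ARRAY_SHAPE = re.compile(r"([0-9]+x)*[0-9]*[*]?")


def compare(value: str) -> tuple[bool, bool]:
    """Return (matches arrayDEF, matches ArrayShape) for an arraysize value."""
    return (
        ARRAY_DEF.fullmatch(value) is not None,
        ARRAY_SHAPE.fullmatch(value) is not None,
    )


# Every one of these is a valid xs:string, so VOTable itself accepts them all.
for value in ["10x*", "10s,", "ten"]:
    print(repr(value), compare(value))
```

The value "10s," matches arrayDEF but not ArrayShape, and "ten" matches neither, yet both are legal in a VOTable FIELD because the arraysize attribute is declared as a plain xs:string.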

The only reference to the arrayDEF syntax restriction in the other VO specifications is a comment in the definition of the ArrayShape in the VODataService schema.

The text of the VOTable specification does not link the arrayDEF string syntax with the ArrayShape string syntax defined in the VODataService schema.

The arrayDEF string syntax is not used anywhere else in VOTable XML schema.

The arrayDEF string syntax is not used in any of the other VO specifications.

Arrays of Variable-Length Strings

Appendix A.3 (Arrays of Variable-Length Strings) of the VOTable specification refers to the Substring Array convention, described in an appendix of the FITS specification.

The text of the appendix suggests that a similar convention could be used in VOTable.

  • "A convention similar to the FITS one could be introduced in VOTable in the arraysize attribute ..."

However, the text does not go beyond suggesting this as a possibility, and does not declare whether this is recommended practice or not.

Provision for this extension is included in the regular expression for the arrayDEF syntax restriction.

    <xs:simpleType  name="arrayDEF">
      <xs:restriction base="xs:token">
        <xs:pattern  value="([0-9]+x)*[0-9]*[*]?(s\W)?"/>
      </xs:restriction>
    </xs:simpleType>

However, because VOTable arrayDEF and VODataService ArrayShape are not directly linked, adding support for this to VOTable makes it possible to create a value for arraysize that is valid in VOTable but is not valid in the VOSI /tables response, which uses the ArrayShape syntax restriction defined in the VODataService schema.

MIME type

Section 8 (MIME type) describes the format of MIME types that can be used to describe a VOTable document.

The text includes the following set of examples:

  • text/xml
  • text/xml; charset="iso-8859-1"
  • application/x-votable+xml
  • application/x-votable+xml; serialization=tabledata
  • application/x-votable+xml; serialization=TABLEDATA; charset=iso-8859-1

The text in this section states that if the optional charset parameter is not supplied, then a default of US-ASCII is assumed. This may conflict with the discussion of the unicodeChar data type, which assumes that the default charset is UTF-8.

The specification does not go into details about how changing the MIME type charset could change the way that strings of unicodeChar are serialized.
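
As an illustration of how a client might read these parameters, the following Python sketch uses the standard library email.message.Message class to pick apart the MIME type examples listed above. The describe_mime_type helper is hypothetical, and falling back to us-ascii when no charset is given mirrors the default described in the VOTable text.

```python
from email.message import Message


def describe_mime_type(value: str) -> dict:
    """Split a MIME type string into its type, serialization and charset."""
    message = Message()
    message["Content-Type"] = value
    return {
        "type": message.get_content_type(),
        "serialization": message.get_param("serialization"),
        # Fall back to US-ASCII when the charset parameter is absent.
        "charset": message.get_content_charset(failobj="us-ascii"),
    }


examples = [
    "text/xml",
    'text/xml; charset="iso-8859-1"',
    "application/x-votable+xml; serialization=TABLEDATA; charset=iso-8859-1",
]
for example in examples:
    print(describe_mime_type(example))
```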


DALI

The DALI specification defines the base web service interface common to all Data Access Layer (DAL) services.

DALI tables

Section 2.6 (VOSI-tables) of the DALI specification refers to the VOSI-tables web resource, defined by the Grid and Web Services working group.

DALI types

Section 3.3 (Literal values) of the DALI specification specifies how a number of data types should be expressed by DALI services.

Note that although the following data types and the rules for representing them may be used for literal values in parameters passed to DALI services, these definitions are also referred to by other VO specifications to describe how to represent data values in VOTable results, VOSI /tables responses and TAP_SCHEMA metadata tables.

This may mean that 'Literal values' might not be the most appropriate title for this section.

DALI Numbers

Section 3.3.1 (Numbers) refers to the VOTable specification.

  • "Integer and real numbers must be represented in a manner consistent with the specification for numbers in VOTable"

However, it is not clear which of the VOTable serializations (BINARY, BINARY2 or TABLEDATA) applies in which situation.

DALI Boolean

Section 3.3.2 (Boolean) refers to part 2 of the W3C XML schema specification.

This results in a slightly different definition of how boolean values should be expressed compared to the definition given in the VOTable specification.

DALI Timestamp

Section 3.3.3 (Timestamp) defines three representations for date and time values as follows:

  • For astronomical values the time and date format follows the convention established for FITS, mandating UTC, but omitting the time zone in the representation, YYYY-MM-DD['T'hh:mm:ss[.SSS]].

  • For civil values, relating to events at locations on the Earth, the format includes an optional 'Z' to explicitly specify the UTC time zone, YYYY-MM-DD['T'hh:mm:ss[.SSS]['Z']].

  • Julian Date (JD) or Modified Julian Date (MJD) values should follow the rules for double precision numbers.

The specification also describes how to represent timestamp values in the metadata for a VOTable FIELD using timestamp for the xtype attribute.

    <FIELD datatype="char" arraysize="*" xtype="timestamp">


Note - the specification text includes the following:

  • "Julian Date (JD) or Modified Julian Date (MJD), these follow the rules for double precision numbers above"

However, apart from the reference to the VOTable specification in the section on numeric values, there are no explicit rules for double precision numbers in the preceding text.
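
The two string representations described at the start of this section can be handled by a single parser, sketched here in Python. The parse_dali_timestamp helper is hypothetical. The sketch assumes that both forms are interpreted as UTC, and it truncates fractional seconds to microsecond precision.

```python
import re
from datetime import datetime, timezone

# YYYY-MM-DD['T'hh:mm:ss[.SSS]['Z']], with the fraction allowed any number of digits.
TIMESTAMP = re.compile(
    r"(\d{4}-\d{2}-\d{2})(?:T(\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z?)?"
)


def parse_dali_timestamp(text: str) -> datetime:
    """Parse a DALI timestamp string into a UTC datetime (hypothetical helper)."""
    match = TIMESTAMP.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a DALI timestamp: {text!r}")
    date, time, fraction = match.groups()
    value = datetime.strptime(f"{date}T{time or '00:00:00'}", "%Y-%m-%dT%H:%M:%S")
    if fraction:
        # Keep at most six digits (microseconds), padding shorter fractions.
        value = value.replace(microsecond=int(fraction[:6].ljust(6, "0")))
    # Assumption: both the astronomical and the civil form are in UTC.
    return value.replace(tzinfo=timezone.utc)


print(parse_dali_timestamp("2017-08-16T17:12:54.621Z"))
print(parse_dali_timestamp("1970-01-01"))
```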

DALI Interval

Section 3.3.4 (Interval) defines how to represent numeric intervals as pairs of numeric values.

The specification describes how to represent numeric intervals in the metadata for a VOTable FIELD, by setting arraysize to 2 and using interval for the xtype attribute.

    <FIELD datatype="short"  arraysize="2" xtype="interval">
    <FIELD datatype="int"    arraysize="2" xtype="interval">
    <FIELD datatype="long"   arraysize="2" xtype="interval">
    ....
    <FIELD datatype="float"  arraysize="2" xtype="interval">
    <FIELD datatype="double" arraysize="2" xtype="interval">

All of the examples shown in the specification text use space as the delimiter. However, the specification does not explicitly define a delimiter, nor does it refer to the delim attribute defined in VODataService.

DALI Time Interval (proposed)

PROPOSED

The current version (1.1) of the DALI specification does not describe how to represent time intervals.

Given that we have already used timestamp and interval to cover the separate cases, we will need to define a new xtype value to describe an interval of timestamps.

We propose time-interval as the xtype value to represent an interval of timestamps.

Given that timestamp is serialized as an array of characters, and given that the geometric types, Point, Circle and Polygon, are represented as arrays of floating point numbers, then following a similar pattern, an interval of timestamps could be represented as a space delimited sequence of two timestamps:

    <FIELD datatype="char" arraysize="2x*" xtype="time-interval">

    1970-01-01T00:00:00.000Z 2017-08-16T17:12:54.621Z

If the accuracy of the timestamps is known, then the array size of the timestamps can be fixed:

    <FIELD datatype="char" arraysize="2x24" xtype="time-interval">

    1970-01-01T00:00:00.000Z 2017-08-16T17:12:54.621Z
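
A client for the proposed time-interval type might parse such a value as sketched below in Python. The parse_time_interval helper is hypothetical. Requiring millisecond precision with a trailing Z, and requiring the start not to be later than the end, are assumptions of this sketch rather than part of the proposal.

```python
from datetime import datetime, timezone

# Assumed fixed format: millisecond precision with an explicit 'Z'.
FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_time_interval(text: str) -> tuple[datetime, datetime]:
    """Parse a space delimited pair of timestamps (hypothetical helper)."""
    tokens = text.split()
    if len(tokens) != 2:
        raise ValueError(f"expected two timestamps, found {len(tokens)}")
    start, end = (
        datetime.strptime(token, FORMAT).replace(tzinfo=timezone.utc)
        for token in tokens
    )
    if start > end:
        # Assumption: an interval runs from the earlier to the later timestamp.
        raise ValueError("interval start is later than interval end")
    return start, end


start, end = parse_time_interval("1970-01-01T00:00:00.000Z 2017-08-16T17:12:54.621Z")
print(start, end)
```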

DALI Point

Section 3.3.5 (Point) of the DALI specification defines how to represent a geometric point as an array of two floating point numbers.

    <FIELD ... datatype="float"  arraysize="2" xtype="point">
    ....
    <FIELD ... datatype="double" arraysize="2" xtype="point">

The specification text states that the usual representation is to use longitude and latitude values in spherical coordinates.

The text also implies that other coordinate systems can be used:

  • "although they are usually longitude and latitude values in spherical coordinates this is specified in the coordinate metadata and not in the values"

However, the text does not show how to specify a different coordinate system.

The example shown in the specification text uses space as the delimiter:

    12.3 45.6

The specification does not explicitly define a delimiter, nor does it refer to the delim attribute defined in VODataService.

DALI Circle

Section 3.3.6 (Circle) of the DALI specification defines how to represent a circle as an array of three floating point numbers.

    <FIELD ... datatype="float"  arraysize="3" xtype="circle">
    ...
    <FIELD ... datatype="double" arraysize="3" xtype="circle">

The specification text implies that the usual representation is to use longitude, latitude and radius in spherical coordinates. However, the text does not show how to specify a different coordinate system.

The example shown in the specification text uses space as the delimiter:

    12.3 45.6 0.5

The specification does not explicitly define a delimiter, nor does it refer to the delim attribute defined in VODataService.


Note - the text in the current version (1.1) repeats the same range restriction twice.

  • "For circles in a spherical coordinate system ... longitude values must fall within [0,360], latitude values within [-90,90], and radius values in (0,180]"

  • "In spherical coordinates, all longitude values must fall within [0,360] and all latitude values within [-90,90]"
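
The range restrictions quoted above can be sketched as a small Python check. This is only an illustration: the function name parse_circle is hypothetical, and splitting on whitespace is an assumption based on the example value, since the specification does not define the delimiter.

```python
def parse_circle(text):
    """Parse 'longitude latitude radius' and check the quoted spherical ranges.

    Assumption: values are separated by whitespace, as in the example above.
    """
    values = [float(value) for value in text.split()]
    if len(values) != 3:
        raise ValueError("a circle needs exactly three numbers")
    longitude, latitude, radius = values
    if not 0.0 <= longitude <= 360.0:
        raise ValueError("longitude must fall within [0,360]")
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude must fall within [-90,90]")
    if not 0.0 < radius <= 180.0:
        raise ValueError("radius must fall within (0,180]")
    return longitude, latitude, radius


print(parse_circle("12.3 45.6 0.5"))
```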

DALI Polygon

Section 3.3.7 (Polygon) of the DALI specification defines how to represent a polygon as an array of floating point numbers.

    <FIELD ... datatype="float"  arraysize="*" xtype="polygon">
    ....
    <FIELD ... datatype="double" arraysize="*" xtype="polygon">

Note that the polygon is modelled as a one-dimensional array of numbers, rather than as a two-dimensional array of pairs of numbers, which would be declared as:

    <FIELD ... datatype="float"  arraysize="2x*" xtype="polygon">

The text of the specification implies that the usual representation is to use spherical coordinates. However, the text does not show how to specify a different coordinate system.

The example shown in the specification text uses space as the delimiter:

    10.0 10.0 10.2 10.0 10.2 10.2 10.0 10.2

However, the specification does not explicitly define a delimiter, nor does it refer to the delim attribute defined in VODataService.
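
Because the polygon is a flat array, a client has to rebuild the coordinate pairs itself. The following Python sketch shows one way to do that. The function name parse_polygon is hypothetical, the whitespace delimiter is an assumption based on the example value, and the minimum of three vertices is my own assumption rather than something quoted in this review.

```python
def parse_polygon(text):
    """Turn a flat sequence of numbers into a list of (first, second) pairs.

    Assumptions: whitespace delimited values, and at least three vertices.
    """
    values = [float(value) for value in text.split()]
    if len(values) % 2 != 0:
        raise ValueError("a polygon needs an even number of values")
    # Pair up alternate values: elements 0, 2, 4... with elements 1, 3, 5...
    vertices = list(zip(values[0::2], values[1::2]))
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    return vertices


print(parse_polygon("10.0 10.0 10.2 10.0 10.2 10.2 10.0 10.2"))
```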

DALI RESPONSEFORMAT

Section 3.4.3 (RESPONSEFORMAT) of the DALI specification describes the RESPONSEFORMAT parameter that enables a client to request different MIME types and response formats from a DALI service.

... TBD

Note - need to cross reference this with charset from the VOTable MIME types.


VOSI

The VOSI specification defines a number of web service interface methods and resources that are common to all of the VO services.

VOSI /tables

Section 3.3 (Table metadata) of the VOSI specification describes a resource that describes the content of database tables accessible from a VO service.

The VOSI Tables XML schema imports the Table and TableSet elements from the VODataService specification.

The VOSI Tables XML schema inherits both the VOTableType and TAPType elements from the VODataService specification, along with their attributes.

The VOTableType and TAPType elements inherit the following attributes from DataType:

The TAPType element inherits the following attribute from TAPDataType:

Note - the examples given in the text of the VOSI specification use vs:TAP rather than vs:TAPType for the xsi:type attribute.

    <column>
      <name>cfhtlsID </name>
      <dataType xsi:type="vs:TAP" size="30">adql:VARCHAR</dataType>
    </column>

Note - the examples given in the text of the VOSI specification add an adql: prefix to the VARCHAR data type. The prefixed form, adql:VARCHAR, is not a valid value for a TAPType data type.


TAP

The TAP specification describes the Table Access Protocol web service.

TAP /tables

Section 2.5 (/tables) of the TAP specification describes the /tables resource.

The text in the current version (WD-TAP-1.1-20170707) of the specification refers to the VODataService specification for details of the /tables resource content. However, the content and, more importantly, the behaviour of the VOSI /tables resource are described in Section 3.3 (Table metadata) of the VOSI specification, which in turn imports the Table and TableSet XML elements from the VODataService specification. So should the primary reference be to the VOSI /tables resource rather than to VODataService?

The text of the specification recommends using VOTableType rather than TAPType, but it does not explicitly exclude the use of TAPType.

  • "The use of VOTableType (rather than TAPType) in the VOSI-tables output is recommended because the values map directly"
  • "TAPType may be used when VOTableType does not provide a suitable alternative"

This is in contrast to the text of the VOSI specification, which includes a number of examples using a TAP adql:VARCHAR data type:

    <column>
      <name>cfhtlsID </name>
      <dataType xsi:type="vs:TAP" size="30">adql:VARCHAR</dataType>
    </column>

In addition, the text of the VODataService specification cites the following two examples as equivalent:

    <dataType xsi:type="vs:VOTableType" arraysize="*"> char </dataType>
    ....
    <dataType xsi:type="vs:TAPType"> VARCHAR </dataType>

and a third example describes a fixed length string, using size rather than arraysize:

    <dataType xsi:type="vs:TAPType" size="8" > CHAR </dataType>

This conflicts with both the recommendation to use VOTableType rather than TAPType in TAP /tables and the advisory regarding the planned obsolescence of the TAP_SCHEMA size column in future versions of the TAP specification.

TAP_SCHEMA

Section 4 (TAP_SCHEMA) of the TAP specification describes the TAP_SCHEMA metadata tables.

The specification text states that TAP_SCHEMA and VOSI /tables should be equivalent.

  • "The VOSI tables resource provides the same metadata as the TAP_SCHEMA but in a rigorously controlled format"
  • "the information in the TAP_SCHEMA is equivalent to that defined by the VODataService"

However, as shown below, there are a number of inconsistencies between the various specifications which mean that this is not always the case.

TAP_SCHEMA.schemas

Section 4.1 (Schemas) of the TAP specification describes the content and structure of the TAP_SCHEMA.schemas metadata table.

TAP_SCHEMA.tables

Section 4.2 (Tables) of the TAP specification describes the content and structure of the TAP_SCHEMA.tables metadata table.

Note - the definition of TAP_SCHEMA.tables does not state that values in the schema_name column must match the corresponding schema_name column in the TAP_SCHEMA.schemas table.
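
A consistency check of this kind is simple to express. The following Python sketch uses hypothetical in-memory rows, not data from a real TAP service, to list the tables whose schema_name has no matching row in the schema metadata table. The function name and the sample rows are assumptions made for illustration.

```python
def find_orphan_tables(schema_rows, table_rows):
    """Return the names of tables whose schema_name is not a known schema."""
    known_schemas = {row["schema_name"] for row in schema_rows}
    return [
        row["table_name"]
        for row in table_rows
        if row["schema_name"] not in known_schemas
    ]


# Hypothetical sample rows, invented for this example.
schema_rows = [{"schema_name": "TAP_SCHEMA"}, {"schema_name": "survey"}]
table_rows = [
    {"schema_name": "survey", "table_name": "survey.sources"},
    {"schema_name": "missing", "table_name": "missing.orphans"},
]

print(find_orphan_tables(schema_rows, table_rows))
```

Since the specification does not require the match, a service returning a non-empty list here would still be compliant.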

TAP_SCHEMA.columns

Section 4.3 (Columns) of the TAP specification describes the content and structure of the TAP_SCHEMA.columns metadata table.

column name     datatype   arraysize   xtype   not-null
table_name      char       *           null    true
column_name     char       *           null    true
datatype        char       *           null    true
arraysize       char       *           null    false
xtype           char       *           null    false
"size"          int        1           null    false
description     char       *           null    false
utype           char       *           null    false
unit            char       *           null    false
ucd             char       *           null    false
indexed         boolean    1           null    true
principal       boolean    1           null    true
std             boolean    1           null    true
column_index    int        1           null    false

The specification text explains that TAP_SCHEMA uses a combination of datatype, arraysize and xtype to describe the type of a database column.

TAP_SCHEMA.columns.datatype

The text of the specification restricts TAP_SCHEMA.columns.datatype to use the data types defined in the VOTable specification.

  • "The allowed values for datatype ... are specified in VOTable"

Section 2.1 (Primitives) of the VOTable specification defines the following data types:

Datatype        Meaning             FITS   Bytes
boolean         Logical             L      1
bit             Bit                 X      *
unsignedByte    Byte (0 to 255)     B      1
short           Short Integer       I      2
int             Integer             J      4
long            Long integer        K      8
char            ASCII Character     A      1
unicodeChar     Unicode Character          2
float           Floating point      E      4
double          Double              D      8
floatComplex    Float Complex       C      8
doubleComplex   Double Complex      M      16

In contrast, the /tables resource defined in Section 2.5 (/tables) of the TAP specification is based on the VOSI /tables resource, which in turn uses the Table and TableSet data type elements from the VODataService specification.

The VOTableType defines a similar set of data types to VOTable, albeit defined in a separate list in a separate specification:

  • boolean
  • bit
  • unsignedByte
  • short
  • int
  • long
  • char
  • unicodeChar
  • float
  • double
  • floatComplex
  • doubleComplex

However, the VOSI specification also allows the TAPType data type to be used to describe database columns. The TAPType data type defines a different set of data types, based on the data type stored inside the database, rather than an external serialization of the data:

  • BOOLEAN
  • SMALLINT
  • INTEGER
  • BIGINT
  • REAL
  • DOUBLE
  • TIMESTAMP
  • CHAR
  • VARCHAR
  • BINARY
  • VARBINARY
  • POINT
  • REGION
  • CLOB
  • BLOB

This second set of data types means that the following example, based on examples given in both the VOSI and the VODataService specifications, would be valid in the /tables response, but the same data type would not be valid in the corresponding TAP_SCHEMA datatype.

    <column>
      <name>cfhtlsID </name>
      <dataType xsi:type="vs:TAPType" size="30">VARCHAR</dataType>
    </column>
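
The difference can be sketched in Python using the two lists above. The function names are hypothetical, and the check only compares the type names; it ignores attributes such as arraysize and size.

```python
# Type names allowed by VOTableType (the same names as the VOTable primitives).
VOTABLE_TYPES = {
    "boolean", "bit", "unsignedByte", "short", "int", "long", "char",
    "unicodeChar", "float", "double", "floatComplex", "doubleComplex",
}

# Type names allowed by TAPType.
TAP_TYPES = {
    "BOOLEAN", "SMALLINT", "INTEGER", "BIGINT", "REAL", "DOUBLE",
    "TIMESTAMP", "CHAR", "VARCHAR", "BINARY", "VARBINARY", "POINT",
    "REGION", "CLOB", "BLOB",
}


def valid_in_tables_response(type_name):
    """The /tables response may use either VOTableType or TAPType."""
    return type_name in VOTABLE_TYPES or type_name in TAP_TYPES


def valid_in_tap_schema(type_name):
    """TAP_SCHEMA.columns.datatype is restricted to the VOTable names."""
    return type_name in VOTABLE_TYPES


for name in ["char", "VARCHAR"]:
    print(name, valid_in_tables_response(name), valid_in_tap_schema(name))
```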

TBD - link to the actual data in live TAP services.

TAP_SCHEMA.columns.arraysize

The text of the specification describes the arraysize column as "the length of variable length datatypes".

  • "The arraysize column gives the length of variable length datatypes using the VOTable array shape syntax."

This does not explicitly state whether this is the number of elements in the array, or the size (in bytes) of the array.

The example given is for an array of characters, where the size in bytes is equal to the number of elements.

  • "a database column of type varchar(256) would be described with datatype 'char' and arraysize '256*'"

Based on examples given in some of the other VO specifications it is possible to infer that this is the number of elements and not the size in bytes. However, the specification could make this clearer by explicitly stating that it is the "number of elements in the array".
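
To illustrate why the distinction matters, the following Python sketch uses the byte sizes from the VOTable primitives table above to compare the element count with the byte count for a one-dimensional arraysize. The function name is hypothetical, the bit type is left out because its size is not fixed, and multi-dimensional shapes are not handled.

```python
import re

# Bytes per element, taken from the VOTable primitives table above.
BYTES_PER_ELEMENT = {
    "boolean": 1, "unsignedByte": 1, "short": 2, "int": 4, "long": 8,
    "char": 1, "unicodeChar": 2, "float": 4, "double": 8,
    "floatComplex": 8, "doubleComplex": 16,
}


def describe_arraysize(datatype, arraysize):
    """Return (elements, size_in_bytes, is_variable) for a 1-D arraysize.

    Handles forms such as '3', '256*' and '*' only. For a variable length
    array the element count is the stated maximum, or None if not given.
    """
    match = re.fullmatch(r"([0-9]*)([*]?)", arraysize)
    if not arraysize or match is None:
        raise ValueError("unsupported arraysize: %r" % arraysize)
    digits, star = match.groups()
    elements = int(digits) if digits else None
    size_in_bytes = None
    if elements is not None:
        size_in_bytes = elements * BYTES_PER_ELEMENT[datatype]
    return elements, size_in_bytes, star == "*"


print(describe_arraysize("char", "256*"))
print(describe_arraysize("double", "3"))
```

For char the two numbers coincide, which is why the varchar(256) example cannot settle the question, whereas for double they differ by a factor of eight.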

The specification text explicitly refers to the VOTable specification for a definition of the arraysize syntax.

  • "... the syntax for arraysize are specified in VOTable (Ochsenbein and Williams et al., 2013)"

The XML schema for the VOTable specification defines the arrayDEF syntax restriction, which includes support for the FITS Substring Array convention:

    <xs:simpleType  name="arrayDEF">
      <xs:restriction base="xs:token">
        <xs:pattern  value="([0-9]+x)*[0-9]*[*]?(s\W)?"/>
      </xs:restriction>
    </xs:simpleType>

However, the arrayDEF restriction is not actually used in the definition of the VOTable Field arraysize attribute.

    <xs:complexType name="Field">
      ....
      <xs:attribute name="arraysize" type="xs:string"/>
      ....
    </xs:complexType>

The VODataService ArrayShape syntax restriction, used in both the VOTableType and the TAPType elements defined in the VODataService XML schema, does not include support for the FITS Substring Array convention:

    <xs:simpleType  name="ArrayShape">
      <xs:restriction base="xs:token">
        <xs:pattern  value="([0-9]+x)*[0-9]*[*]?"/>
      </xs:restriction>
    </xs:simpleType>

    <xs:complexType name="VOTableType">
      <xs:simpleContent>
        <xs:restriction base="vs:TableDataType">
          ....
          <xs:attribute name="arraysize" type="vs:ArrayShape" default="1"/>
          ....
        </xs:restriction>
      </xs:simpleContent>
    </xs:complexType>

    <xs:complexType name="TAPType">
      ....
      <xs:simpleContent>
        <xs:restriction base="vs:TAPDataType">
          ....
          <xs:attribute name="arraysize" type="vs:ArrayShape" default="1"/>
          ....
        </xs:restriction>
      </xs:simpleContent>
    </xs:complexType>

This means that whether you use the 'no restriction' definition of the VOTable arraysize attribute, or the VOTable arrayDEF restriction, it is possible to construct a string that would be valid in TAP_SCHEMA arraysize but would not be valid in the VODataService ArrayShape restriction used in the corresponding TAP /tables VOSI response.
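
The mismatch can be demonstrated by testing a few candidate strings against the two patterns. The Python sketch below copies the patterns from the schema fragments above. Python's re module is only an approximation of XML Schema regular expressions (for example, the two disagree about whether an underscore matches \W), and the candidate strings are constructed purely to exercise the patterns, not taken from a real service.

```python
import re

# Patterns copied from the schema fragments above. XML Schema patterns are
# implicitly anchored, so re.fullmatch is used to approximate that behaviour.
ARRAY_DEF = re.compile(r"([0-9]+x)*[0-9]*[*]?(s\W)?")  # VOTable arrayDEF
ARRAY_SHAPE = re.compile(r"([0-9]+x)*[0-9]*[*]?")      # VODataService ArrayShape


def check(value):
    """Report which of the three definitions would accept the value."""
    return {
        "xs:string": True,  # the unrestricted Field arraysize accepts anything
        "arrayDEF": ARRAY_DEF.fullmatch(value) is not None,
        "ArrayShape": ARRAY_SHAPE.fullmatch(value) is not None,
    }


for candidate in ["256*", "2x*", "8s:", "2,*"]:
    print(candidate, check(candidate))
```

The first two candidates are accepted by all three definitions. The third is accepted by arrayDEF but not by ArrayShape, and the fourth is accepted only by the unrestricted xs:string.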

TAP_SCHEMA.columns.xtype

The text of the specification refers to the types defined in the DALI specification.

  • "Values for xtype are not restricted per se but implementors should use standard values such as those defined in DALI ... before inventing new xtype(s)."

However, the specification does not state that the TAP_SCHEMA xtype column is related to the VOTable xtype attribute used in VOTable results, or the VODataService extendedType attribute that is used in the TAP /tables VOSI response.

TAP_SCHEMA.columns.size

The text of the specification states that the size column is kept for backwards compatibility and will be removed in the next major version of the TAP specification.


ADQL

The ADQL specification describes the Astronomical Data Query Language.

ADQL data types

Topic revision: r19 - 2017-09-12 - DaveMorris
 