Use Cases for new DALI xtypes (value serialisation)
More General 2-D Spatial Constructs
The following use cases were collected and discussed during the IVOA and ADASS meetings in Nov 2018. They are primarily motivated by the fact that ADQL-2.1 needs either to define a standard, or to refer to another standard, for the meaning of the region function.
(i) coord frame and flavor varying
In the EPN data model and existing EpnTAP services, the coordinate frame and flavor vary from instance to instance (row to row), but only polygons are used.
In general, there is a desire to concatenate query results from different services (catalogues) and make a single VOTable (for example) without having to perform complex transformations immediately.
Proposed Solution: If you think of a position as a value (DALI point) plus a coordinate system (metadata), then this is not a new datatype but part of the STC data model; this reasoning extends to values that are circles or polygons. The solution for table serialisation is to use two columns (one with the varying values, one with the varying metadata) and to use VO-DML and the STC data model to define, for example, a frame-varying position as that pair of columns. Since non-frame-varying tables should also include the VO-DML + STC metadata, just with the frame as a constant, this is not so much "more work" in the one case as "doing what we should be doing anyway".
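As a rough illustration of the two-column idea, the sketch below keeps each row's DALI point value and its coordinate frame in separate columns, so results from different services can be concatenated without transforming any coordinates. The `Table` class, its column names and the frame labels are hypothetical; the VO-DML/STC annotation that would actually bind the two columns into one frame-varying position is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Table:
    """Hypothetical in-memory table: one column of values, one of frame metadata."""
    pos: List[Tuple[float, float]] = field(default_factory=list)  # DALI point values (lon, lat)
    frame: List[str] = field(default_factory=list)                # per-row coordinate frame label

    def append(self, pos: Tuple[float, float], frame: str) -> None:
        self.pos.append(pos)
        self.frame.append(frame)

    def is_frame_constant(self) -> bool:
        # A non-frame-varying table is just the case where every row carries
        # the same frame, so that metadata could be declared once instead.
        return len(set(self.frame)) <= 1


def concatenate(*tables: Table) -> Table:
    """Concatenate tables row by row; no coordinate transformation is done."""
    out = Table()
    for t in tables:
        for p, f in zip(t.pos, t.frame):
            out.append(p, f)
    return out


service_a = Table([(10.0, 20.0), (11.0, 21.0)], ["ICRS", "ICRS"])
service_b = Table([(120.5, -5.0)], ["GALACTIC"])
merged = concatenate(service_a, service_b)
print(merged.frame)  # ['ICRS', 'ICRS', 'GALACTIC']
print(service_a.is_frame_constant(), merged.is_frame_constant())  # True False
```

Any transformation to a common frame can then happen later, on the client side, using the per-row metadata.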
(ii) polymorphism
The ObsCore-1.1 data model specifies a legacy VOTable/tap_schema data type for the s_region column:
datatype="char" arraysize="*" xtype="adql:REGION"
Neither the xtype nor the serialisation format is standardised, except in a non-normative section of TAP-1.0.
ESO has a mix of polygons and positions, the latter used when the extent of the data coverage is unknown. MAST, CADC, and ESAC have a mix of polygons and circles in the spatial coverage of HST observations. In the CADC implementation, the circle values are converted to polygons (6-8 vertices) to support querying, and those polygons are returned in the output in some cases: specifically, caom2.Plane.position_bounds is a
datatype="double" arraysize="*" xtype="polygon"
column and returns the polygons, while ivoa.ObsCore.s_region returns the original mix of circles and polygons.
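The text does not say how the circle-to-polygon conversion is done; the sketch below is one possible construction, not CADC's code. It builds a regular n-vertex polygon whose edges are tangent to the circle, so the polygon fully contains it (useful when the polygon is used for querying). For a centre-to-vertex distance R and circle radius r, the spherical right triangle between the centre, an edge midpoint and a vertex gives cos(π/n) = tan r / tan R. Assumptions: coordinates and radius in degrees, radius well below 90°, centre not exactly at a pole (where the destination-point formula degenerates).

```python
import math
from typing import List


def circle_to_polygon(lon0: float, lat0: float, radius: float, n: int = 8) -> List[float]:
    """Approximate a DALI circle by an n-vertex polygon that encloses it.

    Returns a flat DALI-style array: lon1 lat1 lon2 lat2 ...
    Bearings increase from north through east, which gives counter-clockwise
    order when the sky is viewed from the origin (DALI polygon convention).
    """
    if n < 3:
        raise ValueError("need at least 3 vertices")
    lat0r, lon0r = math.radians(lat0), math.radians(lon0)
    # Vertex distance chosen so that every edge is tangent to the circle.
    big_r = math.atan(math.tan(math.radians(radius)) / math.cos(math.pi / n))
    values: List[float] = []
    for k in range(n):
        bearing = 2.0 * math.pi * k / n
        # Standard spherical "destination point" formula.
        lat = math.asin(math.sin(lat0r) * math.cos(big_r)
                        + math.cos(lat0r) * math.sin(big_r) * math.cos(bearing))
        lon = lon0r + math.atan2(math.sin(bearing) * math.sin(big_r) * math.cos(lat0r),
                                 math.cos(big_r) - math.sin(lat0r) * math.sin(lat))
        values.extend([math.degrees(lon) % 360.0, math.degrees(lat)])
    return values


poly = circle_to_polygon(10.0, 20.0, 0.5, n=8)
print(len(poly))  # 16
```

Placing the vertices on the circle instead (inscribed polygon) would be simpler but would cut off parts of the circle, which matters when the polygon stands in for the circle in spatial queries.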
Proposed Solution: The POS parameter in SIA-2.0 defines a subset of STC-S that only deals with polymorphism. We can perform a normal refactoring, pull that up to DALI-1.2 as-is, and formally define an xtype. The POS parameter in SODA is defined as a polymorphic type but is equivalent to DALI (circle and polygon) plus a range type.
xtype="shape" datatype="char" arraysize="*": polymorphism
serialisation: label + DALI point|circle|polygon (as in SIA and SODA)
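A minimal parser for the proposed shape serialisation (a label followed by the DALI values) might look like the sketch below. The exact label spelling is not fixed by the text: SIA-2.0 writes labels in upper case (e.g. CIRCLE), so this sketch assumes labels are matched case-insensitively; the function name is hypothetical.

```python
from typing import List, Tuple

# Number of numeric values required per fixed-size label (assumed labels).
_FIXED_COUNTS = {"point": 2, "circle": 3}


def parse_shape(text: str) -> Tuple[str, List[float]]:
    """Parse a hypothetical xtype="shape" value such as 'circle 10 20 0.5'.

    Returns the lower-cased label and the DALI coordinate values.
    """
    tokens = text.split()
    if not tokens:
        raise ValueError("empty shape value")
    label = tokens[0].lower()
    values = [float(t) for t in tokens[1:]]
    if label in _FIXED_COUNTS:
        if len(values) != _FIXED_COUNTS[label]:
            raise ValueError(f"{label} needs {_FIXED_COUNTS[label]} values, got {len(values)}")
    elif label == "polygon":
        # DALI polygon: at least 3 vertices, each a (lon, lat) pair
        if len(values) < 6 or len(values) % 2:
            raise ValueError("polygon needs an even number (>= 6) of values")
    else:
        raise ValueError(f"unknown shape label: {tokens[0]!r}")
    return label, values


print(parse_shape("CIRCLE 10 20 0.5"))  # ('circle', [10.0, 20.0, 0.5])
```

The numeric part after the label is exactly the existing DALI point, circle or polygon serialisation, so a client can hand it to whatever DALI value parser it already has.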
Complication: In SIA-2.0, the example in section 3.1.2 is wrong (left over from a WD) because it misuses xtypes; an erratum is needed to remove them from the example. Could the erratum instead say "now you can use shape"? The text in section 2.1.1 still includes a sentence about polygon winding that predates the decisions made for DALI; the author (PatrickDowler) considers that an editorial mistake to be fixed in the same erratum.
Open question: should we define xtype="range" (datatype="double" arraysize="4") for consistency with the definition of shape? It is useful as an input value (as a parameter or in an input table), but probably not so much as a persisted value.
(iii) disjoint shapes
A large fraction of mosaic camera data has gaps between parts. At CADC we provide a convex hull around all the parts but this can be a poor approximation in some cases.
Although it is not a disjoint shape per se, the all-sky region is also not supported by the DALI polygon.
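Analysis: for mathematical simplicity, union always adds points to the set. Holes come from intersection: an outer polygon (on-the-left, CCW) intersected with an inner (CW) polygon. Because the area on the left of a CW loop is everything outside it, each such inner (CW) polygon punches a hole in the outer polygon.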
Proposed Solution (base): The DALI polygon is defined as a simple (single-loop) polygon and is serialised as a double array (e.g. in VOTable). Multiple simple loops could be serialised together, separated by a special value that can be unambiguously distinguished from coordinate values. As a side effect, all-sky can easily be written as a 2-polygon value where the second polygon is simply the first one flipped (vertex order reversed). For those who really need a butterfly polygon, this is just two triangles with a common vertex. In either case, the component polygons in a multipolygon must be allowed to touch. Should component polygons be allowed to overlap? (Consensus: no; the producer should combine them into a simple polygon.)
There are use cases for "multi-interval": the energy and time support of an observation. It is already possible to serialise an array of intervals or an array of circles, but neither has the set semantics of union.
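To make the set semantics of union concrete for multi-interval values, a producer could normalise a list of intervals into sorted, disjoint intervals before serialising them as a flat double array. This is only a sketch: whether multiinterval would require such normalisation (in the way the consensus above asks producers to merge overlapping component polygons) is an assumption, and the function names are hypothetical.

```python
from typing import List, Tuple


def normalise_intervals(intervals: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the union of closed intervals as sorted, non-overlapping intervals."""
    merged: List[Tuple[float, float]] = []
    for lo, hi in sorted(intervals):
        if lo > hi:
            raise ValueError(f"interval lower bound {lo} > upper bound {hi}")
        if merged and lo <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def to_flat_array(intervals: List[Tuple[float, float]]) -> List[float]:
    """Serialise as a flat double array: lo1 hi1 lo2 hi2 ..."""
    return [v for pair in intervals for v in pair]


union = normalise_intervals([(5, 7), (1, 3), (2, 4)])
print(union)                 # [(1, 4), (5, 7)]
print(to_flat_array(union))  # [1, 4, 5, 7]
```

Note that the flat array looks exactly like an array of intervals; the difference lies only in what the xtype declares the values to mean (a union rather than a list).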
For the special value, VOTable allows the string NaN in float or double arrays, so we could use that (but see below).
xtype="multipolygon" datatype="double" arraysize="*"
xtype="multiinterval" datatype="double" arraysize="*"
simplest serialisation: {polygon} NaN {polygon} ...
allsky (S + N hemispheres): 0 0 120 0 240 0 NaN 0 0 240 0 120 0
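The simplest NaN-separated serialisation can be sketched as a pair of encode/decode functions. The function names are hypothetical, and the textual rendering only mimics a whitespace-separated VOTable TABLEDATA cell. The example builds the all-sky value from the equatorial loop (0,0) (120,0) (240,0) and its reverse; assuming the DALI convention (area on the left when viewed from the origin), the first loop encloses the southern hemisphere and the reversed one the northern.

```python
import math
from typing import List


def encode_multipolygon(polygons: List[List[float]]) -> List[float]:
    """Concatenate simple DALI polygon arrays, separated by NaN."""
    out: List[float] = []
    for i, poly in enumerate(polygons):
        if i > 0:
            out.append(math.nan)  # separator between component polygons
        out.extend(poly)
    return out


def decode_multipolygon(values: List[float]) -> List[List[float]]:
    """Split on NaN and check that each piece looks like a simple DALI polygon."""
    polygons: List[List[float]] = [[]]
    for v in values:
        if math.isnan(v):
            polygons.append([])
        else:
            polygons[-1].append(v)
    for poly in polygons:
        if len(poly) < 6 or len(poly) % 2:
            raise ValueError("each component needs an even number (>= 6) of values")
    return polygons


def to_text(values: List[float]) -> str:
    """Render as a space-separated string, writing NaN literally."""
    return " ".join("NaN" if math.isnan(v) else f"{v:g}" for v in values)


south = [0, 0, 120, 0, 240, 0]  # three vertices on the equator
north = [0, 0, 240, 0, 120, 0]  # the same loop with its vertex order reversed
allsky = encode_multipolygon([south, north])
print(to_text(allsky))  # 0 0 120 0 240 0 NaN 0 0 240 0 120 0
```

A single polygon encodes to itself (no separator), so every existing DALI polygon value is already a valid multipolygon under this scheme.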
To be discussed further: yes, that is kind of ugly. More importantly, using a valid value as a separator resembles past designs that caused many bugs because mistakes went undetected. To defend against errors that introduce NaNs and thereby produce a different but still valid multipolygon (unlikely, but possible with an even number of extra NaNs and enough vertices in between), we could add the number of polygons at the end of the array. What we don't like about that extra number is that a serialised polygon would then no longer be a valid multipolygon; besides, there are plenty of ways to generate an incorrect polygon that is still valid but wrong. If we had thought this through from the start, we would probably have included the vertex count at the start or end of the simple polygon, but we didn't.
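The trailing-count defence discussed above can be sketched as follows. The layout (polygon count as the final array element) follows the text; the function name, the hexagon values and the error handling are hypothetical. The example shows both the benefit (a stray NaN that splits one polygon into two valid triangles is caught) and the drawback (a bare serialised polygon is no longer accepted).

```python
import math
from typing import List


def decode_counted_multipolygon(values: List[float]) -> List[List[float]]:
    """Decode '{polygon} NaN {polygon} ... count', checking the trailing count."""
    if not values or math.isnan(values[-1]):
        raise ValueError("missing trailing polygon count")
    expected = values[-1]
    polygons: List[List[float]] = [[]]
    for v in values[:-1]:
        if math.isnan(v):
            polygons.append([])
        else:
            polygons[-1].append(v)
    if any(len(p) < 6 or len(p) % 2 for p in polygons):
        raise ValueError("malformed component polygon")
    if len(polygons) != expected:
        raise ValueError(f"found {len(polygons)} polygons, trailing count says {expected}")
    return polygons


hexagon = [10, 10, 12, 10, 13, 11, 12, 12, 10, 12, 9, 11]
print(decode_counted_multipolygon(hexagon + [1]))  # one polygon, count agrees

# A stray NaN after the third vertex yields two valid triangles:
corrupted = hexagon[:6] + [math.nan] + hexagon[6:] + [1]
# decode_counted_multipolygon(corrupted) raises ValueError (2 polygons, count says 1)
# decode_counted_multipolygon(hexagon) also raises: a bare polygon is not a valid value here
```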
Proposed Solution (extended): Above we have independent schemes to support polymorphism and disjoint shapes/holes; we can combine them to finally define region for use in ADQL and ObsCore:
xtype="region" datatype="char" arraysize="*": polymorphism + union and intersection operators
serialisation, operator style: {shape} {operator} {shape} ... ??
serialisation, function style: {operator} ( {shape} {shape} ... )
Operator style is closer to the base solution, while function style resembles the informative TAP-1.0 BNF; choosing it would therefore mean standardising that usage (minus the coordinate metadata).
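To compare the two candidate styles side by side, the sketch below renders the same region in both. Everything here is an assumption: the operator names (union, intersection), the spacing and parentheses, and the lower-case shape labels are all still undecided, and the helper names are hypothetical.

```python
from typing import List, Tuple

Shape = Tuple[str, List[float]]  # (label, DALI values), e.g. ("circle", [10, 20, 0.5])


def shape_text(shape: Shape) -> str:
    label, values = shape
    return " ".join([label] + [f"{v:g}" for v in values])


def operator_style(op: str, shapes: List[Shape]) -> str:
    """{shape} {operator} {shape} ..."""
    return f" {op} ".join(shape_text(s) for s in shapes)


def function_style(op: str, shapes: List[Shape]) -> str:
    """{operator} ( {shape} {shape} ... )"""
    return f"{op} ( " + " ".join(shape_text(s) for s in shapes) + " )"


shapes = [("circle", [10, 20, 0.5]), ("polygon", [11, 20, 12, 20, 12, 21])]
print(operator_style("union", shapes))
# circle 10 20 0.5 union polygon 11 20 12 20 12 21
print(function_style("union", shapes))
# union ( circle 10 20 0.5 polygon 11 20 12 20 12 21 )
```

One design consideration: once union and intersection can be mixed in a single value, operator style needs precedence rules or parentheses, whereas function style nests naturally.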
(iv) polygons with holes
This is really down in the details: although no one put forward a concrete use case for this scenario, it is feasible (CADC has some cases that we currently ignore). If solutions to the other use cases can also support this without extra complexity, they get extra points.
Proposed Solution: Now that we have a defined winding direction that distinguishes inside from outside, we can make holes using intersection and thus do not need a negation (not) operator.
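A flat-plane sketch of the idea: each loop contributes the area on the left of its edges, so a CCW loop contributes its interior, a CW loop contributes everything outside it, and a region is the intersection of its loops. The function names are hypothetical, spherical geometry is ignored (x to the right, y up), and points lying exactly on an edge are not handled.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(loop: List[Point]) -> float:
    """Shoelace formula: positive for counter-clockwise loops (x right, y up)."""
    n = len(loop)
    return 0.5 * sum(loop[i][0] * loop[(i + 1) % n][1] - loop[(i + 1) % n][0] * loop[i][1]
                     for i in range(n))


def inside_enclosed(loop: List[Point], p: Point) -> bool:
    """Ray-casting test against the loop's enclosed area, ignoring orientation."""
    x, y = p
    inside = False
    n = len(loop)
    for i in range(n):
        (x1, y1), (x2, y2) = loop[i], loop[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_loop(loop: List[Point], p: Point) -> bool:
    """On the left of the edges: the enclosed area for CCW, the outside for CW."""
    enclosed = inside_enclosed(loop, p)
    return enclosed if signed_area(loop) > 0 else not enclosed


def in_intersection(loops: List[List[Point]], p: Point) -> bool:
    return all(in_loop(loop, p) for loop in loops)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]  # CCW: the square's interior
hole = [(4, 4), (4, 6), (6, 6), (6, 4)]       # CW: everything except the small square
print(in_intersection([outer, hole], (2, 2)))   # True  (in the square, outside the hole)
print(in_intersection([outer, hole], (5, 5)))   # False (inside the hole)
print(in_intersection([outer, hole], (20, 5)))  # False (outside the square)
```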
The net effect of all the above is that there are 4 kinds of values, and the serialisations build up from the simplest (lower left) to the most general (upper right). We would introduce 4 new xtypes in DALI-1.2 to standardise existing usage (ADQL, SIA, SODA, ObsCore).
| | simple values | complex values via operators |
| polymorphic | xtype="shape" | xtype="region" |
| non-polymorphic | xtype="interval" xtype="point" xtype="circle" xtype="polygon" | xtype="multiinterval" xtype="multipolygon" |
Winding direction
Felix & Alberto recommend a change (erratum) that defines the DALI polygon winding in terms of "on-the-left" instead of CCW; this could also go into DALI-1.2. It means the same thing but is easier to grok and visualise.
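On the sphere, the "on-the-left" rule can be checked directly with vectors, without projecting to a plane: seen from the origin, a point p lies on the left of the edge a→b when (a × b) · p < 0. The sketch below assumes the DALI convention of viewing the polygon from the origin toward the sky, coordinates in degrees, and a small convex polygon (so that the sum of the vertex vectors points into the enclosed area); the function names are illustrative only.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def unit_vector(lon_deg: float, lat_deg: float) -> Vec:
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def area_is_on_the_left(values: List[float]) -> bool:
    """True if, walking the vertices in order as seen from the origin, the
    enclosed area lies on the left (i.e. the DALI counter-clockwise order)."""
    vs = [unit_vector(lon, lat) for lon, lat in zip(values[0::2], values[1::2])]
    # Points into the enclosed area for small convex polygons only.
    centre = tuple(sum(c) for c in zip(*vs))
    # Sum the per-edge side tests; negative means the area is on the left.
    total = sum(dot(cross(vs[i], vs[(i + 1) % len(vs)]), centre) for i in range(len(vs)))
    return total < 0


print(area_is_on_the_left([10, 10, 10, 11, 11, 11, 11, 10]))  # True
print(area_is_on_the_left([11, 10, 11, 11, 10, 11, 10, 10]))  # False (same square, reversed)
```

Reversing the vertex order flips the sign of every cross product, which is why a loop and its reverse describe complementary regions.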
MOC
A multi-order coverage map (MOC) could in principle be serialised as a value in a VOTable cell. To do so, there must be a defined serialisation that fits in the current VOTable framework, and a defined xtype signalling that the values can be interpreted as MOCs using that serialisation. The Apps WG is working on making a suggested ASCII serialisation part of the MOC standard, so all that would be needed here is to define the moc xtype and its usage and refer to the MOC standard, probably
datatype="char" arraysize="*" xtype="moc"
since the suggested serialisation uses non-numeric characters as separators.
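Assuming the ASCII serialisation takes the order/cell-list form used in the MOC recommendation (for example `1/1,3,4 2/4,25,12-14,21`), the separators are `/`, `,`, `-` and whitespace, which is why a char array rather than a numeric array fits. The parser below is a minimal sketch under that assumption; the exact grammar is whatever the MOC standard ends up specifying. Treating a token without an `order/` prefix as continuing the previous order is also an assumption. It does not validate cell indices against the order (a HEALPix index at order k must be below 12·4^k).

```python
from typing import Dict, List, Optional, Set


def parse_moc_ascii(text: str) -> Dict[int, List[int]]:
    """Parse an ASCII MOC string such as '1/1,3,4 2/4,25,12-14,21'.

    Returns {order: sorted list of cell indices}.
    """
    cells: Dict[int, Set[int]] = {}
    order: Optional[int] = None
    for token in text.split():
        if "/" in token:
            order_text, _, token = token.partition("/")
            order = int(order_text)
            cells.setdefault(order, set())
        if order is None:
            raise ValueError("cell list before any order")
        for item in filter(None, token.split(",")):
            if "-" in item:
                lo, hi = (int(x) for x in item.split("-"))
                cells[order].update(range(lo, hi + 1))  # inclusive range
            else:
                cells[order].add(int(item))
    return {o: sorted(c) for o, c in sorted(cells.items())}


print(parse_moc_ascii("1/1,3,4 2/4,25,12-14,21"))
# {1: [1, 3, 4], 2: [4, 12, 13, 14, 21, 25]}
```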