Units associated with literals - ADQL considerations.
Comments supplied by JeffLusted. These will be incorporated as appropriate
(modified where necessary) in the Units Working Draft.
For queries it is important to avoid ambiguity. For example, km/s (or
km / s) would on their own be difficult because the forward slash is
ambiguous wherever expressions routinely contain arithmetic. The same
is true of + - * % and potential bit operators within a query language
(~ ^ &). The use of a full stop is also awkward where
dot-qualification is employed (database.schema.table), or where columns
are qualified by a table alias, such as a.ra.
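To illustrate the ambiguity, here is a minimal sketch of a deliberately naive tokenizer. It is not an ADQL lexer; the token classes and the function name tokenize are assumptions made for this example only. It shows that an unmarked km/s produces exactly the same token shape as one column divided by another.

```python
import re

# Hypothetical, deliberately naive token classes: identifiers, plain numbers,
# and the single-character operators mentioned in the text.
_TOKEN = re.compile(
    r"\s*(?:(?P<ident>[A-Za-z_]\w*)"
    r"|(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<op>[-+*/%~^&.]))"
)

def tokenize(text):
    """Split text into (kind, lexeme) pairs; raise ValueError on anything else."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        # Exactly one named group matches per token, so lastgroup names it.
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

# A unit and a column division are indistinguishable at this level.
unit_shape = [kind for kind, _ in tokenize("km/s")]
column_shape = [kind for kind, _ in tokenize("flux/area")]
```

Both unit_shape and column_shape come out as identifier, operator, identifier, so nothing in the token stream tells a parser that the first was meant as a unit.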
Care must be taken to consider the possibility of a unit name (or part
of a unit name) being the same as a column or table name, or of a
reserved word, although the latter is easier to control.
None of this is impossible to overcome. I think the easiest way is to
have some standard syntactic marker. For example:
- 40 [m+2]
- 40 ?m+2
- 40 u:m+2
In fact, a number of plausible alternatives could be supported. If
a parser then complained of ambiguity with one form, another of the
alternatives could be used instead.
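As a rough sketch of how a parser might accept several markers at once, the following recognises the three example forms above with one regular expression. The set of characters allowed inside a unit, and the name parse_unit_literal, are assumptions for illustration; none of this is settled ADQL syntax.

```python
import re

# Numeric literal: optional sign, digits, optional fraction and exponent.
_NUMBER = r"(?P<value>[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
# Assumed unit body: starts with a letter, then letters, digits or + - . / *
_UNIT = r"[A-Za-z][A-Za-z0-9+\-./*]*"

_LITERAL = re.compile(
    _NUMBER + r"\s*(?:"
    r"\[\s*(?P<bracket>" + _UNIT + r")\s*\]"   # 40 [m+2]
    r"|\?(?P<question>" + _UNIT + r")"         # 40 ?m+2
    r"|u:(?P<prefix>" + _UNIT + r")"           # 40 u:m+2
    r")"
)

def parse_unit_literal(text):
    """Return (value, unit) for a marked literal, or None if no marker is present."""
    m = _LITERAL.fullmatch(text.strip())
    if m is None:
        return None
    unit = m.group("bracket") or m.group("question") or m.group("prefix")
    return float(m.group("value")), unit
```

Because the marker is required, an unmarked 40 m+2 is rejected rather than guessed at, which leaves it to be parsed as ordinary arithmetic.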
The presence or absence of white space is probably immaterial: in most
languages m + 2 is the same as m+2, so trying to resolve the ambiguity
through white space is futile.
Units associated with Columns or Expressions involving Columns.
I assume a column has a UCD and units associated. Therefore,
a column value either has a unit or is unit-less. It is only the
unit-less ones which are problematic; see later.
For expressions, automatic divining of units is a non-starter, in my
opinion. Take, for example:
- X = expressionA * sqrt( expressionB) / expressionC
where each of the individual expressions could be arbitrarily
complex. Even if a unit could be divined for each individual expression
from its constituent columns, deriving the unit of the whole is still
problematic. I would say it is an open problem, or undecidable (at
least by a machine).
Solution. There is a CAST operator in SQL. We adapt this to cast a
value into some desired unit. If the value has no unit (e.g. an
arbitrarily complex expression), then we are simply assigning a unit
to a value. If, on the other hand, the value has a defined unit, then
the cast implies a conversion wherever the two units disagree. A parser
could vet this for correctness and invoke a library function under the
covers to effect the conversion.
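The two behaviours of such a cast (assign when unit-less, convert when the units differ) can be sketched as follows. The Quantity class, the cast_unit function and the tiny conversion table are all hypothetical stand-ins; a real implementation would presumably delegate to a proper units library.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical conversion table: multiply by the factor to go from the
# first unit to the second. Stands in for a real units library.
_FACTORS = {
    ("km", "m"): 1000.0,
    ("m", "km"): 0.001,
    ("deg", "arcsec"): 3600.0,
}

@dataclass(frozen=True)
class Quantity:
    value: float
    unit: Optional[str] = None  # None means the value is unit-less

def cast_unit(q: Quantity, target: str) -> Quantity:
    """Cast q to the target unit: assign if unit-less, convert otherwise."""
    if q.unit is None:
        # No unit yet: the cast simply assigns one, the value is untouched.
        return Quantity(q.value, target)
    if q.unit == target:
        return q
    try:
        factor = _FACTORS[(q.unit, target)]
    except KeyError:
        # This is the point where a parser could reject the cast as incorrect.
        raise ValueError(f"cannot convert {q.unit!r} to {target!r}") from None
    return Quantity(q.value * factor, target)
```

For example, cast_unit(Quantity(2.5), "km") only labels the value, while cast_unit(Quantity(2.0, "km"), "m") rescales it, and an incompatible pair such as km to deg is refused.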
One recommendation. If the above sounds sensible, then it might be
good to impose a rule that every column value or expression in the
select list of a query always specifies units. These could default for
columns whose units are known, but otherwise a parser could insist on
them, or at least issue warnings. The units could then form part of
the results VOTable.
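A minimal sketch of how a parser might apply this rule is given below. It assumes the select list has already been reduced to pairs of expression text and an optional explicit unit, and that column units are available from metadata as a simple mapping; the function name and data shapes are illustrative only.

```python
def resolve_select_units(select_items, column_units):
    """Fill in default units for known columns and warn about the rest.

    select_items: list of (expression_text, explicit_unit_or_None)
    column_units: mapping of column name to its known unit (assumed metadata)
    Returns (resolved_items, warnings).
    """
    resolved = []
    warnings = []
    for expr, unit in select_items:
        if unit is None:
            # Only a bare, known column can default; expressions cannot.
            unit = column_units.get(expr)
        if unit is None:
            warnings.append(f"no unit specified for select item: {expr}")
        resolved.append((expr, unit))
    return resolved, warnings

items = [("ra", None), ("pm_ra * 2", None), ("sqrt(flux)", "Jy")]
resolved, warnings = resolve_select_units(items, {"ra": "deg"})
```

Here ra picks up its default unit from the column metadata, sqrt(flux) keeps its explicit unit, and the bare expression pm_ra * 2 produces a warning. A stricter parser could raise an error at that point instead.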