Data Access Layer Interface

DALI is a base set of requirements and rules that all DAL services will follow. The goal is not to define what a service must do, but rather to specify various common service resources or operations so that if a service specification includes a common operation, it will do so by referencing the DALI specification. This will make it much easier for common service features to be defined once and implemented the same way wherever they are needed.

WD-DALI-1.0 Working Group Review: 2012-12-13 to 2013-01-10

The WD is available at http://www.ivoa.net/Documents/DALI/20121210/index.html

WG review comments

Some from Mark

Fairly minor comments/suggestions:

Sec 2.1: there are a couple of redundant xmlns declarations in the example job representation (xsi, xlink) - not wrong, but might be clearer to omit them.
Sec 2.1 and 2.2: "A concrete DAL service specification will specify one or more (a)synchronous job-list resources". "one or more" should be "zero or more". Also, "job-list resources" may not be the clearest terminology, especially in the sync case. Would "job submission endpoints" be better?
Sec 2.3: "content is both usable for machines and humans" should read "content is usable for both machines and humans".
Sec 2.3: "All such elements MUST have ... an about attribute with a reference pointing to the element itself". I'm guessing that this is a standard microformat/RDFa idiom, but it looks weird to me. Add a short explanation of what the point of that is?
Sec 2.3: "emoty" -> "empty".
Sec 2.6: "A concrete DAL service specification will specify if the /tables resource is mandatory or optional". This doesn't cover the case where it's not allowed (e.g. no tables involved). Replace "mandatory or optional" with "permitted or required"?
Sec 3.2.2.: "If the client does not specify a value for the VERSION, the service must interpret the request using the rules and semantics of the latest recommended version suported by the service". I'd argue for this "must" to be a "should". Apart from anything else, as it stands it requires services to modify their behaviour immediately a PR turns into a REC which may not be convenient. There may in any case be other reasons to prefer a non-latest version as the standard default.
Sec 3.2.3: "Only in the case of TAP-1.0 are FORMAT and RESPONSEFORMAT equivalent" - is it intentional to preclude this equivalence in any other future standard (e.g. TAP-1.1)? Otherwise reword, e.g. "FORMAT and RESPONSEFORMAT have the same sense in TAP 1.0, but this is not generally the case" (not sure if that is ideal, I must admit the FORMAT/RESPONSEFORMAT distinction is a bit lost on me).
Sec 3.2.5: "Services that implement table upload must support the param scheme ..." - I think the word "table" should be removed.
Sec 3.2.5: "(see for details)" - missing ref.
Sec 4.1: "to chose suitable names" -> "to choose suitable names"
Sec 4.4.1: "If an overflow occurs (result exceeds MAXREC) ...". I suggest rewording this to explicitly allow flagging of an overflow condition if the result is known to contain at least MAXREC records (i.e. may or may not be an actual overflow). Two reasons: first, it means that no special interpretation is required for the MAXREC=0 case, and second, it may make implementation easier. There may be arguments against though, so I'm not insisting.
Sec 4.4.3: "standardVerrsion" -> "standardVersion"

-- MarkTaylor - 2013-01-01
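
The two readings of overflow discussed in the Sec 4.4.1 comment above can be compared in a short Python sketch. The function names and the (rows, flag) return convention are invented for illustration and are not taken from the draft; this only contrasts "result exceeds MAXREC" with "result contains at least MAXREC records".

```python
from itertools import islice


def truncate_strict(rows, maxrec):
    """Flag overflow only if the result really exceeds maxrec.

    This needs one row of look-ahead beyond maxrec.
    """
    head = list(islice(rows, maxrec + 1))
    return head[:maxrec], len(head) > maxrec


def truncate_relaxed(rows, maxrec):
    """Flag overflow as soon as maxrec rows have been produced.

    The flag then means "possibly truncated". No look-ahead is needed,
    and maxrec=0 gives an empty result with the flag set.
    """
    head = list(islice(rows, maxrec))
    return head, len(head) >= maxrec
```

With exactly MAXREC rows available, the strict variant reports no overflow while the relaxed variant does, which is the trade-off the comment leaves open.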

Discussion of the 2012-02 draft

The draft is at http://www.ivoa.net/Documents/DALI/20120202/WD-DALI-1.0-20120202.pdf

Three points from Markus, 2012-04-17

Feel free to hack your comments into the text -- MarkusDemleitner - 17 Apr 2012

On 3.1.3 Multiple Values, 3.1.4 Qualifiers

Well, I'm still opposed to the whole idea of syntax in parameters. Here's why:

(1) Rich parameter syntax hasn't worked well for SSAP -- most services either don't interpret the syntax at all or at least not nearly consistently. Care to see how many support teff_min and teff_max rather than doing the slash syntax on Teff? Also, it's at least very hard to figure out what part of the "PQL" syntax a given parameter supports.

(2) Enumerations are a fairly rare special case. Many interesting values people want to query against are real values, and you'd much rather have ranges than enumerations. So, do ranges go into DALI as well?

(3) If we think enumerations are actually that valuable, they work fine by just repeating parameter names without any syntax at all, with simple HTTP quoting rules sufficing. [Btw I've not found a reference that said in HTTP URLs repeated parameters were equivalent to commas in values]

(4) If we still think we want enumerations, we need to provide quoting rules, i.e., you need to say how you'll encode the list of strings (python syntax) ["23,3", "this, and not something else"]. Welcome to escaping hell. Suggesting "ah, percent-encode the commas then" is, I think, inviting trouble since getting the decoding steps right will evolve to be a major challenge (and you'll have to percent-encode embedded percents, too).

(5) Suppose you still want syntax, I'm sure people will get confused by whitespace if these things are entered into UIs: If I write "folk, classical" in my form, and the application sends "folk,%20classical", what's supposed to happen?

(6) Defining a classifier syntax with some suggestion it might be used to specify coordinate systems in some syntax not defined any closer is inviting horrible confusion. People will be tempted to use this stuff for nothing or everything, and there will be lots of conflicting syntaxes that few, if any, clients and servers implement correctly. Writing a "common parser" for these values will be effectively impossible, and IMHO that's even less DALI stuff than enumerations.

(7) If you still believe we want qualifiers, at least provide clear syntax that allows (a) embedded semicolons in values and qualifiers and (b) that allows parsers to ignore what they don't understand and maybe give some structured representation of the qualifier(s) to higher levels.

In sum: I'm sure we just should strike sections 3.1.3 and 3.1.4. Maybe a recommendation for parameter naming would be nice ("implement ranges by appending _min and _max to the parameter names, and interpret missing range limits when another one is present as open ranges"). I really don't see how we can sensibly say anything about declaring coordinate systems generically.

If consensus cannot be reached on this, we need to (a) define sane escaping rules and (b) define some mechanism how clients and users can discover what kind of syntax is supported on a given parameter.
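
To make points (3) and (4) and the _min/_max suggestion concrete, here is a minimal Python sketch using only the standard library. The parameter name TITLE and the helper range_from_params are made up for illustration; teff_min and teff_max follow the naming mentioned in point (1). It is not meant as a proposed implementation.

```python
from urllib.parse import parse_qs, urlencode

# Two values with embedded commas, as in point (4).
values = ["23,3", "this, and not something else"]

# Variant A: repeat the parameter name; ordinary URL encoding suffices.
repeated = urlencode([("TITLE", v) for v in values])
decoded_repeated = parse_qs(repeated)["TITLE"]

# Variant B: one parameter holding a comma-joined list, split naively.
# Without agreed escaping rules the embedded commas cannot be told
# apart from the separators.
joined = urlencode({"TITLE": ",".join(values)})
decoded_joined = parse_qs(joined)["TITLE"][0].split(",")


def range_from_params(params, name):
    """Read name_min / name_max from a parse_qs() result.

    A missing limit is returned as None, meaning an open range.
    """
    def limit(suffix):
        raw = params.get(name + suffix)
        return float(raw[0]) if raw else None

    return limit("_min"), limit("_max")
```

Variant A gives back the two original strings, while variant B yields four fragments. A request carrying only teff_min=5000 is read as the open range (5000.0, None).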

On 3.2.5 UPLOAD

If the spec remains as it is, we need a better specification of syntax and semantics. Since we're changing the definition from what TAP did anyway (TAP had a ; to separate pairs), I'd argue that's fair.

(1) We should make clear that tablename must be a simple, C-like identifier (rather than, e.g. a delimited identifier that's not ruled out in TAP). I'd then prefer whitespace to a comma to separate name from URI, but I'd not fight.

(2) We should state what happens if on an async service UPLOAD is re-posted -- do the new pairs get added or replaced?

(3) "if the service refuses to accept the entire table, it must respond with an error" -- should we make clear here that the error needs to become immediately visible on POSTing? In async services, that error might only become visible at execution time [e.g., in DaCHS, uploaded tables are temporary in the DB, i.e., upload and execution must take place within a single connection].

(4) The inline-upload solution with param: is not a joy to implement, and I'd contend it's not exactly pretty either. For one, you need to inspect all UPLOAD parameters to be able to identify a given MIME part as an upload (rather than a parameter). This distinction is important since usually, you will store parameters in the database, which is something you may not want to do with potentially large inline uploads. Plus, it's nice if you can process the request body in a stream, which isn't possible if you need to know the whole thing before deciding what to do with a parameter.

In the end, I'd much rather we said on UPLOAD something along the lines of:

UPLOAD -- a request to a service or POSTing to parameters may carry one or more uploads. In this case, the request body must be a multipart/form-data document. [Currently, this is only true with inline uploads; reference HTML REC, chapter 17 here]. The control name of a part containing an upload must always be UPLOAD. The part furthermore contains a header X-Upload-To: specifying the table identifier the upload should use [defined somewhere else to match [A-Za-z_][A-Za-z_0-9]*]. The table content is either defined in an X-Source-URI: header (in which case the part has empty content), or as the content of the part. In that case, the client MUST transfer a content-type. Concrete protocols SHOULD allow VOTable uploads with a MIME type of application/x-votable+xml (or whatever) and are encouraged to conclusively enumerate the allowed upload formats for interoperability.

This isn't well worked out; I'd provide better prose if people agree something like this is where we want to go.
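
As a rough illustration of how a server could treat one such part, here is a Python sketch. It assumes the multipart body has already been split into parts by the web framework, and that each part arrives as a plain dict of headers plus its content as bytes; the function name, the tuple return values and the exact-case header lookup are all assumptions made for this example.

```python
import re

# Table identifier pattern as quoted in the proposal above.
TABLE_ID = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def interpret_upload_part(headers, content):
    """Decide what a single part with control name UPLOAD means.

    headers: dict of the part's headers (exact-case keys assumed)
    content: the part's body as bytes
    Returns (table_name, ("uri", uri)) or
            (table_name, ("inline", content_type, content)).
    """
    table = headers.get("X-Upload-To", "")
    if not TABLE_ID.fullmatch(table):
        raise ValueError("invalid or missing table identifier: %r" % table)

    uri = headers.get("X-Source-URI")
    if uri is not None:
        # By-reference upload: the part itself must be empty.
        if content:
            raise ValueError("part with X-Source-URI must have empty content")
        return table, ("uri", uri)

    # Inline upload: the client must state a content type.
    content_type = headers.get("Content-Type")
    if content_type is None:
        raise ValueError("inline upload without Content-Type")
    return table, ("inline", content_type, content)
```

Since everything needed is in the part's own headers, each part can be handled as it arrives in the stream, without first inspecting the other parameters of the request.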

4.4.3 Additional Information

Unfortunately, the one thing on most scientists' minds is citations, citations, citations. To accommodate this, having VO clients automatically generate reference lists for data sets would be useful. SSAP can already do this on a row level; for more generic protocols, we won't reach row level. Still, we can do better than we currently are. I suggest adding to 4.4.3:

Services SHOULD include INFO elements with a name "source" as children of the top-level RESOURCE element. The value attribute is set to a bibcode that should be referenced when the data contained contributed to a published result. The INFO element's content may be a formatted reference. It is explicitly allowed to include more than one such INFO element, though services are advised to exercise moderation.
This information should be used by clients to allow the automatic generation of reference lists, usually by resolving the bibcodes at ADS.
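
A minimal Python sketch of both sides of this suggestion, using xml.etree.ElementTree. The helper names are hypothetical, the bibcode in the usage note is a placeholder rather than a real reference, and VOTable namespaces are ignored to keep the example short.

```python
import xml.etree.ElementTree as ET


def add_source_info(resource, bibcode, reference=None):
    """Service side: attach one INFO name="source" to a RESOURCE element."""
    info = ET.SubElement(resource, "INFO", name="source", value=bibcode)
    if reference:
        # Optional human-readable, formatted reference as element content.
        info.text = reference
    return info


def source_bibcodes(resource):
    """Client side: collect the bibcodes from direct INFO children."""
    return [
        info.get("value")
        for info in resource.findall("INFO")
        if info.get("name") == "source"
    ]
```

A service would call add_source_info(resource, "EXAMPLE-BIBCODE-1") once per reference on the top-level RESOURCE, and a client could pass the list returned by source_bibcodes to a bibcode resolver to build the reference list.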

Discussion Topics for the Pune2011 interop

1. Standard resources: sync, async, availability, capabilities, tables

- these are in fact 5 capabilities, but in future we leave the actual names free (to be specified in the VOSI-capabilities + registry record)

TBD - value in fixing the resource names for the VOSI resources themselves and only leaving the 1+ DAL service resources free to be named?

2. Standard job parameters

REQUEST, VERSION, FORMAT, MAXREC, RUNID

3. rules for literal values

a. dates: one variant of ISO8601 only

b. ranges: does this actually belong in PQL?

c. lists: agreed to follow the HTTP spec, which says that multiple occurrences of a parameter and a single occurrence with a comma-separated list of values are equivalent

4. Can individual params (here or in PQL) define something that conflicts with 3? e.g. UPLOAD violates 3c

5. overflow in VOTable - as specified in TAP

6. UPLOAD: table upload (re: object-list query science use case)?

a. the UPLOAD param specified here (as in TAP)

b. how to reference the uploaded table(s) specified in PQL? TAP or ADQL? (next revision)

7. error responses

a. VOTables error document

b. when are HTTP status codes (and text/plain) appropriate?