Meetings: InterOpMay2010
TAP VOResource Extension Schema
Step 1 3/4: Pre-Draft (2011-01-11)
We're running on a tight schedule: a working draft should be done by
2011-01. So I'd suggest that all interested parties simply comment in-line
on the following pre-draft. Please reply in separate paragraphs and end each
with a date and your initials; anything without initials comes
from the original pre-draft. -- MD 2011-11-11
Here's a first shot at defining the TAP capability element as an
instance document with interspersed comments. Compared to an internally
circulated attempt, I've moved things from attribute values to elements,
since, on second thought, that seemed more in line with the general
VOResource style.
I've also added resource limits since they seemed easy, and I've added
user defined functions in some very "light" form since I think they are
<?xml version="1.0"?>
<capability xmlns:tap="http://www.ivoa.net/xml/TAP/v1.0" xmlns:vr="http://www.ivoa.net/xml/VOResource/v1.0" xmlns:vs="http://www.ivoa.net/xml/VODataService/v1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" standardID="ivo://ivoa.net/std/TAP" xsi:schemaLocation="http://www.ivoa.net/xml/TAP/v1.0 http://vo.ari.uni-heidelberg.de/docs/schemata/TAP-v1.0.xsd http://www.ivoa.net/xml/VOResource/v1.0 http://vo.ari.uni-heidelberg.de/docs/schemata/VOResource-v1.0.xsd http://www.ivoa.net/xml/VODataService/v1.0 http://vo.ari.uni-heidelberg.de/docs/schemata/VODataService-v1.0.xsd">
<interface role="std" xsi:type="vs:ParamHTTP">
<accessURL use="base">http://localhost:8080/__system__/tap/run/tap</accessURL>
Up to here, it's generic. I've made up the namespace in parallel to
the S*APs.
<dataModel ivo-id="ivo://ivoa.net/std/ObsCore-1.0">ObsCore 1.0</dataModel>
dataModel elements have a name in their text content (intended for humans,
e.g., for labels) and an ivo-id. I'd tend to make the ivo-id
<language ivo-id="ivo://tap/languages/ADQL-2.0">
<label>ADQL 2.0</label>
<language ivo-id="ivo://tap/languages/ADQL-2.0">
<label>ADQL 2.0</label>
These are the languages supported by the service. The ivo-id here should
probably be optional so people can declare things like "TurboSQL 23.3". In addition,
we give a "parameter" element, which is the value actually passed to the
service (in this case, in LANG), and again a "label" element intended to
be shown to humans in UIs.
We should probably allow for "description" as well. It's not here since
my software doesn't have that for languages and friends.
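For illustration, a client building a language menu from these elements could proceed roughly as in the following minimal Python sketch. It assumes each language element carries "parameter" and "label" children as described above; the fragment, the TurboSQL entry, and the helper name language_menu are made up, and namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical capability fragment; element names follow the pre-draft prose,
# namespaces are omitted to keep the sketch short.
CAPABILITY = """
<capability standardID="ivo://ivoa.net/std/TAP">
  <language ivo-id="ivo://tap/languages/ADQL-2.0">
    <parameter>ADQL</parameter>
    <label>ADQL 2.0</label>
  </language>
  <language>
    <parameter>TurboSQL</parameter>
    <label>TurboSQL 23.3</label>
  </language>
</capability>
"""

def language_menu(capability_xml):
    """Return (label, LANG value, ivo-id or None) for each declared language."""
    root = ET.fromstring(capability_xml.strip())
    menu = []
    for lang in root.findall("language"):
        menu.append((
            lang.findtext("label"),      # shown to humans
            lang.findtext("parameter"),  # sent to the service in LANG
            lang.get("ivo-id"),          # optional, as suggested above
        ))
    return menu

for label, param, ivo_id in language_menu(CAPABILITY):
    print(f"{label}: LANG={param} (ivo-id: {ivo_id})")
```

A UI would show the labels and send the matching parameter value; the missing ivo-id on the second entry shows why making it optional costs clients nothing.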
<label>VOTable, binary</label>
<label>VOTable, tabledata</label>
<label>CSV without column labels</label>
<label>VOTable, binary</label>
<label>FITS binary table</label>
<label>CSV with column labels</label>
<label>Tab separated values</label>
<label>HTML table</label>
<label>FITS binary table</label>
<label>HTML table</label>
<label>Tab separated values</label>
<label>CSV with column labels</label>
<label>VOTable, binary</label>
Like languages, output formats have parameter, label, and description.
In addition, we give the mime type that results when the parameter
value is passed in. Note that the preservation of VOTable mime type is reflected
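A client that needs a particular media type can use the mime information to pick the FORMAT value to send. The minimal Python sketch below assumes outputFormat elements with parameter, mime, and label children; the element name outputFormat, the parameter values, the MIME strings, and the helper format_for_mime are illustrative assumptions, not part of the pre-draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical output format entries: "parameter" is the value sent in FORMAT,
# "mime" is the media type the service returns for it.
FORMATS = """
<capability>
  <outputFormat>
    <parameter>votable</parameter>
    <mime>application/x-votable+xml</mime>
    <label>VOTable, binary</label>
  </outputFormat>
  <outputFormat>
    <parameter>csv</parameter>
    <mime>text/csv</mime>
    <label>CSV with column labels</label>
  </outputFormat>
  <outputFormat>
    <parameter>fits</parameter>
    <mime>application/fits</mime>
    <label>FITS binary table</label>
  </outputFormat>
</capability>
"""

def format_for_mime(capability_xml, wanted_mime):
    """Return the FORMAT value that yields wanted_mime, or None if there is none."""
    root = ET.fromstring(capability_xml.strip())
    for fmt in root.findall("outputFormat"):
        if fmt.findtext("mime") == wanted_mime:
            return fmt.findtext("parameter")
    return None

print(format_for_mime(FORMATS, "text/csv"))
```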
<uploadMethod ivo-id="ivo://tap/uploadmethods/inline">
<label>POST inline upload</label>
<label>http URL</label>
<label>https URL</label>
<label>ftp URL</label>
Upload methods. It would be nice if those could have ivo-ids as well,
but giving ivo-ids to http (or, God forbid, ftp) seems wrong. Should
we just agree on a controlled vocabulary, starting with inline, http,
https, ftp?
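To see how such a vocabulary would be used, a client can work out which upload methods a query needs from its TAP UPLOAD parameter, which takes semicolon-separated name,URI pairs, with the param: scheme denoting an inline upload. The minimal Python sketch below assumes the lower-case terms proposed above (mapping param: to inline); the helper names and example URIs are hypothetical.

```python
from urllib.parse import urlparse

# Controlled vocabulary proposed above (assumed spelling, lower case).
DECLARED_METHODS = {"inline", "http", "https", "ftp"}

def required_upload_methods(upload_value):
    """Map each table in a TAP UPLOAD value to the upload method it needs.

    UPLOAD has the form "name,URI;name,URI"; a "param:" URI refers to an
    inline (multipart POST) upload, which we map to the "inline" term.
    """
    methods = {}
    for pair in upload_value.split(";"):
        name, uri = pair.split(",", 1)
        scheme = urlparse(uri).scheme.lower()
        methods[name] = "inline" if scheme == "param" else scheme
    return methods

def unsupported(upload_value, declared=DECLARED_METHODS):
    """Return the (sorted) tables whose upload method the service does not declare."""
    needed = required_upload_methods(upload_value)
    return sorted(name for name, method in needed.items() if method not in declared)

value = "t1,param:mytable;t2,http://example.org/t2.vot;t3,vos://example.org!vospace/t3.vot"
print(required_upload_methods(value))
print(unsupported(value))
```

The vos entry is flagged as unsupported simply because it is not in the vocabulary; how VOSpace should be declared is the open question discussed next.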
Then there's the whole VOSpace business. I'd be grateful if someone
who actually did some real stuff with vos could come up with a proposal
of how to represent that. If we could get by just saying "vos1",
"vos1.2", "vos2.0" or somesuch I think that would be highly preferable.
Pat, on the other hand, has said:
Of course, for a vos URI, there is a whole other level of transfer protocol metadata: a service says it supports "vos" and knows how to talk to a vospace (well, that one would have to be versioned), but that does not mean it can get a file from a vospace that only uses SRB for transport. I don't think we can solve that here.
The extra thing we discussed was what kind of authentication the TAP service could do on the user's behalf (in order to get an input table from a URI), but I think that is already covered by the service having a registered and associated CDP service. This would potentially come up with vos and https schemes, since those could require an X.509 certificate to authenticate, and that is exactly the case covered by the TAP and associated CDP service: the user knows if a certificate is needed, and the TAP service can declare that it has this associated CDP service where the user can (in advance) store a proxy certificate. So, in my opinion, we do not have to be able to specify here what authentication the TAP service knows how to perform.
Resource limits. retentionPeriod, executionDuration are given in
seconds (it's the SI unit, after all). There's a default limit and
possibly a hard limit. Both are optional. A missing limit says
"we didn't bother figuring it out" or "no enforced limit".
Pat said:
In our service, we try to estimate the result size in total by looking at the selected columns (and knowing the output size of each column) and then dynamically limit MAXREC: fewer columns -> more rows allowed. So this is a limit in megabytes, not rows. Also, the limit is different in different scenarios: sync queries have no default and no limit on MAXREC, async queries currently have a dynamic limit, and once we support output to VOSpace, async queries sending to VOSpace will also have no default or limit. [...] So, values for the attributes could be an integer, "none", or "dynamic".
I don't like this: not only will it uglify the schema, it'll also make the
client's life a lot harder when it actually tries to use this information. I'd
rather encourage people with dynamic limits to put in some
conservative estimate, and probably ignore limits on sync queries for this
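To make the "conservative estimate" suggestion concrete, a service with a byte-based limit like the one Pat describes could publish a single row figure computed for a wide query. The following minimal Python sketch assumes per-column output widths and an 80% safety margin, both invented for illustration; conservative_maxrec is a hypothetical helper.

```python
def conservative_maxrec(byte_limit, column_widths, safety_percent=80):
    """Turn a byte budget into a single, conservative row limit.

    column_widths: assumed serialized size (bytes) of one value per selected
    column; safety_percent leaves headroom for format overhead.
    """
    row_bytes = sum(column_widths)
    if row_bytes <= 0:
        raise ValueError("need at least one column with a positive width")
    # Integer arithmetic throughout, so the result is exact.
    return byte_limit * safety_percent // 100 // row_bytes

budget = 100_000_000  # 100 MB output budget (made-up figure)
print(conservative_maxrec(budget, [8, 8, 4]))      # three narrow columns
print(conservative_maxrec(budget, [8, 8, 4, 30]))  # one extra wide column
```

Fewer columns give more rows, as in Pat's scheme; publishing the figure for the widest query the service expects keeps the advertised limit on the safe side.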
<signature>gavo_match(pattern TEXT, string TEXT) -> INTEGER</signature>
<description>The function returns 1 if the posix regular expression pattern matches
anything in string, 0 otherwise.</description>
Finally, the user defined functions. I think those are a must so users
have some standard way of figuring them out; on the other hand, I think
machines need not be too concerned about them. Therefore, in addition
to the name, there's just a human-readable description (user agents are
encouraged to reproduce them verbatim, i.e., preserving whitespace and
such) and a signature. The signature should be machine-parseable to
accommodate use-cases in which this might be useful. The schema, of
course, does not need to enforce this.
In the signature, only regular identifiers are allowed, no quoted
identifiers. This is implied by the grammar for the name and the
type names, so it only needs to be stipulated for the parameter names.
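As a rough check that such signatures can indeed be parsed by machine, here is a minimal Python sketch of a parser for signatures like the gavo_match example. It assumes regular identifiers follow ADQL's rule (a Latin letter followed by letters, digits, or underscores) and, as a simplification, that every type name is a single identifier; multi-word types such as DOUBLE PRECISION or parameterized ones like VARCHAR(20) would need a fuller grammar. parse_signature is a hypothetical helper.

```python
import re

# ADQL-style regular identifier: a Latin letter, then letters, digits or "_".
IDENT = r"[A-Za-z][A-Za-z0-9_]*"
SIGNATURE = re.compile(
    rf"^\s*(?P<name>{IDENT})\s*\((?P<params>[^)]*)\)\s*->\s*(?P<rettype>{IDENT})\s*$")
PARAM = re.compile(rf"^\s*(?P<pname>{IDENT})\s+(?P<ptype>{IDENT})\s*$")

def parse_signature(sig):
    """Split a UDF signature into (name, [(param, type), ...], return type)."""
    m = SIGNATURE.match(sig)
    if m is None:
        raise ValueError(f"not a valid signature: {sig!r}")
    params = []
    if m.group("params").strip():
        for chunk in m.group("params").split(","):
            p = PARAM.match(chunk)
            if p is None:  # e.g. a quoted identifier as parameter name
                raise ValueError(f"bad parameter {chunk!r} in {sig!r}")
            params.append((p.group("pname"), p.group("ptype")))
    return m.group("name"), params, m.group("rettype")

print(parse_signature("gavo_match(pattern TEXT, string TEXT) -> INTEGER"))
```

Because only regular identifiers are allowed, a quoted parameter name such as "my param" is rejected rather than parsed.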
Open issues:
- upload limits -- I realize it's a good idea to be able to express those. But how? I don't currently enforce any, but if I did, I'd probably enforce them by bytes. On the other hand, limits on the number of rows clearly would make more sense...
- quoteMethod -- I suspect that few people currently bother to come up with a good quote. If there's ever some kind of scheduling service, being able to communicate where the quote comes from would be useful. Maybe one should have a controlled vocabulary here ("plan", "queuelength", "sample", "thin air")?
- standard inputParams for the protocol parameters (LANG, FORMAT, REQUEST, RUNID, etc.)? I'm lazy, so I'd rather not include them, and they should more or less be the same for all services. I don't feel strongly about them, though, and declaring them might be useful for optional input parameters. Certainly, additional ("PQL") parameters should be declared, but VODataService already says how to do that.
About creating a VOResource extension
The VOResource spec has some specific recommendations about how to create an extension schema; in addition, a "how-to" presentation on creating an extension gives an introduction to the process. In summary, the RWG recommends the following steps for defining a new extension:
- Name and define the concepts to be captured
- Create a prototype VOResource instance
- Create the Schema Extension
- Describe the extension in an IVOA document (preferably as a section of a protocol document).
Step 1: Concepts to Include
The following concepts should be captured within TAP capabilities (much of it based on grepping the UWS and TAP specs for "may" and "should"):
- List of data models exposed -- as URIs, e.g., the ObsCore model:
- List of query languages supported -- these should be well-known strings as used in LANG, e.g. ADQL, ADQL-2.0, etc. They should contain a human-readable description (as element content?). We should recommend a convention for SQL in the spirit of "SQL-Postgres", "SQL-MySQL", etc.
- List of output formats -- specified with required MIME and optional shorthand. Again, a human-readable description (as element content?) would be nice.
The Upload Problem and VOSpace
From Pat's summary of the Nara discussion:
Controlled vocabulary for well-known protocols - I would
suggest the protocol scheme in lower case, as that is common usage, and an ivo URI for
protocols described in the registry - e.g., vos.
For vos URI support, we also need to specify if the service can perform
authentication, but that is already specified when a service specifies the
endpoint for the associated CDP service which would be required, so in my
opinion one can just say they support "vos" (via the URI) and that means
unauthenticated; if the service also has a supporting CDP then they can do
authenticated (CDP spec says explicitly how to do this - maybe we should at
least explicitly refer to the CDP spec section)
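The rule suggested here can be written down as a small decision function. This minimal Python sketch assumes declared methods are plain strings (lower-case schemes or ivo URIs) and that whether an associated CDP service exists is already known from the registry record; normalize_method and can_fetch are hypothetical helpers, and the CDP lookup itself is not shown.

```python
# Hypothetical helpers illustrating the Nara suggestion; names are made up.

def normalize_method(identifier):
    """Plain protocol schemes are compared in lower case; ivo URIs as given."""
    if identifier.lower().startswith("ivo://"):
        return identifier
    return identifier.lower()

def can_fetch(method, declared, needs_auth=False, has_cdp=False):
    """Can a service fetch an upload via `method`?

    Declaring a method is read as "unauthenticated access only";
    authenticated access additionally requires an associated CDP service.
    """
    if normalize_method(method) not in {normalize_method(m) for m in declared}:
        return False
    return has_cdp if needs_auth else True

declared = ["inline", "HTTP", "https"]
print(can_fetch("http", declared))                    # plain download
print(can_fetch("https", declared, needs_auth=True))  # needs a certificate, no CDP
print(can_fetch("https", declared, needs_auth=True, has_cdp=True))
```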
Things we'd probably not want in the capability
- Extended capabilities -- if they exist, create another capability element
- format of table names: name vs. schema.name vs. cat.schema.name -- since table names are delivered in qualified form, this is irrelevant for clients
- VOSI support -- this can be inferred from elsewhere in the registry record
- Passing on the RUNID -- do people need to know this from the registry?
- Further tables in TAP_SCHEMA -- can be taken from elsewhere in the registry record
Things deferred at Nara
- List of settable parameters (probably open-ended as key-value pairs; for limits and such, absence would mean "unlimited", max==default would mean "changing not supported"):
- Server settings
- default/maximum retention period (destruction time minus creation time)
- default/maximum run time
- default/maximum row limit
- uploadRowLimit, uploadByteLimit
- maybe quoteMethod -- how does the service come up with a quote: never, always artificial value, based on a query plan, based on the length of an input queue,...
- List of user defined functions -- with name, arguments (name, type, description), return type, and a short, human-readable documentation (does plain text suffice?)
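The convention floated in the first item of this list (a missing maximum means "unlimited", maximum equal to default means "changing not supported") is simple enough for clients to apply mechanically. The minimal Python sketch below classifies a (default, maximum) pair and predicts the value a request would end up with; the setting names and numbers are invented, and clipping requests at the maximum is an assumption about server behavior.

```python
def describe_setting(default=None, maximum=None):
    """Classify a (default, maximum) pair using the convention floated above."""
    if maximum is None:
        return "unlimited"
    if default == maximum:
        return "fixed"  # changing not supported
    return f"adjustable up to {maximum}"

def effective_limit(requested=None, default=None, maximum=None):
    """Value a client can expect, in the setting's unit (e.g. seconds).

    None means "no limit known"; a request above the maximum is clipped.
    """
    if maximum is not None and default == maximum:
        return maximum  # fixed setting: the request is ignored
    value = default if requested is None else requested
    if maximum is not None and (value is None or value > maximum):
        value = maximum
    return value

# Hypothetical server settings (seconds / rows), made up for illustration.
SETTINGS = {
    "retention period": (172800, 604800),
    "run time": (3600, 3600),
    "row limit": (20000, None),
}

for name, (default, maximum) in SETTINGS.items():
    print(name, "->", describe_setting(default, maximum),
          "| asking for 10**6 gives", effective_limit(10**6, default, maximum))
```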