Table Access Discussions
Meeting to discuss the Table Access Protocol (TAP)
JHU, Nov 19-20 2007
Present in Baltimore: John Good, Bob Hanisch, Keith Noddle, Francois
Ochsenbein, Alex Szalay, Doug Tody. On the phone: Matthew Graham,
Pedro Osuna, Ray Plante. The goal of the meeting was to make progress
on the TAP design and implementation strategy, and in particular
to reach agreement on the key issues which have come up in earlier
discussions. A collection of related documents written over the past
year (see below) was distributed in advance of the meeting. Most of
the discussion was organized around the issues and draft interface
covered in this presentation, which also serves as the background for
the agreements summarized here.
This page summarizes the agreements reached in the course of this
meeting. All present jointly produced, reviewed, and agreed to the
decisions documented on this page.
This page is derived from the meeting TWiki page used to record
agreements as they were reached during the meeting.
Preliminary Agreements
- ADQL Query
  - Agree that we need a full-function ADQL-based query.
  - This includes asynchronous execution, VOSpace integration for data staging, single sign-on (SSO), etc.
  - The general interface is a POST, which can run asynchronously, specify the result disposition, etc.
  - A simple synchronous GET version should also be provided.
  - We may need a way for the synchronous version to time out and point the client to an asynchronous job.
  - Issues
    - Timeout
    - Maximum number of rows
    - Authentication vs resource
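The timeout and maximum-row issues can be made concrete with a small sketch. It assumes a service that runs a query synchronously for at most a fixed time and, if the query has not finished, returns a reference to an asynchronous job instead of failing; rows beyond a limit are dropped and the truncation is flagged. All names here (HypotheticalTapExecutor, run_sync, job_result, simulated_query, the "job-N" identifiers) are hypothetical, and an in-process thread pool stands in for the job management a real service would provide (e.g., via UWS).

```python
import concurrent.futures
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class QueryResult:
    rows: List[tuple]       # rows returned to the client (empty if promoted to async)
    overflow: bool          # True if the row limit truncated the result
    job_id: Optional[str]   # set only when the query was promoted to an async job


class HypotheticalTapExecutor:
    """Run a query synchronously up to a timeout, then hand it off as an async job."""

    def __init__(self, timeout_s: float, max_rows: int) -> None:
        self.timeout_s = timeout_s
        self.max_rows = max_rows
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)
        self._jobs: Dict[str, concurrent.futures.Future] = {}
        self._ids = itertools.count(1)

    def run_sync(self, query: Callable[[], List[tuple]]) -> QueryResult:
        future = self._pool.submit(query)
        try:
            rows = future.result(timeout=self.timeout_s)
        except concurrent.futures.TimeoutError:
            # Do not cancel: keep the running query and return a job reference instead.
            job_id = f"job-{next(self._ids)}"
            self._jobs[job_id] = future
            return QueryResult(rows=[], overflow=False, job_id=job_id)
        return self._limit(rows, job_id=None)

    def job_result(self, job_id: str) -> QueryResult:
        # Blocks until the job finishes; a real service would expose job status instead.
        return self._limit(self._jobs[job_id].result(), job_id=job_id)

    def _limit(self, rows: List[tuple], job_id: Optional[str]) -> QueryResult:
        # Enforce the maximum number of rows and flag any truncation.
        return QueryResult(rows=rows[: self.max_rows],
                           overflow=len(rows) > self.max_rows,
                           job_id=job_id)


def simulated_query(rows: List[tuple], delay_s: float = 0.0) -> Callable[[], List[tuple]]:
    """Stand-in for a DBMS query that returns `rows` after `delay_s` seconds."""
    def run() -> List[tuple]:
        time.sleep(delay_s)
        return rows
    return run
```

Keeping the running query rather than cancelling it means the work done before the timeout is not lost when the client switches over to the asynchronous job.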
- Simple Query
  - Agree that we need a Simple Query capability in TAP.
  - SimpleQuery should support both data queries and metadata queries within a single operation, and hence provide uniform access to both table data and metadata.
  - SimpleQuery will include some form of region specification (e.g., POS, SIZE, perhaps something else TBD), and will eventually replace the legacy Cone Search, which will be deprecated once TAP 1.0 becomes a Recommendation.
  - The interface should be parameter-based:
    - SELECT, FROM, WHERE (plus FORMAT, etc.)
  - The region will probably be limited to a cone initially; it should be given as a string parameter with an extensible syntax, so that it can be generalized to more complex regions without changing the interface.
  - The basic interface is a synchronous GET.
    - However, an implementation may (if enabled by the client) synchronously return a status indication which, on the back end, refers to an asynchronous job and data staging.
  - Issues
    - Extended region specification (compatibility with ADQL REGION and DAL POS is desired; needs further thought)
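As an illustration of a parameter-based synchronous GET, the following sketch assembles a SimpleQuery URL from the SELECT, FROM, WHERE, FORMAT, POS, and SIZE parameters named above. The endpoint, the function name, and the exact parameter syntax are assumptions, since the parameter set was TBD at the meeting; POS is written as "ra,dec" in decimal degrees, loosely following the DAL POS convention, and SIZE is treated here as a cone radius in degrees.

```python
from typing import Optional, Sequence, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def build_simple_query_url(
    base_url: str,
    select: Sequence[str],
    from_table: str,
    where: Optional[str] = None,
    pos: Optional[Tuple[float, float]] = None,
    size: Optional[float] = None,
    fmt: str = "votable",
) -> str:
    """Build a GET URL for a hypothetical parameter-based SimpleQuery request."""
    params = {"SELECT": ",".join(select), "FROM": from_table, "FORMAT": fmt}
    if where:
        params["WHERE"] = where
    if (pos is None) != (size is None):
        raise ValueError("POS and SIZE must be given together")
    if pos is not None and size is not None:
        ra, dec = pos
        if not (0.0 <= ra < 360.0 and -90.0 <= dec <= 90.0):
            raise ValueError("POS must be ra,dec in decimal degrees")
        if size <= 0.0:
            raise ValueError("SIZE must be positive")
        # Cone region (assumed form): centre as "ra,dec", radius in SIZE, both in degrees.
        params["POS"] = f"{ra},{dec}"
        params["SIZE"] = str(size)
    # urlencode escapes spaces, commas and operators such as "<" for use in a query string.
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_simple_query_url(
        "http://example.org/tap/simple",   # hypothetical endpoint
        ["ra", "dec", "vmag"], "cat.stars",
        where="vmag < 12", pos=(10.68, 41.27), size=0.1,
    )
    print(url)
    print(parse_qs(urlsplit(url).query))
```

Keeping the region as plain string parameters means a later, richer region syntax could be accepted by the same function signature without changing how the request is sent.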
- Table Metadata Queries
  - Agree to provide table metadata in two ways:
    - Via a minimal core TAP schema consisting of two tables (TAP_SCHEMA.tables and TAP_SCHEMA.columns). This mechanism can easily be extended to add further schema tables, or additional metadata fields to a given table.
    - Via a tableset query, part of the metadata query mechanism, which returns basic metadata for all tables available via a service, either as a VOTable containing only table metadata, or in a registry-compliant XML format (which can easily be constructed from the more basic table/column metadata).
  - The minimal required metadata query mechanism need not allow general queries against the TAP schema; the minimal implementation need only return a list of all tables, all the columns of a specified table, or all the columns of all tables (i.e., the full TAP_SCHEMA.columns table). One implication is that simple implementations do not need to implement the table metadata as actual tables, even though the query interface is table-oriented. More advanced queries are permitted, but would be an optional advanced capability.
  - Although it is not required for a minimal TAP metadata query capability, a full-function SimpleQuery may allow use of the WHERE clause to restrict the metadata returned, for example to a subset of tables: a clause of the form "WHERE tableName=vospace.*" (syntax TBD) could be used to query only the tables in the local vospace schema.
  - TAP should also define what constitutes a null data query, and what a TAP service should return in this case. The response should be an empty table containing only column metadata; hence a null data query could provide another way to get table metadata (although it is not intended as the primary table metadata query mechanism). A good way to force a null data query might be to add a MAXREC=0 parameter to the query.
  - Issues
    - The proposed core TAP schema is a slightly modified version of what is currently in the VODataService (registry) schema. Future versions may add further metadata following prototyping and discussion (for example a GroupID field to group table fields as in VOTable, or additional metadata required to support advanced queries).
    - We should consider what additional metadata should be added to the core schema. Possibilities discussed included: adding UCD/UTYPE to TAP_SCHEMA.tables, to indicate the table type more formally than "description" does; a "PRIMARY=true/false" attribute, to indicate whether a given table column should be displayed in a "narrow" view of the table; and other features such as the default column output format (width, precision) for display, or whether a given column can contain nulls.
    - How to define standard views of tables (in the SQL sense) was discussed. A simple method (see previous bullet) is to flag each column as "primary" or not, meaning that the column should be displayed in a "narrow" view of a wide table. TAP should also define a standard way to associate provider-defined standard views with their base tables, e.g., by a standard naming convention combined with standard table metadata (type, description, utype). This would be an optional advanced feature, but could be very useful and is simple to provide given a DBMS back end.
    - In TAP (as in SQL) a view is a type of table, and provider-defined views can be accessed via any TAP interface, including SimpleQuery. Table metadata queries will list related base tables and views in TAP_SCHEMA.tables, and will list the columns of a view in TAP_SCHEMA.columns.
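To show why a minimal service need not keep its metadata in real DBMS tables, here is a sketch in which two in-memory lists play the role of TAP_SCHEMA.tables and TAP_SCHEMA.columns. It covers the three required queries (all tables, the columns of one table, all columns), the optional name filter (a glob such as "vospace.*" stands in for the TBD WHERE syntax), a null data query as might be triggered by MAXREC=0, and a narrow view driven by the proposed PRIMARY flag. The row fields and function names are illustrative only and are not the agreed core schema.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TableRow:             # one row of TAP_SCHEMA.tables (illustrative fields)
    table_name: str
    description: str


@dataclass(frozen=True)
class ColumnRow:            # one row of TAP_SCHEMA.columns (illustrative fields)
    table_name: str
    column_name: str
    datatype: str
    unit: str = ""
    primary: bool = False   # proposed flag: include in a "narrow" view


# Plain lists stand in for the two schema tables; no DBMS is required.
TABLES = [
    TableRow("cat.stars", "Example star catalog"),
    TableRow("vospace.mylist", "Example user table in the local VOSpace"),
]
COLUMNS = [
    ColumnRow("cat.stars", "ra", "double", "deg", primary=True),
    ColumnRow("cat.stars", "dec", "double", "deg", primary=True),
    ColumnRow("cat.stars", "vmag", "float", "mag", primary=True),
    ColumnRow("cat.stars", "flags", "int"),
    ColumnRow("vospace.mylist", "id", "char"),
]


def query_tables(pattern: Optional[str] = None) -> List[TableRow]:
    """All tables, or (optional capability) those whose name matches a glob like 'vospace.*'."""
    return [t for t in TABLES if pattern is None or fnmatchcase(t.table_name, pattern)]


def query_columns(table_name: Optional[str] = None, narrow: bool = False) -> List[ColumnRow]:
    """Columns of one table, or every column (all of TAP_SCHEMA.columns) if no table is given."""
    cols = [c for c in COLUMNS if table_name is None or c.table_name == table_name]
    return [c for c in cols if c.primary] if narrow else cols


def null_data_query(table_name: str) -> Tuple[List[ColumnRow], List[tuple]]:
    """MAXREC=0: an empty result that still carries the table's column metadata."""
    if not any(t.table_name == table_name for t in TABLES):
        raise KeyError(f"unknown table: {table_name}")
    return query_columns(table_name), []
```

Because the query interface only ever returns rows, a provider could later swap these lists for real TAP_SCHEMA tables in a DBMS without clients noticing.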
- Interface Consistency
  - DAL should provide (for the second-generation interfaces beginning with SSA) a consistent family of data access interfaces for the generic dataset, for a catalog, for an image or spectrum, etc.; TAP should be consistent with the other DAL interfaces (and vice versa) in terms of concept, form, and semantics, except where there is good reason to do something differently for table access.
- Minimal TAP Service
  - A minimal TAP service implements the SimpleQuery and getCapabilities operations (and probably getAvailability as well, if required by VOSI). AdqlQuery is an optional advanced operation: it is required for a fully compliant TAP service, but not for a minimally compliant one.
  - The SimpleQuery operation implements both table data queries and basic table metadata queries via a uniform parameter-based (non-ADQL) query interface. The exact set of parameters for the SimpleQuery operation is TBD, as is the minimum functionality required for queries.
  - The minimum requirements for table metadata queries are: support for accessing the TAP_SCHEMA.tables and TAP_SCHEMA.columns tables in SimpleQuery; support for a null table data query (which returns VOTable header metadata if VOTable is the output format); and support for a "tableset" query with output in either VOTable or registry-compliant XML format. The TAP_SCHEMA tables may or may not be implemented as actual tables by the TAP service (in the minimal case this is not required). By "tableset metadata" we mean a data structure (VOTable or XML) containing table and column metadata for all tables available from the service. The tableset metadata outputs are redundant given TAP_SCHEMA, but they are easily constructed from that metadata and are desirable for some implementations, hence it was agreed to provide both.
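Since the tableset output and the null-query VOTable header are both derived from the same column metadata, both can be generated from TAP_SCHEMA-style rows with Python's xml.etree.ElementTree, as in this sketch. The tableset element names (tableset, table, column, name, dataType, unit) are illustrative and have not been checked against the VODataService schema; the VOTable output uses real VOTable element names (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA) but omits the namespace and other details a conforming document would carry.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# (table_name, column_name, datatype, unit): a stand-in for TAP_SCHEMA.columns rows.
ColumnRow = Tuple[str, str, str, str]

COLUMNS: List[ColumnRow] = [
    ("cat.stars", "ra", "double", "deg"),
    ("cat.stars", "dec", "double", "deg"),
    ("cat.stars", "name", "char", ""),
    ("cat.galaxies", "z", "float", ""),
]


def tableset_xml(columns: List[ColumnRow]) -> ET.Element:
    """Group column rows by table into an illustrative tableset document."""
    root = ET.Element("tableset")
    tables: Dict[str, ET.Element] = {}
    for table, column, datatype, unit in columns:
        if table not in tables:
            tables[table] = ET.SubElement(root, "table")
            ET.SubElement(tables[table], "name").text = table
        col = ET.SubElement(tables[table], "column")
        ET.SubElement(col, "name").text = column
        ET.SubElement(col, "dataType").text = datatype
        if unit:
            ET.SubElement(col, "unit").text = unit
    return root


def null_query_votable(columns: List[ColumnRow], table: str) -> ET.Element:
    """VOTable for a null data query: FIELD metadata for `table`, zero data rows."""
    votable = ET.Element("VOTABLE", version="1.1")
    tab = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE", name=table)
    for t, column, datatype, unit in columns:
        if t != table:
            continue
        attrs = {"name": column, "datatype": datatype}
        if datatype == "char":
            attrs["arraysize"] = "*"     # variable-length string
        if unit:
            attrs["unit"] = unit
        ET.SubElement(tab, "FIELD", attrs)
    ET.SubElement(ET.SubElement(tab, "DATA"), "TABLEDATA")  # present but empty
    return votable


if __name__ == "__main__":
    print(ET.tostring(tableset_xml(COLUMNS), encoding="unicode"))
    print(ET.tostring(null_query_votable(COLUMNS, "cat.stars"), encoding="unicode"))
```

Generating both outputs from one list of rows is what makes the redundant formats cheap to offer alongside TAP_SCHEMA.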
Implementation Strategy
- Keith takes an action as DAL Chair to inform the Exec of our decision to add a SimpleQuery operation to TAP, both to allow a minimal TAP implementation for smaller data providers and to provide a usable TAP service specification while we prototype and specify the full TAP with AdqlQuery and Grid capabilities.
- Draft TAP V0.1 specification defining the basic form of the interface, including all planned service operations, with a usable first version of the SimpleQuery operation, including both data and metadata query functionality, and the core TAP schema. (Target mid-March 2008 for specification, March-May for prototypes)
- Some preliminary prototyping of UWS and TAP integration by the time of the May interop (will provide feedback to GWS to help finalize UWS 1.0).
- Draft V0.2 TAP specification for discussion at May interop.
- First working prototypes of the TAP AdqlQuery including VOSpace integration and UWS (possibly also SSO). (early summer 2008)
- Complete first full working draft TAP 1.0 spec (fall 2008)
- As a separate matter we will prepare an IVOA Note describing the basic concept, form, and semantics of the second generation DAL service profile.
VOSpace Integration
The issue of how to integrate VOSpace with TAP was discussed at the
end of the meeting; these notes were added following the meeting to
summarize these discussions.
- VOSpace will provide the primary mechanism used to manage and transport user tables, be these uploaded by the user, or output from a user query. This will free the TAP service from having to provide capabilities for data transport, and will provide a robust, fail-safe mechanism for transporting large datasets.
- It will also be possible to upload a table or tables directly in a POST-type query. This will provide a more convenient mechanism for cases such as uploading a source list for a cross-match, but VOSpace will be the preferred mechanism for larger operations. Tables uploaded in this manner are transient and will be discarded after a query executes, unlike VOSpace-resident tables which are persistent.
- The proposed mechanism for accessing such user-provided tables in TAP queries is to represent the VOSpace or query-upload storage space as a DBMS schema (or possibly catalog). Hence a table in the local VOSpace might be referenced in the query as "vospace.<tableName>" and a table uploaded as part of the query as "upload.<tableName>" (exact syntax TBD).
- A full TAP implementation will require a VOSpace implementation which is locally resident with the service. The TAP service will interact only with this local VOSpace; all interaction with remote VOSpace services will be handled separately, e.g., by the client application or workflow.
- Exactly what "local VOSpace" means is not yet clear, and can probably be left up to the local implementation. One approach would be to implement at least part of the VOSpace within the DBMS itself, so that user tables can be referenced directly by TAP queries using the schema-based syntax described above. In this case both a bit-for-bit copy of the VOTable uploaded by the user and a version of the table preloaded into a DBMS table for use in queries might be stored. TAP metadata queries would then show the contents of the VOSpace as well as any archive data tables. Another implementation might use a separate file-oriented VOSpace, with tables being loaded dynamically into the DBMS for use in queries.
- VOSpace may also be used to store files other than tables, for example images or spectra to be accessed by an SIA or SSA service (the same local VOSpace implementation might be used for any type of data).
- A physical VOSpace implementation might store data for multiple users, but a given user would see only data for which they are granted access, i.e., their user tables, or public data.
- Since a TAP service would interact only with the local VOSpace, a TAP query would require only the (possibly fully qualified) table name to reference an input, output, or VOSpace table. If the VOSpace supports file paths, it is TBD how these are represented within TAP.
- Although it is intended that VOSpace will provide the general mechanism for file transport, it might be desirable for the TAP interface to be able to return an access reference to the client application to allow easy retrieval of output tables without requiring direct client interaction with VOSpace (synchronous queries also avoid the need to interact with VOSpace).
- Table names are normally supplied by the user or client application, but might be auto-generated in some cases, for example when a synchronous query is promoted to asynchronous.
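The schema-based naming proposed above can be sketched as a resolver that maps "vospace.<tableName>" to persistent user tables, "upload.<tableName>" to tables uploaded with the query (cleared once the query finishes), and any other name to the provider's archive tables. The class and method names, the in-memory dictionaries standing in for DBMS schemas and the local VOSpace, and the "vospace.result_N" naming scheme for auto-generated output tables are all assumptions for illustration; the actual syntax was left TBD.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Sequence

Table = List[tuple]


@dataclass
class HypotheticalTableSpace:
    """Maps schema-qualified table names onto archive, VOSpace, and upload storage."""
    archive: Dict[str, Table] = field(default_factory=dict)   # provider tables, e.g. "cat.stars"
    vospace: Dict[str, Table] = field(default_factory=dict)   # persistent user tables
    uploads: Dict[str, Table] = field(default_factory=dict)   # transient, one query only
    _names: Iterator[int] = field(default_factory=lambda: itertools.count(1))

    def resolve(self, qualified_name: str) -> Table:
        schema, sep, name = qualified_name.partition(".")
        if sep and schema == "vospace":
            return self.vospace[name]
        if sep and schema == "upload":
            return self.uploads[name]
        return self.archive[qualified_name]

    def run_query(self, table_names: Sequence[str], uploaded: Dict[str, Table],
                  query: Callable[..., Table]) -> Table:
        self.uploads.update(uploaded)
        try:
            return query(*[self.resolve(n) for n in table_names])
        finally:
            self.uploads.clear()   # uploaded tables are discarded once the query has run

    def auto_name(self) -> str:
        # Generated name for an output table, e.g. when a sync query is promoted to async.
        return f"vospace.result_{next(self._names)}"
```

This sketch assumes one query at a time; a real service would keep each query's uploaded tables separate from those of concurrent queries.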
TAP Documents
The following documents (most of which were written by others) have
been collected here to provide materials to review in preparation
for the TAP discussions at JHU Nov 19-20.
--
DougTody - 16 Nov 2007