DataLink 1.1 Proposed Recommendation: Request for Comments
Introduction
DataLink describes the linking of data discovery metadata to access to the data itself, further detailed metadata, related resources, and to services that perform operations on the data.
The main changes in v1.1 are:
- Generalize by adding use cases for links to content other than data files
- VOSI-availability and VOSI-capabilities endpoints are now optional
- Service descriptors can include exampleURL and contentType param(s), as well as DESCRIPTION, name, etc.
- Added optional link_auth and link_authorized to signal whether authentication is necessary to use the link
- INFO element with standardID mandatory in {links} response
- Added content_qualifier FIELD to inform on the nature of the link target
- Added local_semantics to identify similar links in the same DataLink service for different IDs
- Mechanisms to recognize {links} endpoints outside ObsCore
Latest version of DataLink can be found at:
The GitHub repository for issues and source can be found at:
Detailed discussion towards 1.1 can also be found on this IVOA twiki page (last update by FrancoisBonnarel - 2023-05-04):
Reference Interoperable Implementations
Server side
GAVO implemented the following changes: content_qualifier and local_semantics, plus additional service descriptor features such as DESCRIPTION, name, content_type, etc.
As an example, here are a couple of links responses for various ObsCore/SIA/SSA tables on the GAVO server.
All of these are combined with various semantics or content_type values.
-- MarkusDemleitner - 2023-07-10
CADC has implemented the following in https://ws.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/caom2ops:
- INFO element with standardID in links response
- new optional fields in links response: local_semantics (no content yet but can be populated with default vocab in most cases), content_qualifier (no content, not likely to use), link_auth, link_authorized
- contentType param in service descriptors (where applicable)
IRIS image: https://ws.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/caom2ops/datalink?ID=ivo://cadc.nrc.ca/IRIS?f212h000/IRAS-25um shows link_auth=optional and link_authorized=true because one can authenticate but the data is public.
New CFHT data: anonymous use of https://ws.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/caom2ops/datalink?ID=ivo://cadc.nrc.ca/CFHT?2773629/2773629o shows link_auth=optional and link_authorized=false because the data is still proprietary and the caller is anonymous; if an authorized user makes the call they will see link_authorized=true. That is hard to demonstrate for a general audience.
The core CADC implementation is available as a library (cadc-datalink-server) in MavenCentral with source code at https://github.com/opencadc/dal.git; the caom2-specific logic is available in a library (caom2-datalink-server) with source at https://github.com/opencadc/caom2service.git. The core library is also used in the ALMA DataLink service but may not yet be released with the latest features.
Client side
TOPCAT v4.8-8 and later displays additional features in service descriptors and makes use of additional links table FIELDs such as content_qualifier, authorization and local_semantics; the tool's behavior is adapted to the content of these new FIELDs. For example, the Activation Actions suggested by TOPCAT for the links depend not only on content_type but also on content_qualifier, and local_semantics is used to guess which link a user is interested in based on previous selections.
AladinDesktop is going to adapt to those new features too (see prototype screenshot).
The CADC DownloadManager (https://github.com/opencadc/apps.git) includes a simple DataLink client class so it can resolve publisherID values into 1..* URLs for download; this code hasn't changed as a result of DataLink-1.1. The CADC AdvancedSearch web portal makes calls to the above caom2ops/datalink service to find previews and download info for each row (publisherID): it uses link_authorized to decide whether to display the download options, which prevents users from selecting downloads/links when they are not authorized and the request would be rejected later.
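The link_authorized check described above can be sketched as follows. This is a minimal illustration and not the AdvancedSearch code: the rows are plain dictionaries with hypothetical example.org URLs, and offering a link whose link_authorized value is missing is an assumption made here because the field is optional.

```python
def downloadable_links(rows):
    """Return the access URLs a portal could offer for download.

    Each row is a dict holding a subset of the {links} columns.
    Assumption: a row without link_authorized (None) is still offered,
    since the field is optional and the service gave no hint.
    """
    offered = []
    for row in rows:
        if row.get("error_message"):
            continue  # error rows carry no usable link
        if row.get("link_authorized") is False:
            continue  # the request would be rejected later
        offered.append(row["access_url"])
    return offered


# Hypothetical rows, loosely modelled on the two CADC examples above.
rows = [
    {"access_url": "https://example.org/public.fits",
     "link_auth": "optional", "link_authorized": True},
    {"access_url": "https://example.org/proprietary.fits",
     "link_auth": "optional", "link_authorized": False},
    {"access_url": "https://example.org/unknown.fits"},
]
print(downloadable_links(rows))
```

Only an explicit false hides a link in this sketch; a stricter client could also hide links whose authorization state is unknown.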
Implementations Validators
The following validators are available for DataLink:
- datalinklint, which is part of STILTS. STILTS v3.4-9 contains DL 1.1 validation features, but later versions (at time of writing, a post-3.4-9 pre-release) are recommended, as they are slightly updated for PR-DataLink-1.1-20231108.
- Show my DataLink, which is part of DaCHS
Comments from the IVOA Community during RFC/TCG review period: 2023-04-21 - 2023-05-18
The comments from the TCG members during the RFC/TCG review should be included in the next section.
In order to add a comment to the document, please edit this page and add your comment to the list below in the format used for the example (include your Wiki Name so that authors can contact you for further information). When the author(s) of the document have considered the comment, they will provide a response after the comment.
Additional discussion about any of the comments or responses can be conducted on the WG mailing list. However, please be sure to enter your initial comments here for full consideration in any future revisions of this document.
Community Comment by Markus Demleitner
(1) The standard ID: I'm pretty sure we discussed that before, but I'm really unsure how we came to the conclusion that even Datalink 1.1 still has the ivoid of ivo://ivoa.net/std/DataLink#links-1.0.
Yes, we have to do crazy stuff like that for the schema URIs due to the way XML element names are compared. But there is in general no analogous need with ivoids, because we control the rules for how to compare them in which situations.
Does anyone remember why we went for links-1.0 here? If not, I'd suggest links-1. I volunteer for adding a brief explanation about how clients should disregard the minor version for normal operations.
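To make the idea of disregarding the minor version concrete, here is a minimal sketch of such a comparison. It is not taken from the standard or from any client: the function name and the assumed shape of the fragment (a key, a hyphen, then a major version and an optional minor version) are illustrative assumptions.

```python
import re

# Assumed shape: <ivoid>#<key>-<major>[.<minor>], e.g. ...#links-1.0 or ...#links-1
_STANDARD_ID = re.compile(
    r"^(?P<base>ivo://[^#]+)#(?P<key>[A-Za-z_]+)-(?P<major>\d+)(?:\.(?P<minor>\d+))?$"
)


def same_major_version(id_a, id_b):
    """True if both standardIDs name the same capability and major version."""
    a = _STANDARD_ID.match(id_a)
    b = _STANDARD_ID.match(id_b)
    if a is None or b is None:
        return False  # not in the assumed shape, so make no claim
    return (a["base"], a["key"], a["major"]) == (b["base"], b["key"], b["major"])


print(same_major_version("ivo://ivoa.net/std/DataLink#links-1.0",
                         "ivo://ivoa.net/std/DataLink#links-1.1"))
```

With this rule a client written against 1.0 would accept a 1.1 endpoint, while a validator could still read the minor version from the same string when it is present.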
(2) I am entirely unhappy with section 3.1.1, starting with its title, which probably should be something like "Datalinks in VOTable columns". And then the first paragraph should probably say something more concrete like perhaps "Columns containing datalinks SHOULD be marked with a UCD of X.Y.Z and a LINK-typed child in its FIELD like this:
<LINK whatever="blabla"/>"
And the second paragraph I'd say doesn't belong here at all (it could go to, perhaps, 1.2.7 or a use case discussing datalinks as primary results if we think we need to be explicit about this).
There are use cases behind this. When a datalink links response is hooked to table rows outside the context of ObsTAP/SIA2, how do we generate/recognize the DataLink URL?
Of course we can use the Service Descriptor with the single ID parameter if the DataLink can be parametrized by an "id" from one of the columns. But in that case the descriptor would be doing exactly the same as the LINK element proposed here, included in the appropriate FIELD, which is much less verbose. And it conforms to the VOTable standard. The FIELD itself should not be described by a datalink UCD because it is probably generally an id.
The second paragraph refers to use cases where the URL is not built from the content of one FIELD but is ad hoc and should be the content of a FIELD. Using the same utypes as the ones used in ObsCore responses seems reasonable. This is, for example, suited to SIA1 or SSA responses. I think this has nothing to do with recursive datalink.
We may try to rephrase all this if this is unclear, but the intent has to be kept.
-- FrancoisBonnarel - 2023-08-12
(3) In 3.2, it says:
If an error occurs while processing an ID value, there SHOULD be at least one row for that ID value and an error_message
The way the pyvo datalink client is written, we have to make that an unconditional MUST, or pyvo will keep requesting any failing ID (and frankly I'm unsure how else to implement this given multi-ID and overflows): it will only remove an ID from its list of IDs to query if it gets at least one row back for it. Perhaps:
A service MUST return at least one row for each ID passed in.
[ceterum censeo we should have let ID be single-valued; it would have made everything so much simpler and nothing really much harder/slower]
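The client behaviour described in comment (3) can be illustrated with a small simulation. This is not pyvo's code: the batching loop, the two stub services and the max_rounds guard are assumptions written only to show why a missing row leaves an ID pending forever.

```python
def fetch_all(ids, query, batch_size=2, max_rounds=10):
    """Query IDs in batches until each has produced at least one row.

    An ID is dropped from the pending list only when some row comes
    back for it; max_rounds is a guard added for this demonstration.
    """
    pending = list(ids)
    rows = []
    rounds = 0
    while pending and rounds < max_rounds:
        rounds += 1
        result = query(pending[:batch_size])
        rows.extend(result)
        answered = {row["ID"] for row in result}
        pending = [i for i in pending if i not in answered]
    return rows, pending, rounds


def compliant_service(batch):
    # Returns a row for every ID, using error_message for the failing one.
    return [{"ID": i, "error_message": "NotFoundFault: " + i if i == "bad" else None}
            for i in batch]


def silent_service(batch):
    # Drops the failing ID without any row at all.
    return [{"ID": i, "error_message": None} for i in batch if i != "bad"]


print(fetch_all(["a", "bad", "b"], compliant_service)[1:])
print(fetch_all(["a", "bad", "b"], silent_service)[1:])
```

With the compliant stub the loop finishes in two rounds with nothing pending; with the silent stub the failing ID is requested again in every round until the guard stops the loop.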
(4) 3.2.2, second paragraph: I had to puzzle quite a bit about this, starting with wondering what a "dereferenceable URL" might be. I'd suggest replacing the entire paragraph with "Access URLs may have fragment parts, which could, for instance, refer to id-ed elements within XML documents or extensions within FITS files. As in URIs in general, the interpretation of a fragment identifier depends on the media type."
"Dereferenceable" was used in the sense that it can be fully accessed by HTTP, which is not the case for URNs in general or for URLs with fragments. For the latter the client is supposed to interpret the fragment. See: https://en.wikipedia.org/wiki/URI_fragment
Apart from that I agree with your rephrasing. -- FrancoisBonnarel - 2023-08-14
This will also drop the "No other additional parameters or client handling are allowed." -- if this forbids query strings on access URIs, I'd strongly disagree. If this means something else, we'd have to write that something else.
In version 1.0 we could read:
"The access_url column contains a URL to download a single resource. The URL in this column must be usable as-is with no additional parameters or client handling; it can be a link to a dynamic resource (e.g. preview generation)."
This statement was not consistent with the allowance of fragments, hence the new statement. I can rephrase it in the upcoming PR. -- FrancoisBonnarel - 2023-08-14
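As a small illustration of the distinction discussed in (4), a client can split an access_url into the part it sends over HTTP and the fragment it must interpret itself. The sketch below uses Python's standard urllib.parse.urldefrag; the example URL and the function name are hypothetical.

```python
from urllib.parse import urldefrag


def split_access_url(access_url):
    """Split an access_url into (URL to request, fragment or None).

    The query string stays with the URL to request; only the fragment
    is left for the client to interpret according to the media type.
    """
    url, fragment = urldefrag(access_url)
    return url, fragment or None


print(split_access_url("https://example.org/cube.fits?cutout=1#SCI"))
print(split_access_url("https://example.org/cube.fits"))
```

Note that the query string survives untouched, which matches the concern raised above that query strings on access URLs should not be forbidden.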
(5) In my editorial PR, I've dropped a paragraph on semantics for error_message rows. This is now sufficiently addressed above that passage.
(6) sect. 3.2.9 content_qualifier: I think we should at least name the motivating use case a bit more precisely here, as in, perhaps: "It aids clients in presenting to the user the same sort of link as they go from one dataset to another within a service. For instance, suppose a service serves both continuum and line cubes. Using content_qualifier, users can configure their clients such that, as they change to a new data set, they always see the line cube even when the semantics and content_type columns agree for both types of data." Or so.
OK for this change. I will adopt it in the next PR. -- FrancoisBonnarel - 2023-08-14
(7) Sect. 4.8: Sorry, you cannot introduce a utype ("adhoc:this") in a section called "Example: X". If you are really, really sure these "self-describing" things are useful, put them into a section of their own.
Me, I've frankly never really understood where you want to go with this, and I think there's no implementation doing any of this, so perhaps we should drop the whole thing. But if we don't drop it and somehow nonchalantly mention it in an example, at least don't introduce a new utype here. What's wrong with the name="this" you had before? You see, having two different mechanisms for what to my knowledge hasn't been implemented even once seems a bit excessive.
When dropping adhoc:this, don't forget that it is referenced in sect 4.1.
The autodescription motivation may be explained earlier in the section. For "adhoc:this" I remember Pat advocating for it. If we motivate it earlier, then we can restrict this section to a pure example. -- FrancoisBonnarel - 2023-08-14
(8) I have not looked at the DataLinkImp source that's also present in the repo. If you think this ought to become a document, please extract it to a different repo; ivoatex is not designed to support two documents in one repo.
You are right. The note repo will be created in github.com/ivoa. -- FrancoisBonnarel - 2023-08-18
I've also collected a few rather editorial changes in https://github.com/ivoa-std/DataLink/pull/108
Comments from TCG member during the RFC/TCG Review Period: 2023-04-21 - 2023-05-18
WG chairs or vice chairs must read the Document, provide comments if any (including on topics not directly linked to the Group matters) or indicate that they have no comment.
IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.
TCG Chair & Vice Chair
With positive review by the TCG with a comments & feedback period successfully completed, the TCG chair/Vice Chair approve as well.
No comment on the document, we appreciate the presence of examples that clarify the usage and implementation
--
DataLink is used, and its usage will increase, for external web services such as simulated data and for output formats that are not IVOA formats (HAPI time series, OGC formats ...).
Maybe change the DataLink page to include examples of implementations.
Refer to the DataLink page in the document.
Encourage working/interest groups to add examples as Markus did.
Some minor edits only, otherwise this update looks sound.
- Section references would be useful in the v1.1 changes list - PR#105 raised and merged
- Some minor grammar updates - PR #107 raised
-- JamesDempsey - 2023-07-03
I have a rather rudimentary understanding of DataLink, VOSI and DALI, so there are some details that I'm glossing over in my read.
I don't see any real issues/conflicts with the DM group work. However, I have 2 points/questions to raise:
- local_semantics: This is an identifier from a local vocabulary to help identify/select rows at a finer level than possible with just the other tags (semantics, content_type, content_qualifier). I'm guessing this is for something like ObsCore's dataproduct_subtype. My question is that I don't really understand what the value is... is it just the tag? Or the URI for the local vocabulary + tag? The example serializations are no help since the DaCHS ones seem to resolve into a pretty format and I can't see the actual datalink content, and the CADC examples don't populate this field. I'd like to see, either in the document or examples, something more concrete.
I don't think the authors discussed this point much. IMHO both forms would be acceptable. Simple terms are enough to associate the results, but a local vocabulary URI + tag allows linking the term to definitions and relationships, so it is richer. Examples will be given in the text. -- FrancoisBonnarel - 2023-08-18
- Product Type vocabulary: This directly affects the DM group, it'd be used in the Dataset model and ObsCore could be updated to use it as well. The link in the standard resolves to a 2021 version of the vocabulary. At the interop, a 2023 version was discussed which looked like it had some issues. Which vocabulary would support this REC?
- 2023-07-28: The referenced vocabulary now resolves to a version dated 2023-06-26 (though the event-list discussion was just going on this week). The elements and definitions in this list appear compatible with DM group usage in the ObsCore and Dataset models. -- MarkCresitelloDittmar - 2023-07-28
Yes, the product-type ivoa vocabulary is what should be used from now onwards in dataset DM, next version of ObsCore as well as DataLink content_qualifier or maybe also registry standards. -- FrancoisBonnarel - 2023-08-18
-- MarkCresitelloDittmar - 2023-06-21
Followup on revised document:
I see the items above have been addressed satisfactorily, I see no additional issues with the revised document.
-- MarkCresitelloDittmar - 2023-11-11
Possible backward compatibility drawbacks in VOSpace (VOSpace implementations can use a DataLink to reference the data location):
No particular remark pertaining to Registry standards.
No issue for Semantics at this point.
I don't strictly speaking speak for Operations IG as of this week, but since I did most of the review before my term expired, I'll fill it in here; the TCG can decide whether this counts as an Ops endorsement or not.
As one of the authors I'm basically happy with this document, but I will draw attention to one or two issues.
- Section 2.2 defines the standardID for this standard as ivo://ivoa.net/std/DataLink#links-1.0, followed by the comment "Note this is applicable to endpoints following any version 1.* of the DataLink standard, to avoid backward compatibility problems." In my opinion the backward compatibility problems are not sufficient to justify this choice, and the minor version should be reflected in this standard ID, i.e. it should be "...#links-1.1". This has been discussed in the open github Issue #96, and other authors seem to agree. A fix will require at least an update to the StandardsRegExt record, and also changes in the document to places where the key is referenced, including Section 2.2 and, especially, Section 3.3.1 as well as related example text. This change would amongst other things make it possible for validators to check which minor version they are supposed to be validating against. PatDowler has volunteered to write a Pull Request addressing this issue, but I can have a go if he doesn't.
- Section 3.3.1 REQUIRES an INFO defining a suitable standardID for links response tables. The example shows such an INFO element as a child of the RESOURCE/@type="results" element, but it's not clear what restrictions there are on the location - does it have to go there, or can it be elsewhere in the VOTable? This should be clarified. If it's not required to be a child of the results resource, the example text in this section should probably be cut down.
This is done for consistency with DALI. DALI seems to insist that INFO elements should be in the primary RESOURCE (name="results"), and that other RESOURCEs may be in the VOTable. We may be more explicit on this. See next PR. -- FrancoisBonnarel - 2023-08-18
- Section 3.2.2: The final sentence says "No other additional parameters or client handling are allowed." I don't understand what is meant here. Should this sentence be removed?
See my answer to Markus above. And next PR. -- FrancoisBonnarel - 2023-08-18
- As mentioned in Issue #82, the recommended MIME type application/x-votable+xml;content=datalink is using a content-type parameter for VOTable not endorsed in the VOTable standard, which is a bit questionable; but this is not new in this version of DataLink and it will hopefully be addressed in VOTable 1.5, so turning a blind eye is probably OK.
-- MarkTaylor - 2023-05-15
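The question raised above about where the standardID INFO sits can be made concrete with a short lookup. This is a sketch under stated assumptions, not a validator: the sample VOTable is hand-written, the INFO is assumed to be named standardID and to be a direct child of the RESOURCE with type="results", and the value is the one quoted in the comments above.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; real {links} responses carry a full TABLE.
SAMPLE = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.3">
  <RESOURCE type="results">
    <INFO name="standardID" value="ivo://ivoa.net/std/DataLink#links-1.0"/>
    <TABLE/>
  </RESOURCE>
</VOTABLE>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_standard_id(votable_xml):
    """Return the standardID INFO value of the results RESOURCE, or None."""
    root = ET.fromstring(votable_xml)
    for element in root.iter():
        if local_name(element.tag) == "RESOURCE" and element.get("type") == "results":
            for child in element:  # direct children only
                if local_name(child.tag) == "INFO" and child.get("name") == "standardID":
                    return child.get("value")
    return None


print(find_standard_id(SAMPLE))
```

A client written this way would miss an INFO placed elsewhere in the VOTable, which is why the allowed location needs to be stated explicitly in the document.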
Nothing to add.
-- Anne Raugh - 2023-11-10
Just one suggestion, perhaps it would be a good idea to explain the notation with braces (e.g. {link}) in the "Conformance-related definitions" section (page 3).
-- PierreFernique - 2024-11-08
I have added this text to that section (in PR, will merge before REC):
"This document uses curly braces (e.g. {name}) to refer to a named concept such as a web service endpoint where the text requires a logical name but the actual name in a service implementing the standard is not restricted."
-- PatrickDowler - 2023-11-11
TCG Vote : 2023-05-19 - 2023-06-01
If you have minor comments (typos) on the last version of the document please indicate it in the Comments column of the table and post them in the TCG comments section above with the date.
| Group | Yes | No | Abstain | Comments |
| TCG | * | | | |
| Apps | * | | | |
| DAL | * | | | |
| DM | * | | | |
| GWS | * | | | |
| Registry | * | | | |
| Semantics | * | | | |
| DCP | * | | | |
| Edu | | | | |
| KDIG | * | | | |
| Ops | * | | | |
| Radio | | | | |
| SSIG | * | | | |
| Theory | | | | |
| TD | * | | | |
| StdProc | | | | |