Semantics Session at the May 2020 Virtual Interop
Wed, May 6, 15:00-16:00 UTC.
In this session, we will mainly discuss the latest Working Draft for Vocabularies in the VO 2,
http://ivoa.net/documents/Vocabularies/20200326/. Markus will give a brief introduction in order to help newcomers to this endeavour get up to speed.
Informal Minutes
These minutes reflect the etherpad that accompanied the session with contributions from many participants.
MarkusDemleitner did some (hopefully only editorial) smoothing and re-shuffling.
Opening slides
See https://wiki.ivoa.net/internal/IVOA/InterOpMay2020Semantics/slides.pdf to get a few more ideas of what is talked about here.
Desise discussion
Desise = dead simple semantics
The idea is to avoid exposing RDF to casual users, not to replace RDF; desise files are generated from the RDF (which in turn is generated from various inputs by VO tooling).
For instance, the Datalink vocabulary as RDF/XML:
http://ivoa.net/rdf/datalink/core/2020-02-12/datalink.rdf; in desise:
http://ivoa.net/rdf/datalink/core/2020-02-12/datalink.desise. Operationally, clients select the format by HTTP content negotiation, asking for text/json;content=desise to get desise (not just json, because we may support fiducial RDF in JSON later).
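[For illustration, a minimal sketch of that content negotiation using nothing but the Python standard library; the vocabulary URI is inferred from the links above, and the exact Accept string follows the discussion here and may still change before REC:]

    import json
    from urllib.request import Request, urlopen

    # Ask the vocabulary URI for the desise serialisation via content negotiation.
    VOC_URI = "http://www.ivoa.net/rdf/datalink/core"
    req = Request(VOC_URI, headers={"Accept": "text/json;content=desise"})
    with urlopen(req) as resp:
        vocab = json.loads(resp.read().decode("utf-8"))

    # vocab is now a plain dict -- no RDF tooling involved.
    print(sorted(vocab))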
How bad is it to introduce a custom, non-RDF format?
Sarah would like to understand why existing RDF JSON serializations are not appropriate.
Markus found that implementing revovo took a lot of code to translate RDF (regardless of whether it is RDF/XML, Turtle, or anything else) into a data structure that makes the concrete use cases straightforward.
[revovo,
https://volute.g-vo.org/svn/trunk/projects/semantics/Vocabularies/revovo.py at least up to rev. 5782, was a ~500-line Python module to turn our RDF/XML into something that let people actually perform our use cases]
Katie was interested in seeing why it took so many lines of code. Using rdflib in python she didn't find it difficult to parse/navigate the UAT RDF file.
Markus (after-session): The purpose of desise is to save people having to use rdflib or its counterparts in other languages; when distributing software, each extra dependency causes extra pain (not to mention that other languages may not have an equivalent to rdflib). Consider desise a VO-specific API to RDF that happens to work as data so anything with a json parser can use it.
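[To make the comparison concrete, a rough sketch of the rdflib route that desise is meant to spare clients, for a SKOS vocabulary such as the UAT; the URL is guessed from the links further down and the code only scratches the surface of what revovo had to do:]

    import rdflib
    from rdflib.namespace import RDF, SKOS

    g = rdflib.Graph()
    # Assumes an RDF/XML file sits next to the desise serialisation linked below.
    g.parse("http://www.ivoa.net/rdf/uat/2020-04-30/uat.rdf", format="xml")

    # Build {concept: (label, narrower concepts)} by walking SKOS triples.
    hierarchy = {}
    for concept in g.subjects(RDF.type, SKOS.Concept):
        label = g.value(concept, SKOS.prefLabel)
        narrower = [str(n) for n in g.objects(concept, SKOS.narrower)]
        hierarchy[str(concept)] = (str(label), narrower)
    # Real code additionally has to follow skos:broader the other way, cope with
    # the other vocabulary flavours, multiple labels, deprecated terms, and so
    # on -- which is where the line count comes from.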
Last Desise documentation: sect. 3 of the current Voc 2 WD,
http://ivoa.net/documents/Vocabularies/20200326/WD-Vocabularies-2.0-20200326.html#tth_sEc3 (note, though, that the next WD will probably make the values in the terms dictionaries dicts as well).
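[Again purely for illustration, anticipating the terms-values-as-dicts layout just mentioned for the next WD: once a desise file is on disk, the use case "give me the human-readable info for a term" needs nothing beyond a JSON parser. Key names like "terms", "label", and "description" are placeholders and may not match the final spec:]

    import json

    with open("datalink.desise") as f:
        voc = json.load(f)

    def describe(term):
        """Return (label, description) for a known term, None otherwise."""
        info = voc["terms"].get(term)
        if info is None:
            return None   # unknown term -- a validator would flag this
        return info["label"], info["description"]

    print(describe("calibration"))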
It was also asked whether JSON really is better than YAML -- or the favourite {whatever} of the month. Markus: Well, as far as I can tell, json bindings in most languages are about as core as they can get. All other "serialise-complex-things-into-a-string" methods are not nearly as widespread. And since JSON does the job, it seems an obvious choice.
Lessons Learned from the VEP process
New: VEPs are now linked with Supersedes.
New: We now require an actual usage example.
Do we need some time limit to push things when the discussion fizzles out? -- There was a bit of discussion here; what is clear is that a debate may take a lot of time, as in the case of VEP-001. Markus isn't worried about that as long as there's activity. Things become worrisome if activity ceases because some question lingers unanswered.
We did not identify any concrete rules we ought to adopt to tell if a VEP has gone stale at this point. Let's gather a bit more experience.
Does the current plan fit people's use cases?
Carlo: We should ask people. Personally, I do not see the need for introducing yet another serialisation format, but I may be wrong.
Pat's experience using vocabularies in CAOM is that developers look at the doc (html) and write code; no one ever asked for a machine-readable list of terms for validation. Markus says that's exactly what he'd like to change: vocabularies, both as word lists and as hierarchies, ought to evolve into declarative, ideally pluggable, source code (see the use cases in the doc).
Mark is using the machine-readable terms from the vocabularies at www.ivoa.net/rdf in the stilts VOTable validator. Would an ASCII list of words (nothing else) suffice for him? Currently yes, as that's all he is pulling out of the vocab endpoints; desise would probably be useful when a client wants words and descriptions, but conveying the hierarchy seems like a stretch goal to him (another level of complexity).
Sarah definitely prefers a tree format to plain ASCII. She just wonders if we're creating more work for ourselves by having to parse a custom tree format like desise.
The UAT and endorsed vocabularies
More details on the UAT here:
http://astrothesaurus.org/,
https://github.com/astrothesaurus/UAT
For information, the experimental UAT in desise:
http://ivoa.net/rdf/uat/2020-04-30/uat.desise, with other serialisations at the bottom of
http://www.ivoa.net/rdf/uat/.
On IDs and URIs: The IVOA version of the UAT mints new concepts with new IDs, for instance "stellar-classification" vs.
http://astrothesaurus.org/uat/1589. The IDs in desise aren't (written as) URIs any more, to save people parsing them. This design was chosen because our current use cases don't span vocabularies, which again is something we'd like to keep because we think being able to work offline is important.
Full concept/resource URIs can be built from desise using the vocabulary URI (which is declared at the top level) and the keys of the terms dictionary.
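[A sketch of that reconstruction, assuming the vocabulary URI sits under a top-level "uri" key and that concept URIs follow the usual <vocabulary-uri>#<term> pattern; both are assumptions about the final desise layout:]

    def concept_uri(voc, term):
        # e.g. concept_uri(uat, "stellar-classification")
        #   -> "http://www.ivoa.net/rdf/uat#stellar-classification" (illustrative)
        return "{}#{}".format(voc["uri"], term)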
Are there more external vocabularies that we may want to adopt?
Dublin Core is a good candidate, also used in some provenance artefacts. But is there a (use) case for having it IVOA-ised (as with the UAT's use in VOResource)?
Fostering Takeup
Implementations Markus would like to see before going for RFC:
- A validator (the RofR validator would use it; VOTable TIMESYS)
- A datalink client ("#calibration and more special"); Markus will probably try and charm the pyvo folks to add something like that to their datalink implementation.
- Perhaps an expand_query(voc, term) ADQL UDF. There was a bit of speculation about whether the morphing engine shouldn't know about the vocabularies used in different columns, to save users having to specify the vocabulary to use: for instance, obscore dataproduct_type would automatically inspect the product_type vocabulary, whereas rr.res_subject.res_subject would use the UAT. That seems rather fragile, however, and nobody wants to figure out how to interoperably communicate the base vocabulary. (A rough sketch of such a term expansion follows below.)
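[A rough sketch of the term expansion behind such a UDF, working on desise-style data; the "narrower" key is an assumption about the final layout, and the whole thing is a client-side sketch rather than anything a TAP service has committed to:]

    def expand_term(voc, term):
        """Return term plus everything transitively narrower than it."""
        result, stack = set(), [term]
        while stack:
            current = stack.pop()
            if current in result:
                continue
            result.add(current)
            stack.extend(voc["terms"].get(current, {}).get("narrower", []))
        return sorted(result)

    # A service could then rewrite a constraint using such a UDF into a plain
    # IN (...) list over the expanded terms; the exact UDF signature is left open.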
Licensing
Our vocabulary files ought to be re-distributable with software without encumbering it. Markus therefore proposes we add some language that anything that goes into IVOA vocabularies is CC-0 automatically.
Nobody disagrees with that.
Carlo: This is not only about exposing the terms, but also about using them. Should we add a disclaimer if necessary?
Pat: words, language, ... have to be public domain, so I agree with CC-0.
Vocabulary Work
Do VEPs!
Markus would really like to see a SIMBAD-compatible vocabulary of object types; object types are used in several places, including SSAP, which currently references SIMBAD directly; that is not an ideal situation for a standard.
CDS/Anais and Sebastien are currently reorganising the object types dictionary and checking it against the UAT. There clearly is a problem of granularity when comparing categories in the two vocabularies.
Katie would be interested in working on that with the CDS folks.
Laurent asks about vocabularies within the source data model: which vocabulary or vocabularies to pick up? Several lists may fit, with more or less precision. After the session, Markus suspects this precision question could be another excellent use case for why we have to offer simple access to term hierarchies.
Ada: for light curve annotation there is a need for terms to express the magnitude system (e.g., Vega system, AB) in a controlled vocabulary. These concepts at least don't exist in the UAT yet. Katie sees there could be space for these. The UAT has the concept of Photometric systems (
http://astrothesaurus.org/uat/1233), which seems equivalent, but the more specific terms (e.g., broad band photometry, color equation) go somewhere completely different (which is fair by SKOS, but precludes immediate re-use for what we might need). Katie and Ada will follow up.
Ada mentions the filter profile service at
http://svo2.cab.inta-csic.es/theory/fps/; it's an implementation of the
PhotDMv1-1, which describes some possible values for the systems.
She doesn't think that spec lists all the possible photometric systems, and she is not sure if we should try to make a comprehensive list here, but it's worth a thought for sure.
Vocabularies in other communities
RDA is producing a recommendation around FAIR semantics:
https://doi.org/10.5281/zenodo.3707985. Is what we are doing here compliant with this? Should we care about such compliance?
Carlo: Semantic artefacts are defined in RDA as vocabularies that can be interlinked; he suggests checking how we differ from and match with their strategy.
Markus has skimmed the document and only found the license issue (cf. above) as a glaring omission on our side, which we are about to fix.
Still, everyone involved in the Semantics WG, please help check what we may have missed.