SimDAL 1.0 Proposed Recommendation: Request for Comments

Public discussion page for the IVOA SimDAL 1.0 Proposed Recommendation.

The latest version of the SimDAL Specification can be found at:

Reference Interoperable Implementations

Comments from the IVOA Community and TCG members during RFC period: 2016-07-08 - 2016-08-22

Comments from Enrique Solano
Answer: This has been clarified. The text now mentions that the components can be found in SimDAL repositories and registries.
Comments from Mark Taylor

I don't have a strong interest in SimDAL, and I have not thoroughly reviewed this draft, but I read it and have some comments.
We would need to have the properties as table columns in a relational database table, which is simply not possible for the majority of the RDBMSs currently in use (which we would have to use with TAP, since TAP is strongly coupled to SQL and so to the relational model). Storing such data the TAP way would require one table column per property, but most RDBMSs currently in use (PostgreSQL, MySQL) cannot manage such high-dimensional data (i.e. that many table columns). High-dimensional data and their use are much better served by other types of storage architecture, which publishers cannot use with TAP (or only with great difficulty, to the point of making no sense) when they have no SQL compatibility layer or adapter.

Note that if defining SimDAL has taken so long, it is because many technological solutions were tested (and implemented) before reaching the present proposal. Among them, TAP was tested on various data management systems and storage architectures. The conclusion of these implementations is that TAP is not an option. The views solution adopted in SimDAL has two benefits:
1 - it decouples the standard VO interface from the technology used to store the data (so a publisher can choose the technology he prefers depending on the particularities of his data);
2 - it is as similar as possible to TAP (virtual table + view schema), so that publishers already familiar with the VO should not be lost.

Concerning the SimDAL Repository part: first, note that SimDAL components (and among them the SimDAL Repositories) are registered in the IVOA registries. Unlike the registries, SimDAL Repositories describe resources (protocols/codes, projects, etc.) with the semantics defined in the Simulation DataModel. So it is only with SimDAL Repositories that a search for resources can be done using the SimDM semantics.
Moreover, SimDAL Repositories are places where the SimDM XML serializations of projects and protocols (codes) are stored. These serializations are the descriptions of theoretical projects and codes that are published in the VO. IVOA registries do not have functionalities to store and query such serializations whereas SimDAL Repositories do. Discussions with Markus (for the Registry W.G.) showed that some parts of these serializations could be transformed and ingested in the IVOA registries. Nevertheless, this would come at the cost of losing the relationships between SimDM classes, and so losing the hierarchy of the model and a part of the SimDM semantics. Presently, the SimDAL Repository search API does not allow users to fully benefit from the SimDM XML serializations, even though most scientific use cases would require fine-grained search in these serializations to efficiently discover protocols and projects of interest. This was a deliberate choice for version 1.0 of SimDAL. Indeed, in the coming months or years we do not expect many IVOA theory services to be registered, so it should be easy for users to discover theoretical services with the SimDAL Repositories as presently defined. Nevertheless, as more and more theoretical services are registered, finer-grained search will become necessary. SimDAL Repositories as defined in version 1.0, storing the full XML serializations of projects and protocols, contain all the information, and the standardized relationships between these pieces of information, needed to answer these use cases. It will then be time to extend the capabilities of the Search API.
A few comments on its use:
1 - To search for simulations, follow the order in the top menu: search in the Repository, then do a SimDAL Search, and finally search in Access data. Each step provides the URIs for the next one.
2 - In the repository search, first select a SimDAL Repository before doing a {search} or asking for the list of {projects}.
3 - At each step, after a search, the system provides the URIs of the services. These URIs have to be copied and pasted into the next step.

-- MarkTaylor - 2016-07-14

Answers: --IVOA.FranckLePetit and DavidLanguignon - 2016-08-09

Comments from Markus Demleitner

(This is still against the 2016-06-08 version; I already had the review written at the time the new draft came out. Sorry about that, but I believe most of the material is still pertinent.)

Let me start with the very general remark that I believe this standard tries to do too much. I think it should be three different standards at least. When reading it, I kept having the creepy feeling that far too many details are left open, more or less by necessity because there's so much to specify. You're defining more than a dozen endpoints and quite a few VOTable hacks on 50 pages; perhaps tight integration, solid SimDM foundation and specialisation on particular use cases actually let you do that, but I'm concerned that all kinds of little issues will come up when different implementations try to interoperate. Is there a client that would exercise even half of the features described in the document? In your experimental implementations, was underspecification an issue?

In particular, I'm a bit concerned about the proliferation of endpoints. You're defining about as many endpoint types as the entire rest of the VO combined. Perhaps that's ok, in particular because by and large your interfaces appear fairly "small" and tidy compared to some other things we've produced in the VO, but it's at least somewhat of a liability for writing validators, and I suspect for implementations, too.
Since quite a few of the interfaces are essentially just searches in (perhaps virtual) XML documents: have you investigated whether you could reduce the number of interfaces required by re-using, say, xpath or xquery or whatever? In short, I believe you should split up this document into three pieces, each of which would be more manageable.

Answer: Yes, we have several API endpoints. Of course, we tried our best to make this number as small as possible while satisfying as many use cases as possible.
It is a big standard, which is why it was so hard and took so long to release a first version of it. The other VO (DAL) standards basically serve a single, fixed, underlying data model. As you know, SimDM, which SimDAL serves, is a meta model, i.e. we are serving an infinity of underlying data models. This brings some additional use cases that we had to address by adding a complete API.
Several very smart people have worked on the SimDAL problem over the past years, and a lot of progress has been made thanks to them, even if the result is still not 100% perfect. The fact is that a lot of people have been waiting for years for a way to publish their theoretical data, and we think we have to give them a solution now. We need a basis, SimDAL v1, that will be implemented by several teams and will fulfill the most common use cases (80 %), on real data in a production environment (not theoretical or potential use cases and issues). Then, we hope these teams will collaborate with the IVOA to provide feedback for a v2 version that will add some features not planned in the first version.
As for xpath, we made SimDAL partly because we realized that understanding and querying SimDM is too high a prerequisite for a scientist or developer, and that we had to design a query API that is as straightforward as possible, even if this comes at the price of not covering 100% of the use cases for now.
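The trade-off described in this answer can be illustrated with a minimal sketch. It is not taken from the SimDAL or SimDM documents: the XML element names (`project`, `protocol`, `parameter`, `name`) and the `keyword_search` helper are hypothetical and only stand in for a much richer SimDM serialization. The point is that a path-based query requires knowing the document structure, whereas a keyword query does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a SimDM-like serialization.
# Element names are invented for illustration only.
DOC = """
<project>
  <name>Hypothetical PDR grid</name>
  <protocol>
    <name>ExampleCode</name>
    <parameter><name>density</name></parameter>
    <parameter><name>radiation field</name></parameter>
  </protocol>
</project>
"""

root = ET.fromstring(DOC)

# Path-based query: the caller must know where parameters live in the tree.
xpath_hits = [e.text for e in root.findall("./protocol/parameter/name")]


def keyword_search(tree_root, query):
    """Return the text of every element containing the query (case-insensitive).

    No knowledge of the document structure is needed.
    """
    needle = query.casefold()
    return [
        e.text
        for e in tree_root.iter()
        if e.text and needle in e.text.casefold()
    ]


keyword_hits = keyword_search(root, "density")
```

The path-based form is more precise but ties every client to the model's hierarchy; the keyword form is cruder but needs no prior knowledge, which is the kind of simplicity the answer argues for.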
We have not split SimDAL because, for the first version of the standard, we do not want to break the workflow logic that carries the user from theoretical project discovery to data retrieval, at the price of the developer having to deal with a bigger standard. Instead, we tried to be clear in the document so that the reader understands that he does not have to implement all three parts. For the next version of the standard, once several teams are familiar with it, splitting the document into several more specific standards can be discussed.
Individual issues:
Comments from TCG members during the TCG Review Period: 2016-07-08 - 2016-08-22

WG chairs or vice chairs must read the Document, provide comments if any and formally indicate if they approve or do not approve of the Standard. IG chairs or vice chairs are also encouraged to do the same, although their inputs are not compulsory.

TCG Chair & Vice Chair ( Matthew Graham, Pat Dowler )

Applications Working Group ( Pierre Fernique, Tom Donaldson )

Approved. I have no expertise in Sim, but as an independent feature, the specification seems reasonable and consistent.
-- TomDonaldson - 2016-10-22

Data Access Layer Working Group ( François Bonnarel, Marco Molinaro )

Data Model Working Group ( Mark Cresitello-Dittmar, Laurent Michel )
I reviewed the 20161014 version of the document. I'm wondering if this is correct, as some of my comments appear earlier with comments that they have been addressed.

Two broad comments:
1) I would like to see some statement about the completeness of the implementations: do they implement all features of the spec?
2) In reading it, there seems to be a large overlap with existing standards (DALI, DATALINK, etc.), which was also mentioned by others. Since DALI is supposed to be the basis for DAL protocols, I'd like to see this spec be expressed more directly in relation to it rather than through statements of compatibility. In reading the responses to similar comments above, and speaking with Francois about this, I understand that this is intentional for this version. A subsequent version can integrate the spec into the DALI family. So, I will not block progress on this count.

General:
Section 1 - pg 5: The final lines state who should implement each of the components, but in very fuzzy terms (e.g. "most of the time the SimDAL Search component"). It would be clearer if it stated specifically who would implement each part. This same fuzzy language is repeated in a couple of places (pg 8).
pg 6: Architecture diagram does not match the one in the master IVOA Architecture document.
S2.3 - pg10: "the pivot format": as Markus stated, I have no idea what that means; perhaps it is common knowledge to the target audience?
S3.0 - pg11: Question marks should be confirmed and removed. 1. "SimDAL components are exposed with APIs following a REST design (?) that conforms to the DALI resource description (?)."
S3.3 - pg12: Question mark in place of reference to JSON standard.
S3.5 - pg15: Another question mark in place of reference (VOTable).
S3.6 - pg15: "must provide sufficient lifetime for interactive browsing of the pages". This is a rather vague statement of requirement.
S4.1 - pg17: the description for 'q' parameter, "The search logic is up to the publisher.."
How is a user supposed to know what to put in the field if the logic is up to the publisher? The example q='n(h2)' matches both 'N(H2)' and 'n(H2)', but only because this provider decided to make the search case insensitive?
S5.6 - pg37: "the same way than the one" => "the same way as the one"
S5.6 - pg37: "but in another cases it would" => "but in other cases it would"
S6.3 - pg44: UWS extension.. "For various reasons but in particular because of security concerns." I don't know much about this content, but it sounds like it is indicating security concerns with the content of the UWS document standard.
Appendix B - pg48: "and some developpers" => "and some developers"
Appendix B - pg48: "to have much more properties" => "to have many more properties"
Appendix B - pg48: "the APIs with the willing of putting" => "with the intent" or "with the hope"..
Appendix B - pg48: "should know about technics" => "the technical aspects"?
Appendix B - pg48: "end user to not needing to worry" => "end user to not worry"
-- MarkCresitelloDittmar - 2016-10-26
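The concern about the 'q' parameter (S4.1) can be made concrete with a small sketch. It assumes, purely for illustration, that a publisher implements 'q' as a substring match over property labels; the function names and the label list are hypothetical, and the specification itself leaves the search logic to the publisher. The same query gives different results under two equally permitted policies, which is exactly what a client cannot know in advance.

```python
def match_case_insensitive(query: str, label: str) -> bool:
    """One possible publisher policy: case-insensitive substring match."""
    return query.casefold() in label.casefold()


def match_case_sensitive(query: str, label: str) -> bool:
    """Another equally permitted policy: exact-case substring match."""
    return query in label


# Hypothetical property labels exposed by a service.
labels = ["N(H2)", "n(H2)", "N(CO)"]

query = "n(h2)"
insensitive_hits = [l for l in labels if match_case_insensitive(query, l)]
sensitive_hits = [l for l in labels if match_case_sensitive(query, l)]
```

Under the first policy the query finds both 'N(H2)' and 'n(H2)'; under the second it finds nothing, since neither label contains a lowercase 'h2'.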
Grid & Web Services Working Group ( Brian Major, Giuliano Taffoni )

Registry Working Group ( Markus Demleitner, Theresa Dower )

We are somewhat concerned that there is a fairly large overlap between Registry and your repositories. It would seem that at least the plain {search} endpoint is largely covered by standard Registry infrastructure; for {projects} and {protocols} I'd say it's a matter of a Registry extension.

Answer: We organized a meeting in Paris in March 2015/16 dedicated to the question of the registration of SimDAL components in the registries. Several authors of SimDAL as well as representatives of the DAL WG (Marco) and Registry WG (Markus) were present. Several options and questions were identified during the meeting. After this meeting, discussions between the authors of the SimDAL standard concluded that some important use cases require SimDAL repositories:
1) fine-grained discovery of SimDAL services requires the detailed description of theoretical services following SimDM serializations, including their semantic aspects (SKOS);
2) these SimDM descriptions / serializations must be centralized (for example, to have a unique description of a code to which publishers of simulations produced by this code can refer).
The need for such a way (using SimDM repositories) to discover and describe theoretical services in the VO was identified a long time ago, before and during the definition of SimDM. Nevertheless, SimDAL components must be registered in the IVOA registries.

When you say "SimDAL services may be discovered through Registry queries", I think you should say "by looking for capabilities with the standard ids defined in sect. 3.6."

Answer: Yes, this adds precision and clarity, thank you. We modified the document.

Beyond that, if you want to define a Registry extension (and I think you should), I think you should do so in the document.
Splitting up the "DAL part" and the extension, as we've done with S*AP and TAP, has proven to be a severe maintenance liability. We are happy to assist you there, and as long as you have your metadata concepts worked out, this would be a quick process.

Answer: During the meeting in Paris in February 2016, we started to look at how to register SimDAL components in the IVOA registries. Two solutions appeared:
1) to map as much as possible of the SimDM serializations (protocol.xml, project.xml) into the IVOA registries using extensions. This solution requires deprecating some SimDM concepts (notions of code version, owner, etc.), slight modifications of the registries (or taking a bit of freedom, such as storing SKOS concepts in UCD fields), developing an XSLT transformation of SimDM serializations, etc.
2) to do a simple registration of SimDAL components in the IVOA registries without extensions, using the classical registration fields: title, description, keywords, publishers, etc.
Solution 1 is more complex to set up: it requires development work, a prototype to check that it works properly, and tests of the implications for end users. It may be nicer from an IVOA point of view but is not required by the scientific use cases defined by the Theory I.G. Nevertheless, we decided to try it, but we set a deadline of the end of March 2016 to get an operational XSLT mapping from SimDM serializations to the registries. Without progress by the end of March, we would move to solution 2 to avoid delaying a release of the SimDAL standard that several teams in the astrophysics community are asking for. Solution 1 would then be investigated for SimDAL version 2.0. Nothing had happened by the end of March, so we moved to solution 2. This solution does not need any specific description since it is a standard registration of services in the IVOA registries. It fulfills all the scientific goals.
Talking about standardIds, in 3.6, it seems you are saying the curly braces should actually be part of the URI ("ivo://ivoa.net/std/SimDALSearch#{views}-1.0" in what you label "Example"). I doubt that is intended, but if it is, we would veto it; curly braces are not allowed in URIs.

Answer: Ok, fixed in the document. Thank you for pointing this out.

In 4.1, you say that search "should implement the pagination API" -- so, how does a client find out whether it does? As long as there is a possibility that a given service doesn't support pagination, I'd suggest you should say in 3.4 how to discover pagination support once and for all. From a Registry perspective, I'd say this is a fairly natural item for a Registry extension's metadata model.

Answer: A client knows that a collection resource supports pagination by checking whether the resource representation (a VOTable) contains the special pagination LINK elements (that is, LINK elements having the special content roles described in the "Pagination" section). Thus, a service may implement pagination but decide not to use it for some collections while using it for others. This is now explained in the document.

In 4.1, you define what boils down to a universal metadata model, including a means for schema discovery. We note for the record that from the Registry experience we are fairly uneasy about the usability of such an extremely generic thing; also, we've found many metadata items have a natural tree structure, which of course is not really representable in such a flat key-value structure.

Answer: Representing tree structures in relational databases (that is, as sets of flat key-value structures) has been done for years and is well mastered. Of course, this brings some query complexity for the tree reconstruction. We do not expect hierarchy to be used here; the full information (and the various hierarchies) is actually kept in the SimDM serializations (mostly of the project and protocol SimDM packages).
This is intended to be a simple (flat), yet informative, view of the SimDM serializations (classes mainly).

In 4.1, you say "(in the sense of ivo:// id)" for the authority. We are not quite sure what you intend to do here, but we strongly suspect you do not want an authority here. A publisher typically is an Organization in VOResource, not an Authority, and an Authority can register multiple Organizations. We believe what you want to say here is: "The IVOA Identifier of the publisher of the project...". This would mean that, provided the publishers did their job right, that

Semantics Working Group ( Mireille Louys, Alberto Accomazzi )

Education Interest Group ( Massimo Ramella, Sudhanshu Barway )

Time Domain Interest Group ( John Swinbank, Dave Morris )

Data Curation & Preservation Interest Group ( Françoise Genova )

Operations Interest Group ( Tom McGlynn, Mark Taylor )

Knowledge Discovery Interest Group ( Kaï Polsterer )

Theory Interest Group ( Carlos Rodrigo )

When I read the introduction of section 5 (SimDAL Search) I get the impression that an unaware reader could understand that it is impossible to define views containing properties and links to more than one object type. For instance, it explains that, if you have a simulation that provides both the 3D structure of a star and its oscillation spectrum, you have to define two different views. This, in fact, is a misunderstanding. First, having a simulation that generates different output files for a set of given inputs is quite a common case. And having a different view for each object type (output data file), while possible, would make it almost impossible for a user to search on metadata and get the two, or more, final files corresponding to that "experiment" (linking between different views is very inefficient).
Second, it is perfectly possible and natural in SimDAL to define a view that links together all the output files for the same experiment and points to the corresponding DataAccess services providing each dataset. By the way, this is not a "new idea"; it's something that has been discussed and agreed before. It is just not clear in the document. In my opinion, it should be explicitly said that views containing different object types can be defined. And it would be nice to give a simple example of a view schema for such a case.

Answer: Thank you Carlos, we have done the update in the document.

Standards and Processes Committee ( Françoise Genova )