This page is for discussions of the Simulation Database (SimDB) data model (SimDB/DM).
In SimDB the data model is represented by a UML diagram, which is stored as an XMI file in the volute GoogleCode SVN repository.
The latest version of the model can be downloaded from
http://code.google.com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/uml. It is the file named SimDB_DM.xml.
Authors of issues should indicate which revision of the data model they comment on. This number is indicated in the Rev column of the listing in the above mentioned URL.
Alternatives to MagicDraw Community Edition
So far our models were created using
MagicDraw CE 12.1.
We may want to upgrade to 16.6 (the latest), except that, as of 2009-10-18, that edition is not available. Also, when trying out 16.5 I noticed that the XMI it generates is different.
I started translating the xmi2intermediate.xsl style sheet to include this but have not tested it rigorously. Also, this was done in
VO-URP only.
We may want to look at other modelling frameworks that are more clearly non-commercial, for example
ArgoUML, or
EclipseUML?
They should be XMI compliant, or at least allow easy writing of xmi2intermediate.
Franck and Benjamin have brought up again the possibility of composite protocols (and experiments?). These would likely be aggregations of existing, registered protocols, that are used in combination in a single experiment.
We need use cases for these if we want to reintroduce them instead of the "workaround" of simply creating an appropriate new protocol.
And it should not already be covered by the
SimDB:Project, or by the intrinsic "workflow" support in the model itself (i.e. with one experiment already being able to use the result of another)!
For example, one might argue that the composite protocol should still correspond to, say, a single executable, i.e. to a single experiment
and not a composite experiment, supposedly consisting of an aggregation of experiments, each run according to the appropriate protocol in the composite protocol, and each of them separately registered; at least that was how it was modelled in the past.
But IF the individual protocols are identifiable in the composite protocol, and the corresponding experiments are also identifiable, have their individual parameter settings, targets etc and have their own results, should this be considered a composite protocol or "simply" a number of individual experiments run one after another, grouped together in a project?
So IF a composite protocol were introduced, it should NOT give rise to an aggregation of experiments. But this still implies we need a new subclass of experiment as the current ones, post-processing and simulation, must have a protocol of a corresponding type. We might still call this composite experiment, but it should be understood that this should not be componentized into sub-experiments.
Which then introduces the following issue: if a simulation is part of the composite protocol and physics is therefore introduced, then according to the definition of simulator ("a protocol that models physical processes") it should be a simulator (as well?).
GL 2009-10-18: To me this is one reason that we should NOT introduce composite protocol. Use the model itself, with Project if necessary to model this case, or introduce a new protocol, which likely is more accurate anyway.
Whilst creating an XML document for the Gadget2 simulation code, I had to repeatedly define X/Y/Z/VX/VY/VZ/Mass properties for each of the 6 possible representation object types.
This suggests introducing object type inheritance in the model.
Laurent suggests using
ChildObject, but that should(?) be used for composition hierarchies.
I suggest adding an "extends" reference from
ObjectType to
ObjectType.
We should then also have as a rule (currently this cannot be defined in UML) that an
ExperimentProperty.property must be a member of the property collection of the
ExperimentProperty.container.type, or of one of its base classes.
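The "extends" reference and the membership rule above can be sketched in a few lines of Python. This is only an illustration and not part of the model: the class below is a simplified stand-in for ObjectType, the type names Particle and GasParticle and the density property are hypothetical, and the cycle check is an extra assumption that the proposal does not state.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ObjectType:
    """Simplified, hypothetical stand-in for the model's ObjectType."""
    name: str
    properties: List[str] = field(default_factory=list)
    extends: Optional["ObjectType"] = None  # the proposed reference to a base type

    def all_properties(self) -> Set[str]:
        """Own properties plus those inherited along the 'extends' chain."""
        collected: Set[str] = set()
        visited: Set[int] = set()
        current: Optional["ObjectType"] = self
        while current is not None:
            if id(current) in visited:
                # Assumption: a type may not (indirectly) extend itself.
                raise ValueError("cyclic 'extends' chain at " + current.name)
            visited.add(id(current))
            collected.update(current.properties)
            current = current.extends
        return collected


def is_valid_experiment_property(container_type: ObjectType, prop: str) -> bool:
    """The rule: the property must belong to the container's type or a base type."""
    return prop in container_type.all_properties()


# Shared properties are defined once on a base type (names are hypothetical).
particle = ObjectType("Particle", ["x", "y", "z", "vx", "vy", "vz", "mass"])
gas = ObjectType("GasParticle", ["density"], extends=particle)

print(is_valid_experiment_property(gas, "mass"))          # inherited from the base type
print(is_valid_experiment_property(particle, "density"))  # defined only on the subtype
```

With this shape the X/Y/Z/VX/VY/VZ/Mass properties would be declared once, and each of the specific object types would only list what it adds.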
Parameter setting (r1059)
Dealing with values where we do not know in advance what the datatype should be is problematic.
For example consider the
ParameterSetting class.
For a given parameter this should provide a value.
But the parameter (an
SimDB:simdb/protocol/InputParameter) defines its own datatype, which may be numeric, but may also be a string, or a datetime.
So how should a value be defined: what should be the datatype of the value attribute in a
ParameterSetting?
In principle we could use a 'string' always.
For numeric parameters, however, it would be nice if we could query using between, <=, >= etc. For that, though, the datatype must also be numeric.
Currently we have solved this by introducing 2 subclasses of the abstract base
ParameterSetting:
NumericParameterSetting and
GenericParameterSetting.
This is very awkward.
An alternative solution would be to have 2 attributes, one of type real and one of type string, on the
ParameterSetting class.
Named e.g. r_value and s_value.
Only one of these should be given a value, though one may allow both. (NB this is a case where XML Schema's
might help.).
This is still not ideal, but maybe somewhat less awkward.
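A minimal sketch of the two-attribute alternative, to make the trade-off concrete. Only r_value and s_value come from the proposal above; the parameter field (a plain name instead of a reference to an InputParameter), the check that at least one value is set, the helper numeric_between and the example parameter names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParameterSetting:
    """Hypothetical single-class variant with one real and one string value."""
    parameter: str                   # simplified: name of the parameter being set
    r_value: Optional[float] = None  # used when the parameter is numeric
    s_value: Optional[str] = None    # used for strings, datetimes, etc.

    def __post_init__(self) -> None:
        # Assumption: a setting must carry a value; setting both is tolerated.
        if self.r_value is None and self.s_value is None:
            raise ValueError("a ParameterSetting needs r_value or s_value")


def numeric_between(settings: List[ParameterSetting], parameter: str,
                    low: float, high: float) -> List[ParameterSetting]:
    """Range query that is only possible because r_value is numeric."""
    return [s for s in settings
            if s.parameter == parameter
            and s.r_value is not None
            and low <= s.r_value <= high]


# Example settings; the parameter names are made up.
settings = [
    ParameterSetting("omega_m", r_value=0.25),
    ParameterSetting("omega_m", r_value=0.30),
    ParameterSetting("ic_file", s_value="ics.dat"),
]

for match in numeric_between(settings, "omega_m", 0.2, 0.27):
    print(match.parameter, match.r_value)
```

The range query works without any subclassing, at the price of a column that stays empty for every non-numeric parameter.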
Logical profile (r1059)
The current UML Profile is built for analysis models.
Hence the data types are not intended for use in software.
E.g. there is no distinction between float/double precision (real*4/real*8),
or short/long/longlong integers.
Should we create an alternative profile with datatypes more directly mappable to software types?
AMR simulations & statistics issues (r902)
1. How to indicate the refinement conditions per AMR level?
2. How to indicate the number of cells per AMR level?
3. How to deal with functions like statistics (average velocity ...) computed per density threshold (or any other property)? It is like a 2D characterisation (x, y) where x and y are both properties.
See the following document for a detailed description of that point:
Various issues raised by Rick Wagner (r902?)
in this email.
Follow-Up
During a telecon with Gerard, we have identified a few use cases that can test the data model. Hopefully we can adjust the model to encompass these.
Issue: Utypes
This one is pretty straightforward. SimDAP needs to map the elements of its responses to the simulation data model. The way to do that is through the Utypes. Hence, the Utypes defined from the XMI file need to fit any standards defined for Utypes.
Issue: Protocol types
While simulators are (almost) clearly defined, initial conditions generators, post-processing and analysis tools are not.
- Are initial conditions generators simulators, special cases?
- Is using a tool like Cloudy to add fields (properties) post-processing, or another simulation?
- Are there specific sets of attributes, such as the physics used, that help us to differentiate between these programs?
My suggestion is to remove the subclasses of Protocol from the model, and use a "type" attribute, with a choice of values. There are several common types of applications used in computational astrophysics that can initialize the list. If it turns out to be insufficient, we can add to it, or make it repeatable.
Here are my suggestions to start the list:
- simulator
- initial conditions generator
- halo finder
- extraction tool
- projection tool
- spectrum generator
- analysis tool
- post-processing tool
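As a rough illustration of the "type" attribute suggestion, the list above could become an enumeration on a single Protocol class. This is a sketch only: the Python names are hypothetical, and only the eight values come from the list. A repeatable type would turn the single attribute into a collection.

```python
from dataclasses import dataclass
from enum import Enum


class ProtocolType(Enum):
    """Initial list of protocol types, taken from the suggestions above."""
    SIMULATOR = "simulator"
    INITIAL_CONDITIONS_GENERATOR = "initial conditions generator"
    HALO_FINDER = "halo finder"
    EXTRACTION_TOOL = "extraction tool"
    PROJECTION_TOOL = "projection tool"
    SPECTRUM_GENERATOR = "spectrum generator"
    ANALYSIS_TOOL = "analysis tool"
    POST_PROCESSING_TOOL = "post-processing tool"


@dataclass
class Protocol:
    """Hypothetical Protocol without subclasses, distinguished only by type."""
    name: str
    type: ProtocolType


gadget = Protocol("Gadget2", ProtocolType.SIMULATOR)

# Values can be looked up from their textual form, e.g. when reading a document.
print(ProtocolType("halo finder").name)
print(gadget.name, gadget.type.value)
```

Adding a new kind of application then means adding one enumeration value rather than a new subclass of Protocol.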
Issue: Snapshot types
A result of trying to encompass a larger variety of protocols and experiments is that the data may not fit the current Snapshot model. For example, a Snapshot has a "spatial size" attribute, to provide a sense of scale, whereas the output of a spectrum generator is a set of SEDs. By contrast, the output of an initial conditions generator is almost equivalent to the output of a simulation code. The difference between post-processing and analysis tools on the one hand, and simulation and IC generators on the other, is that simulation results contain a complete representation of what's being simulated.
The choices here are to either make the uncommon attributes on Snapshot optional, or to have both a Results or Output class, and Snapshot. I feel that the latter is preferable, since it allows us to easily distinguish which data held or initialized the state of the simulation.
-- RickWagner - 19 Mar 2009
Examples of Protocols
For discussion purposes it will be good to have a list of real protocols, i.e. simulation and post-processing codes and algorithms.
Simulators:
Semi-analytical galaxy formation codes
- L-Galaxies (MPA semi-analytical (SAM) galaxy formation code)
- GalForm (Durham SAM code)
- GalICS (Horizon SAM)
- MORGANA (Monaco et al.)
- ...
Pure post-processors:
cluster finders
- FOF cluster finders (various implementations)
- SUBFIND cluster finder (Springel et al. 2001, ADS)
- ...
Others?
-- Gerard Lemson May 12 (extracted from IVOA.IVOATheorySimDBDM)