Parquet In IVOA
November 5, 2024, 7:00 PM UTC, online meeting
22 Zoom participants
Agenda:
The purpose of the meeting is to learn about the different Parquet-related efforts currently under way and to identify synergies that could be channeled into new IVOA standards.
- Round-table short descriptions of the groups and their Parquet-related projects
- Mark Taylor: presentation on VOTable metadata with Parquet files
- Jos de Bruijne: compression algorithms in Parquet
- Path forward - Malta Interop
Meeting minutes:
Active groups using Parquet technology:
- Jeff Burke (CADC) - Parquet format for TAP
- Mario Juric (U of Washington) - Upload and download of large Rubin catalogues
- Vandana Desai (Caltech/IPAC) - Science with large catalogues inspired by Mario's work.
- Jos de Bruijne (ESA) - ESA Gaia DR4: use the Parquet format (instead of the current CSV format)
- Pierre Le Sidaner (Observatoire de Paris) - Also the Gaia mission. Appreciates data access by column.
- Gregory Dubois-Felsmann (Rubin) - Rubin makes very heavy use of Parquet internally and externally, including catalogue releases. Rich metadata in VOTable goes beyond UCDs and UTypes.
- Trey Roby (IPAC) - Firefly reads and writes Parquet files. A very good format. Pushing the limits of how big the files can be.
Mark Taylor's presentation on adding rich VOTable metadata to Parquet
Questions and comments on the presentation:
Gregory D-F: VOTable metadata is important for DataLink. No objection to adding encoding; version support is a must. Valid VOTable metadata must comply with the spec. Parquet is not great for working with multiple datasets, but VOTable should support multiple tables. What happens if there is a conflict between VOTable metadata and Parquet metadata?
Mark T.: Cannot have an empty data element. VOTable Resource has a flag that can be used. The main table can be of type result and other resources can have other types (metadata).
Trey R.: How to resolve inconsistencies between Parquet and VOTable? Parquet should have the last word.
Mario J: Important to specify how to deal with inconsistencies between VOTable metadata and Parquet metadata.
Brigitta S: Prefers Parquet support first, before generalization (FITS). Example files would be useful for implementations such as astroquery, e.g. multiple tables in the same file.
Jos DB: Compression algorithms. We should start thinking about this, as it is one of the core strengths of Parquet.
Gregory D-F: We need to define new TAP result formats. Is there a Parquet MIME type?
A few participants stressed the importance of moving fast with the adoption of Parquet in IVOA as a lot of projects are currently under way.
Marco M: Proposed the faster route of writing an IVOA Note for VOTable and Parquet, then exploring how to generalize it to other types (FITS) and turning it into a standard.
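To illustrate the inconsistency question raised above by Gregory D-F, Trey R. and Mario J., here is a minimal sketch of one possible reconciliation rule. No policy was agreed at the meeting; the sketch simply assumes, following Trey R.'s suggestion, that Parquet has the last word on what is physically in the file (column names and datatypes), while VOTable supplies the descriptive metadata (units, UCDs). The `ColumnInfo` class, the `reconcile` function and the normalized datatype names are hypothetical and are not part of any IVOA or Parquet API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ColumnInfo:
    """Hypothetical, simplified description of one table column."""
    name: str
    datatype: str                 # assumed already normalized, e.g. "double"
    unit: Optional[str] = None
    ucd: Optional[str] = None


def reconcile(parquet_cols: List[ColumnInfo],
              votable_fields: List[ColumnInfo]
              ) -> Tuple[List[ColumnInfo], List[str]]:
    """Merge column descriptions under the ASSUMED rule that Parquet wins
    on name and datatype, and VOTable contributes unit and UCD."""
    by_name = {f.name: f for f in votable_fields}
    merged: List[ColumnInfo] = []
    conflicts: List[str] = []
    for col in parquet_cols:
        field = by_name.pop(col.name, None)
        if field is None:
            # Column exists in Parquet but is not described in the VOTable.
            conflicts.append(f"{col.name}: no VOTable FIELD")
            merged.append(col)
            continue
        if field.datatype != col.datatype:
            conflicts.append(
                f"{col.name}: datatype {field.datatype!r} (VOTable) "
                f"vs {col.datatype!r} (Parquet)"
            )
        # Keep the Parquet datatype, enrich with VOTable unit and UCD.
        merged.append(ColumnInfo(col.name, col.datatype, field.unit, field.ucd))
    for name in by_name:
        # Described in the VOTable but absent from the Parquet data.
        conflicts.append(f"{name}: VOTable FIELD has no Parquet column")
    return merged, conflicts


# Illustrative input only; these columns are made up.
parquet_cols = [
    ColumnInfo("ra", "double"),
    ColumnInfo("dec", "double"),
    ColumnInfo("flag", "int"),
]
votable_fields = [
    ColumnInfo("ra", "double", "deg", "pos.eq.ra"),
    ColumnInfo("dec", "float", "deg", "pos.eq.dec"),
    ColumnInfo("mag", "float", "mag", "phot.mag"),
]
merged, conflicts = reconcile(parquet_cols, votable_fields)
for message in conflicts:
    print("conflict:", message)
```

Reporting the conflicts rather than silently dropping them keeps the choice of policy visible, which is the point Mario J. made about specifying the behaviour explicitly.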
Actions:
- Mark T. and Gregory D-F will start drafting the note
- Mark T. will make his presentation available so that others can comment on it - done
- Mark T. will present progress during the Apps session in Malta
- Apps WG to support the effort: create a Twiki page with useful info, organize meetings post-Interop as required, etc.