Parquet In IVOA

November 5, 2024, 7:00 PM UTC, online meeting

22 Zoom participants

Agenda:

The purpose of the meeting is to learn about the Parquet-related efforts currently under way and to identify synergies that could be channeled into new IVOA standards.

- Round table: short descriptions of the groups and their Parquet-related projects

- Mark Taylor presentation on VOTable metadata with Parquet files

- Jos de Bruijne presentation on compression algorithms in Parquet

- Path forward - Malta Interop

Meeting minutes:

Active groups using Parquet technology:

  • Jeff Burke (CADC) - Parquet format for TAP
  • Mario Juric (U of Washington) - Upload and download of large Rubin catalogues
  • Vandana Desai (Caltech/IPAC) - Science with large catalogues, inspired by Mario's work
  • Jos De Bruijne (ESA) - ESA Gaia DR4 - Use the Parquet format (instead of the current CSV format)
  • Pierre Le Sidaner (Observatoire de Paris) - Also the Gaia mission. Appreciates data access by column
  • Gregory Dubois-Felsmann (Rubin) - Rubin makes very heavy use of Parquet internally and externally, including catalogue releases. Rich metadata in VOTable goes beyond UCDs and UTypes
  • Trey Roby (IPAC) - Firefly reads and writes Parquet files. Very good file format. Pushing the edge of how big the files can be.

Mark Taylor presentation on adding rich VOTable metadata to Parquet

Questions and comments on the presentation:

Gregory D-F: VOTable metadata is important for DataLink. No objection to adding encoding; version support is a must. Valid VOTable metadata must comply with the spec. Parquet is not great for working with multiple datasets, but VOTable should support multiple tables. What happens if there’s a conflict between VOTable metadata and Parquet metadata?

Mark T.: Cannot have an empty data element. The VOTable RESOURCE has a type flag that can be used: the main table can be of type results and other resources can have other types (metadata).

Trey R.: How do we resolve inconsistencies between Parquet and VOTable? Parquet should have the last word.

Mario J: It is important to specify how to deal with inconsistencies between VOTable metadata and Parquet metadata.
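The kind of consistency check raised in the comments above could be sketched as follows. This is only an illustration under stated assumptions, not the convention discussed in the presentation: the VOTable document, the column names, the type mapping and the function name `find_conflicts` are all hypothetical, and the Parquet schema is represented by a plain dictionary of physical type names rather than read from a real file. Following Trey R.'s suggestion, the Parquet side is treated as authoritative and the VOTable side is what gets reported.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-less VOTable describing the columns of a Parquet file.
VOTABLE_XML = """<VOTABLE version="1.4" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE type="results">
    <TABLE name="sources">
      <FIELD name="source_id" datatype="long"/>
      <FIELD name="ra" datatype="double" unit="deg" ucd="pos.eq.ra"/>
      <FIELD name="dec" datatype="double" unit="deg" ucd="pos.eq.dec"/>
      <FIELD name="parallax" datatype="float" unit="mas"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

# Stand-in for the schema of the Parquet file (column name -> physical type).
# In practice this would come from a Parquet reader; hard-coded here.
PARQUET_SCHEMA = {
    "source_id": "INT64",
    "ra": "DOUBLE",
    "dec": "DOUBLE",
    "parallax": "DOUBLE",
    "phot_g_mean_mag": "FLOAT",
}

# Illustrative (assumed, incomplete) mapping of VOTable datatypes
# to Parquet physical types.
VOTABLE_TO_PARQUET = {
    "boolean": "BOOLEAN",
    "int": "INT32",
    "long": "INT64",
    "float": "FLOAT",
    "double": "DOUBLE",
}


def find_conflicts(votable_xml, parquet_schema):
    """Return a list of human-readable mismatches between the two descriptions."""
    root = ET.fromstring(votable_xml)
    fields = {}
    for element in root.iter():
        # Compare on the local tag name so the check ignores the XML namespace.
        if element.tag.rsplit("}", 1)[-1] == "FIELD":
            fields[element.get("name")] = element.get("datatype")

    problems = []
    for name, datatype in fields.items():
        if name not in parquet_schema:
            problems.append(f"{name}: described in VOTable but missing from Parquet")
        elif VOTABLE_TO_PARQUET.get(datatype) != parquet_schema[name]:
            problems.append(
                f"{name}: VOTable says {datatype}, Parquet says {parquet_schema[name]}"
            )
    for name in parquet_schema:
        if name not in fields:
            problems.append(f"{name}: present in Parquet but not described in VOTable")
    return problems


for problem in find_conflicts(VOTABLE_XML, PARQUET_SCHEMA):
    print(problem)
```

Whether a reader should warn, fail, or silently prefer the Parquet schema when such a list is non-empty is exactly the policy question the participants asked to have specified.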

Brigitta S: Prefers Parquet support first, before generalization (FITS). Example files would be useful for implementations such as astroquery, e.g. multiple tables in the same file.

Jos DB: Compression algorithms. We should start thinking about these, as compression is one of the core strengths of Parquet.

Gregory D-F: We need to define new TAP result formats. Is there a Parquet MIME type?

A few participants stressed the importance of moving fast with the adoption of Parquet in IVOA as a lot of projects are currently under way.

Marco M: Proposed the faster route of writing an IVOA Note for VOTable and Parquet, then exploring how to generalize it to other formats (FITS) and turning it into a standard.

Actions:

- Mark T. and Gregory D-F will start drafting the note

- Mark T. will make his presentation available so that others can comment on it - done

- Mark T. will present progress during the Apps session in Malta

- Apps WG to support the effort: create a TWiki page with useful info, organize meetings post Interop as required, etc.

Topic attachments
- Bulk_download_for_DR4-public.pdf (903.4 K, 2024-11-06 10:01, MarkTaylor): Gaia DR4 bulk download plans
- votparquet-telecon-2024-11-05.pdf (139.5 K, 2024-11-05 22:37, MarkTaylor)
Topic revision: r8 - 2024-11-06 - AdrianDamian
 