PyVO and the End User
Notes
Participants: 67
Presenter / Chair: Tess Jaffe / Tom Donaldson
Slides: https://wiki.ivoa.net/internal/IVOA/InterOpMay2020Apps/ivoa_202005_session.pdf
NAVO's experience from running 4 AAS workshops.
- Background:
- 2017 - started Python workshops at AAS for NAVO (HEASARC, IRSA, MAST, NED) archives
- Picked up PyVO in summer 2019; wanted archive-agnostic queries, unlike astroquery
- Goals:
- PyVO already powerful, needs to be easier to use
PyVO workflow, as used in the AAS tutorials:
Step 1: search the registry (this is what makes the workflow archive-agnostic)
Step 2: get more information from each candidate service
Step 3: access the data
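The three steps above might look roughly like this in a notebook. This is a minimal sketch, not the tutorial code: it assumes PyVO is installed and the registry is reachable, restricts itself to cone-search services, and `summarize` and `cone_search_workflow` are hypothetical helper names introduced here for illustration.

```python
def summarize(resources):
    """Step 2 helper: list (short_name, ivoid, title) for each registry match.

    Works on any objects exposing the RegistryResource attributes
    short_name, ivoid and res_title.
    """
    return [(r.short_name, r.ivoid, r.res_title) for r in resources]


def cone_search_workflow(keyword, pos, radius_deg):
    """Run the three workshop steps for cone-search services.

    keyword, pos and radius_deg are illustrative inputs chosen by the caller.
    """
    import pyvo  # imported here so the helper above has no dependencies

    # Step 1: archive-agnostic discovery through the registry.
    resources = pyvo.regsearch(servicetype="conesearch", keywords=[keyword])

    # Step 2: inspect what came back before choosing a service.
    for short_name, ivoid, title in summarize(resources):
        print(short_name, ivoid, title, sep=" | ")

    # Step 3: query the first match (a real session would choose deliberately).
    if len(resources) == 0:
        return None
    return resources[0].search(pos=pos, radius=radius_deg)
```

A call such as `cone_search_workflow("2MASS", (83.63, 22.01), 0.1)` would then return a result set that can be shown with `.to_table()`; the keyword and coordinates are placeholders only.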
There are details here that users should not have to worry about. How do we simplify the process?
Reminders for users/class participants:
- Each archive has its own back end (and validation issues)
- Each archive has its own downtime
- These are living archives: results may vary day to day as underlying data changes
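One way to act on these reminders in a notebook is to query each archive separately and record failures instead of letting one service stop the whole cell. The sketch below is only an assumption about how that could be done; `query_each` is a hypothetical helper, and it catches all exceptions on purpose.

```python
def query_each(services, run_query):
    """Run run_query(service) for every named service, isolating failures.

    services:  dict mapping a label to a service object (for example the
               resources returned by a registry search).
    run_query: callable that takes one service and returns its results.
    Returns (results, failures): two dicts keyed by label.
    """
    results, failures = {}, {}
    for label, service in services.items():
        try:
            results[label] = run_query(service)
        except Exception as exc:  # broad on purpose: downtime, invalid VOTable, ...
            failures[label] = f"{type(exc).__name__}: {exc}"
    return results, failures
```

With PyVO the `except` clause could probably be narrowed to `pyvo.dal.DALAccessError`, the base class of its service, query, and format errors. Because these are living archives, storing the query date alongside `results` may also help explain day-to-day differences.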
Developed notebooks: http://github.com/NASA-NAVO/navo-workshop
- Quick reference for notebooks: http://nasa-navo.github.io/navo-workshop/QuickReference.html
- Walk through registry search, then cone, image, spectral, and table searches.
- In complex queries, there is some tension between programmatic access to the results and astropy's pretty-printed tables.
- Other notebooks (add yours)!
- CDS: https://github.com/cds-astro/tutorials/blob/master/Notebooks (simple, CDS-specific)
- GAVO: [WHERE?] for advanced users
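On the tension between programmatic access and astropy pretty-printing noted in the list above: one possible pattern is to filter in the astropy table, then map the selection back to result records by row index. This is a sketch under the assumption that `results` behaves like a PyVO result set (supports `len()`, integer indexing returns a record, and `to_table()` keeps the row order); `records_where` is a hypothetical helper.

```python
def records_where(results, mask):
    """Return the records of `results` whose position is True in `mask`.

    `results` only needs len() and integer indexing, as PyVO result sets
    provide; `mask` is any boolean sequence of the same length, e.g. one
    computed from columns of results.to_table().
    """
    if len(mask) != len(results):
        raise ValueError("mask and results must have the same length")
    return [results[i] for i, keep in enumerate(mask) if keep]
```

In a notebook this might be used as `table = results.to_table()` for display, then `records_where(results, table["some_column"] > some_value)` followed by `record.getdataurl()` on each selected record. The column name and threshold are placeholders.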
Questions from Tess:
- How do we make the registry search better?
- Better metadata in general
- (James Dempsey: Advice for archive maintainers?)
- Tess: Standardize "publisher", etc. within an archive
- Tom McGlynn: instrument/facility keywords have been largely unused across the IVOA. Would help with mission-based filtering
- Tim Jenness & Tom: filtering on registry; the unique keys aren't the human-readable ones; short names and titles may not be as unique in response as users expect
- Do we teach sync or async? Better async wrapper in PyVO? Advanced?
- David Shupe: verbose mode for async? We have a wrapper, but it hides useful links for debugging.
- Tom D: PyVO's encapsulation here is often VO jargon; how do we balance this?
- Tips and tricks in general?
- Standards: note the differing use of "size", "diameter", and "radius" across IVOA search standards
- (JJ Kavelaars: would be good if they all required units, too)
- Going back and forth between astropy table rows and SIAResult objects, etc.
- Avoid the lure of Python one-liner elegance
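To illustrate the "size" / "diameter" / "radius" point from the tips above, a small helper can translate one user-facing radius into the keyword each PyVO search method takes. This is a sketch: the keyword names follow PyVO's `search()` methods for cone, image (SIA), and spectral (SSA) services, while the factor of two assumes that "size" and "diameter" describe a full extent rather than a radius. `spatial_kwargs` is a hypothetical name.

```python
# Keyword that PyVO's search() uses for the spatial extent, per service type,
# and the factor that converts a radius into that quantity (an assumption:
# "size" and "diameter" are treated as full extents, i.e. twice the radius).
_SPATIAL_KEYWORD = {
    "conesearch": ("radius", 1.0),   # cone search: radius of the cone
    "sia": ("size", 2.0),            # image search: full width of the region
    "ssa": ("diameter", 2.0),        # spectral search: diameter of the region
}


def spatial_kwargs(service_type, radius_deg):
    """Translate one search radius (degrees) into per-protocol keywords."""
    try:
        keyword, factor = _SPATIAL_KEYWORD[service_type]
    except KeyError:
        raise ValueError(f"no spatial keyword known for {service_type!r}") from None
    return {keyword: radius_deg * factor}
```

A call would then read `service.search(pos=pos, **spatial_kwargs("sia", 0.1))`. Spelling the steps out this way, rather than in a one-liner, also keeps the unit assumption visible to participants.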
Adrian Damian: working on VO module in astroquery that may provide something in this archive-agnostic niche at a higher level than PyVO does. https://github.com/astropy/astroquery/pull/1679
Andy Lawrence: Notebooks versus clients like TOPCAT in workflow? Good for teaching but folks are used to switching to some other client tool.
- TOPCAT and SAMP? Mark: May actually be able to do this already with astropy
- Katharina Lutz: Yes: can send data between notebooks and TOPCAT with SAMP. Astropy example works
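For reference, sending a table from a notebook to TOPCAT over SAMP could look like the sketch below. It uses `astropy.samp` and the standard `table.load.votable` message type; it assumes astropy is installed and that a SAMP hub is running (TOPCAT starts one). The function names are hypothetical.

```python
def votable_load_message(url, name):
    """Build the SAMP message asking clients to load a VOTable from `url`."""
    return {
        "samp.mtype": "table.load.votable",
        "samp.params": {"url": url, "name": name},
    }


def send_table(url, name="notebook table"):
    """Broadcast a VOTable to every SAMP client (TOPCAT must be running)."""
    from astropy.samp import SAMPIntegratedClient  # needs astropy installed

    client = SAMPIntegratedClient()
    client.connect()  # raises if no SAMP hub is running
    try:
        client.notify_all(votable_load_message(url, name))
    finally:
        client.disconnect()
```

The table would first be written to disk, for example with `table.write("out.xml", format="votable")`, and then passed to `send_table` as a `file://` URL.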
Markus and Hendrik: GAVO workshops go three days. Starts with tutorials in TOPCAT, Aladin and Splat, then ADQL, then PyVO last. Treat PyVO as useful for larger-scale programmatic access for KDD, machine learning, etc. Results have been mixed. Some expertise in Python definitely required.
Sync vs Async: data centers differ in expectations and workflow. CADC streams sync results, so they are not limited. Others (Rubin, MAST, ...) have a higher result limit for async queries because those are expected to be larger jobs. Dave M: early in TAP, the design expected all programmatic access to use async; sync was for short browser requests.
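A "verbose" async wrapper of the kind discussed in this session could be sketched as follows. It relies on the job interface of PyVO's `TAPService` (`submit_job`, then `run`, `wait`, `raise_if_error`, `fetch_result`, and `delete` on the job); `run_async_verbose` itself is a hypothetical helper, not part of PyVO.

```python
def run_async_verbose(service, query, log=print):
    """Run a TAP query asynchronously, logging the job URL and final phase.

    `service` is expected to behave like pyvo.dal.TAPService.
    """
    job = service.submit_job(query)
    log(f"job created: {job.url}")  # this URL can be opened in a browser to debug
    job.run()
    job.wait(phases=["COMPLETED", "ERROR", "ABORTED"])
    log(f"job finished in phase {job.phase}")
    # If the job failed, raise here and leave it on the server for inspection.
    job.raise_if_error()
    result = job.fetch_result()
    job.delete()  # tidy up the finished job on the server
    return result
```

The sync counterpart is a single call, `service.run_sync(query)`, which is simpler to teach but subject to whatever sync limits a given data center applies.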
Call for community support: the general view is that PyVO is well designed and gaining critical mass.
- TAPService.examples (PR under review in PyVO already)
- Link TAPService.tables in RegistryResource
- UCD improvements?
- Better datalink handling and error handling?
- Recursive datalink walker. How to filter? Semantics.
- Tess: better error handling and reporting in PyVO in general
- Tim Jenness: bytestrings and VOTables, Python 2 vs 3. Tom D: fixed in astropy, not released yet (4.1, pending)
Running Questions
Bruce Berriman: Does PyVO handle proprietary data?
- Christine: PyVO does support authentication. Can log in via PyVO, service side handles authorization/access. CADC is using this for metadata and data filtering already.
Mark Taylor: User feedback?
- Tess: Working well within the workshops, unsure how much participants are taking home and using for research later
Christine: 10k foot view: there is a spectrum from nuts and bolts to simplified astroquery encapsulation. Where does PyVO in general fit, and where do we want to wrap it in/for notebooks?
- Tess: We moved away from doing this within astroquery, so do we do this within PyVO (as now) or is there a third layer?
JJ: where do the workshop/tutorial contents fit with archive-specific astroquery how-tos? CADC's astroquery module is built on top of PyVO, others aren't.
- Tess: we're trying to teach archive-agnostic research with these, hence the pivot to PyVO
Anais Oberto: method for parallelizing queries? Keeping track of the async jobs?
- Christine: some wrapper with async job manager/limiter, within PyVO
- Tom: Server-side solutions like VizieR's (MOC? based filtering)
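Until something like the job manager/limiter mentioned above exists in PyVO, a client-side limiter can be sketched with the standard library alone. This is an assumption about one workable approach using threads, not a PyVO feature; `run_limited` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_limited(tasks, max_workers=3):
    """Run named zero-argument callables with at most max_workers at once.

    Returns (results, errors), both dicts keyed by task name, so every
    submitted query stays accounted for.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task): name for name, task in tasks.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # keep going if one query fails
                errors[name] = exc
    return results, errors
```

Each task could wrap one query, for example `lambda: service.run_async(query)`, with `max_workers` chosen to stay within what the data center tolerates.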