KDD - GWS Joint Session Schedule - AI in Astronomy and Impact on IVOA Standards - IVOA Nov 2024 Interoperability Meeting

* back to main programme page *

Draft Schedule

Time: Saturday, November 16, 2024 14:00-15:30

Location: Aula Magna

Speaker: Andre Schaaff
Title: NLP-chatbot R&D at CDS
Time: 10' + 2'
Abstract: Over the past years, CDS has undertaken long-term R&D work on Natural Language Processing applied to querying astronomical data services. The motivation was to enable new ways of interaction, in particular a chatbot, as an alternative to traditional forms, with the aim of producing query results that satisfy professional astronomers.

The Virtual Observatory (VO) brought us standards such as TAP and UCDs, which are implemented in the CDS services; these help us query our own services and open the door to querying the whole VO.

We will give a quick reminder and status of this chatbot work. In 2023 we started exploring how to improve it with the OpenAI API. We are now branching off from this initial work to study how to apply it to improving our services, in a wider use of AI. We will give a first overview of this new R&D study.
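As a minimal sketch of what such a chatbot ultimately produces, the last step is typically to turn extracted user intent (object position, search radius, catalogue) into an ADQL query that a TAP service can execute. The function name, table name, and coordinates below are illustrative assumptions, not part of the CDS implementation:

```python
# Hypothetical sketch: assembling an ADQL cone-search query from intent
# that a chatbot might extract from a natural-language request.

def intent_to_adql(table: str, ra: float, dec: float, radius_deg: float) -> str:
    """Build a simple ADQL cone-search query around a sky position (ICRS)."""
    return (
        f"SELECT TOP 100 * FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra}, {dec}, {radius_deg}))"
    )

# e.g. "stars within 0.1 degrees of M13" might resolve to:
query = intent_to_adql("gaiadr3.gaia_source", 250.4235, 36.4613, 0.1)
print(query)
```

The resulting string could then be submitted to any VO TAP service, which is exactly the interoperability that standards like TAP and ADQL provide.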

Speaker: Sebastian Trujillo Gomez
Title: Spherinator + HiPSter: from the known unknowns to the unknown unknowns
Time: 10' + 2'
Abstract: Current applications of machine learning to astrophysics focus on teaching machines to perform domain-expert tasks accurately and efficiently across enormous datasets. Although essential in the big data era, this approach is limited by our own intuitions and expectations, and provides at most only answers to the 'known unknowns'. To address this, we are developing a new conceptual framework and software tools to help astronomers maximize scientific breakthroughs by letting the machine learn unbiased, interpretable representations of complex data ranging from observational surveys to simulations. Our tools automatically learn low-dimensional representations of complex objects such as galaxies in multimodal data (e.g. images, spectra, datacubes, simulated point clouds), and provide interactive, explorative access to arbitrarily large datasets through a simple graphical interface. Our framework is designed to be interpretable, to work seamlessly across datasets regardless of their origin, and to provide a path towards discovering the 'unknown unknowns'.
Speaker: Giuseppe Riccio
Title: Integrating AI tools in data analysis frameworks: the Vera Rubin LSST and Euclid cases
Time: 10' + 2'
Abstract: Data analytics frameworks offer useful solutions for connecting to a large number of huge repositories and for providing a set of tools that support scientists in their research on the vast quantity of exceptional-quality data produced by an ever-increasing number of sophisticated astronomical instruments. For a framework to interface with as many archives as possible and to provide a large number of tools, a very high level of standardization is needed, both for the repositories and for the I/O of the analysis methods. Moreover, integrating advanced data-driven science methodologies for the automatic exploration of data is becoming mandatory in order to cope with the huge amount of available data. As part of the LSST and Euclid projects, we have developed a portable and modular web application (and its "euclidized" version), designed to provide an efficient and intuitive software infrastructure for analyzing data acquired and stored in their official repositories. It can retrieve and analyze both housekeeping and scientific data, providing standard statistical and plotting tools as well as machine/deep learning and data mining techniques. Moreover, we foresee integrating an LLM to simplify some time-consuming configuration operations that are currently the user's responsibility.
Speaker: John Abela
Title: The Computational Evolution of Human Intelligence in AI
Time: 10' + 2'
Abstract: The question of whether human intelligence is Turing-computable, i.e. replicable on a machine of sufficient complexity, has divided thinkers and researchers for decades. This talk explores two opposing views: either intelligence can be fully understood, quantified, and recreated through algorithms, or it encompasses qualities beyond the reach of computational methods. The recent evolution of large language models (LLMs), capable of producing nuanced, human-like responses, lends credibility to the theory that intelligence may indeed be algorithmic in nature. These models, leveraging enormous capacities, mimic aspects of human cognition, suggesting that machine replication of intelligence is within reach, at least in theory. I will examine the trajectory of AI through advances in LLMs and other architectures, highlighting how they support the hypothesis of intelligence as an emergent property of computational complexity. This perspective aligns with the Turing-computability hypothesis, suggesting that with sufficient resources and sophisticated architectures, machines may achieve levels of understanding and creativity once thought exclusive to human minds. In doing so, the discussion confronts the philosophical implications of these advancements, asking whether AI can not only emulate but embody facets of human-like intelligence.
Speaker: Sara Shishehchi
Title: Leveraging Large Language Model (LLM)-based Agents with Multiple Tool Integration for Enhanced Search in the Canadian Astronomy Data Centre
Time: 10' + 2'
Abstract: Searching for data, including images, with the advanced search tool on the Canadian Astronomy Data Centre (CADC) website can be difficult for users, as it requires knowledge of the ADQL language and involves multiple steps to narrow and refine search queries. The goal of this project is to leverage Large Language Models (LLMs) and autonomous agents to create a chatbot that helps users search for images in the CADC database using natural language. Our LLM-based agent accepts queries in English, converts them to ADQL, and returns the results after executing the query against the database. The system is designed to handle common user errors, such as spelling mistakes, incorrect column names, and incorrect values. In such cases, the chatbot suggests a shortlist of similar but correct values that the user might have intended; the user's feedback is then collected to retrieve the correct content. This robustness was achieved by incorporating Retrieval-Augmented Generation (RAG) and semantic search tools, which verify query components with the user before execution and test them against the database.
To evaluate the performance of our system, we created a dataset of questions across different categories: standard questions, spelling errors, incorrect columns, and incorrect values. The system demonstrates 80-90% accuracy on these benchmarks, a significant improvement over existing systems built with OpenAI's custom GPTs, which achieved less than 20% accuracy on the same tests. Our solution streamlines the search process for CADC users, making data retrieval more efficient and accessible.
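The error-handling step described above, suggesting a shortlist of similar but valid values when the user mistypes a column name, can be sketched with simple fuzzy matching. The column list and function name here are purely illustrative assumptions, not the actual CADC schema or implementation (which uses RAG and semantic search rather than plain string similarity):

```python
import difflib

# Illustrative column names only; not the real CADC table schema.
KNOWN_COLUMNS = ["target_name", "instrument_name", "energy_bandpassname",
                 "time_exposure", "position_bounds"]

def suggest_columns(user_column: str, known=KNOWN_COLUMNS, n=3):
    """Return up to n valid column names close to a possibly misspelled one,
    best match first, for the user to confirm before the query is executed."""
    return difflib.get_close_matches(user_column, known, n=n, cutoff=0.6)

print(suggest_columns("targetname"))  # best match should be "target_name"
```

In the actual agent, a step like this would run before query execution, with the user's confirmation feeding back into the generated ADQL.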

Panel:

Andre Schaaff, Sebastian Trujillo Gomez, Giuseppe Riccio, John Abela, Chenzhou Cui

Moderators:

Yihan Tao, Sara Bertocco, Jesus Salgado

Discussion on the use of AI in astronomy and its impact on IVOA standards
Notes: TBD


Topic revision: r9 - 2024-11-07 - YihanTao