(1 vs. 10) InterOpMay2024KD < Main

Revision 10 - 2024-05-21 - KaiLarsPolsterer

META TOPICPARENT	name="WebHome"

Knowledge Discovery

Time: Wednesday May 22, 2024 11:00-12:30 Australian Eastern Standard Time

Speaker	Title	Time	Abstract	Material
Yihan Tao	Greetings and Introduction	5'		pdf
Alberto Accomazzi	BiblioPile: Building a Dataset to Support AI-enabled Bibliography Curation efforts	15'+5'	A well-established way to assess the scientific impact of an observational facility in astronomy is the quantitative analysis of the studies published in the literature which have made use of the data taken by the facility. A requirement of such analysis is the creation of bibliographies which annotate and link data products with the literature, thus providing a way to use bibliometrics as an impact measure for the underlying data. An automated assistant able to emulate some of the associated activities would provide a valuable contribution to the human effort involved. LLMs have shown flexibility in interpreting and classifying scientific articles which are the basis for this curation activity. They have also been successfully used for information extraction tasks, which would help identify the specific datasets mentioned in the papers. In this talk I will describe our effort to create the BiblioPile, a contributed dataset consisting of open access fulltext papers and annotated bibliography from institutions that maintain them in order to help train AI/ML bibliographic annotation pipelines.	pdf [gdoc]
Yan Shao	Generative Named Entity Normalization for Astronomical Facilities	15'+5'	Named entity normalization for astronomical facilities is crucial in the related academic research. Unlike the majority of the previous work, we model named entity normalization as a sequence generation problem via utilizing large language models, without assuming a comprehensive set of predefined normalized forms for any entities. Four entity normalization scenarios that are likely to occur in real-world application are discussed specifically, depending on whether the explicit normalization rules as well as the corresponding annotated instances are available. Moreover, we propose respective generative normalization methods and evaluate on datasets compiled from the standard telescope name lists maintained by the American Astronomical Society (AAS) and the Astrophysics Data System (ADS). The empirical findings demonstrate that the analytical, inductive, and generative capabilities of LLMs empower generative entity normalization to achieve commendable performances, even under very stringent conditions. The generative normalization effectively remedies the shortcomings of the retrieval-based methods.	pdf
Kai Polsterer	Spherinator & HIPSter & Jasmine	15'+5'	Simulations are the best, and often the only, approximation to experimental laboratories in Astrophysics. However, the complexity and richness of their outputs severely limits the interpretability of their predictions. We describe a new conceptual approach to obtaining useful scientific insights from a broad range of astrophysical simulations. These methods can be applied to state-of-the-art simulations and will be essential to automate the data exploration and analysis of the next-generation exascale simulations and the extreme data challenges they will present. Our concept is based on applying the latest advances in unsupervised deep learning algorithms to efficiently represent the multidimensional datasets produced by Astrophysics simulations and to learn compact but accurate representations of the data in a low-dimensional manifold that naturally describes the data in an optimal feature space. The data can seemingly be projected onto this latent space for interactive inspection, visual interpretation, and quantitative analysis, including the option of deriving symbolic expressions to build interpretable models. We present a working prototype of the pipeline using an autoencoder trained on galaxy images from SDSS (or equivalently simulated galaxies) as well as the Illustris simulations, to produce a natural ‘Hubble tuning fork’ similarity space that can be visualized interactively on the surface of a sphere by exploiting the power of HiPS tilings in AladinLite. Besides Spherinator and HIPSter, which perform these spherical projections, we are working on Jasmine, a tool to explore the rich data from simulations in detail.	pdf

Changed:

<
<

Panel + audience
(Yihan Tao, Kai Polstererk, Rafael Martinez Galarza, Alberto Accomazzi)

Panel-led discussion

25'

Seeding topics for discussion

1. How can state-of-the-art AI technologies, such as LLMs, foundation models, and agents, enhance the VO?

2. What are the potential applications of these AI technologies within the VO framework?

3. What are the best practices and strategies for integrating AI agents and models with VO tools and science platforms that can help users efficiently access and analyse astronomical data? What are the challenges?

>
>

Panel + audience
(Yihan Tao, Kai Polsterer, Rafael Martinez Galarza, Alberto Accomazzi)

Panel-led discussion

25'

Seeding topics for discussion

1. How can state-of-the-art AI technologies, such as LLMs, foundation models, and agents, enhance the VO?

2. What are the potential applications of these AI technologies within the VO framework?

3. What are the best practices and strategies for integrating AI agents and models with VO tools and science platforms that can help users efficiently access and analyse astronomical data? What are the challenges?

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link

Revision 9 - 2024-05-21 - AlbertoAccomazzi

META TOPICPARENT	name="WebHome"

Knowledge Discovery

Time: Wednesday May 22, 2024 11:00-12:30 Australian Eastern Standard Time

Speaker	Title	Time	Abstract	Material
Yihan Tao	Greetings and Introduction	5'		pdf

Changed:

<
<

Alberto Accomazzi

BiblioPile: Building a Dataset to Support AI-enabled Bibliography Curation efforts

15'+5'

A well-established way to assess the scientific impact of an observational facility in astronomy is the quantitative analysis of the studies published in the literature which have made use of the data taken by the facility. A requirement of such analysis is the creation of bibliographies which annotate and link data products with the literature, thus providing a way to use bibliometrics as an impact measure for the underlying data. An automated assistant able to emulate some of the associated activities would provide a valuable contribution to the human effort involved. LLMs have shown flexibility in interpreting and classifying scientific articles which are the basis for this curation activity. They have also been successfully used for information extraction tasks, which would help identify the specific datasets mentioned in the papers. In this talk I will describe our effort to create the BiblioPile, a contributed dataset consisting of open access fulltext papers and annotated bibliography from institutions that maintain them in order to help train AI/ML bibliographic annotation pipelines.

pdf

>
>

Alberto Accomazzi

BiblioPile: Building a Dataset to Support AI-enabled Bibliography Curation efforts

15'+5'

A well-established way to assess the scientific impact of an observational facility in astronomy is the quantitative analysis of the studies published in the literature which have made use of the data taken by the facility. A requirement of such analysis is the creation of bibliographies which annotate and link data products with the literature, thus providing a way to use bibliometrics as an impact measure for the underlying data. An automated assistant able to emulate some of the associated activities would provide a valuable contribution to the human effort involved. LLMs have shown flexibility in interpreting and classifying scientific articles which are the basis for this curation activity. They have also been successfully used for information extraction tasks, which would help identify the specific datasets mentioned in the papers. In this talk I will describe our effort to create the BiblioPile, a contributed dataset consisting of open access fulltext papers and annotated bibliography from institutions that maintain them in order to help train AI/ML bibliographic annotation pipelines.

pdf
[gdoc]

Yan Shao

Generative Named Entity Normalization for Astronomical Facilities

15'+5'

Named entity normalization for astronomical facilities is crucial in the related academic research. Unlike the majority of the previous work, we model named entity normalization as a sequence generation problem via utilizing large language models, without assuming a comprehensive set of predefined normalized forms for any entities. Four entity normalization scenarios that are likely to occur in real-world application are discussed specifically, depending on whether the explicit normalization rules as well as the corresponding annotated instances are available. Moreover, we propose respective generative normalization methods and evaluate on datasets compiled from the standard telescope name lists maintained by the American Astronomical Society (AAS) and the Astrophysics Data System (ADS). The empirical findings demonstrate that the analytical, inductive, and generative capabilities of LLMs empower generative entity normalization to achieve commendable performances, even under very stringent conditions. The generative normalization effectively remedies the shortcomings of the retrieval-based methods.

pdf

Kai Polsterer

Spherinator & HIPSter & Jasmine

15'+5'

Simulations are the best, and often the only, approximation to experimental laboratories in Astrophysics. However, the complexity and richness of their outputs severely limits the interpretability of their predictions. We describe a new conceptual approach to obtaining useful scientific insights from a broad range of astrophysical simulations. These methods can be applied to state-of-the-art simulations and will be essential to automate the data exploration and analysis of the next-generation exascale simulations and the extreme data challenges they will present. Our concept is based on applying the latest advances in unsupervised deep learning algorithms to efficiently represent the multidimensional datasets produced by Astrophysics simulations and to learn compact but accurate representations of the data in a low-dimensional manifold that naturally describes the data in an optimal feature space. The data can seemingly be projected onto this latent space for interactive inspection, visual interpretation, and quantitative analysis, including the option of deriving symbolic expressions to build interpretable models. We present a working prototype of the pipeline using an autoencoder trained on galaxy images from SDSS (or equivalently simulated galaxies) as well as the Illustris simulations, to produce a natural ‘Hubble tuning fork’ similarity space that can be visualized interactively on the surface of a sphere by exploiting the power of HiPS tilings in AladinLite. Besides Spherinator and HIPSter, which perform these spherical projections, we are working on Jasmine, a tool to explore the rich data from simulations in detail.

pdf

Panel + audience
(Yihan Tao, Kai Polstererk, Rafael Martinez Galarza, Alberto Accomazzi)

Panel-led discussion

25'

Seeding topics for discussion

1. How can state-of-the-art AI technologies, such as LLMs, foundation models, and agents, enhance the VO?

2. What are the potential applications of these AI technologies within the VO framework?

3. What are the best practices and strategies for integrating AI agents and models with VO tools and science platforms that can help users efficiently access and analyse astronomical data? What are the challenges?

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link

Revision 8 - 2024-05-21 - RaffaeleDAbrusco

META TOPICPARENT	name="WebHome"

Knowledge Discovery

Time: Wednesday May 22, 2024 11:00-12:30 Australian Eastern Standard Time

Speaker	Title	Time	Abstract	Material
Yihan Tao	Greetings and Introduction	5'		pdf
Alberto Accomazzi	BiblioPile: Building a Dataset to Support AI-enabled Bibliography Curation efforts	15'+5'	A well-established way to assess the scientific impact of an observational facility in astronomy is the quantitative analysis of the studies published in the literature which have made use of the data taken by the facility. A requirement of such analysis is the creation of bibliographies which annotate and link data products with the literature, thus providing a way to use bibliometrics as an impact measure for the underlying data. An automated assistant able to emulate some of the associated activities would provide a valuable contribution to the human effort involved. LLMs have shown flexibility in interpreting and classifying scientific articles which are the basis for this curation activity. They have also been successfully used for information extraction tasks, which would help identify the specific datasets mentioned in the papers. In this talk I will describe our effort to create the BiblioPile, a contributed dataset consisting of open access fulltext papers and annotated bibliography from institutions that maintain them in order to help train AI/ML bibliographic annotation pipelines.	pdf
Yan Shao	Generative Named Entity Normalization for Astronomical Facilities	15'+5'	Named entity normalization for astronomical facilities is crucial in the related academic research. Unlike the majority of the previous work, we model named entity normalization as a sequence generation problem via utilizing large language models, without assuming a comprehensive set of predefined normalized forms for any entities. Four entity normalization scenarios that are likely to occur in real-world application are discussed specifically, depending on whether the explicit normalization rules as well as the corresponding annotated instances are available. Moreover, we propose respective generative normalization methods and evaluate on datasets compiled from the standard telescope name lists maintained by the American Astronomical Society (AAS) and the Astrophysics Data System (ADS). The empirical findings demonstrate that the analytical, inductive, and generative capabilities of LLMs empower generative entity normalization to achieve commendable performances, even under very stringent conditions. The generative normalization effectively remedies the shortcomings of the retrieval-based methods.	pdf
Kai Polsterer	Spherinator & HIPSter & Jasmine	15'+5'	Simulations are the best, and often the only, approximation to experimental laboratories in Astrophysics. However, the complexity and richness of their outputs severely limits the interpretability of their predictions. We describe a new conceptual approach to obtaining useful scientific insights from a broad range of astrophysical simulations. These methods can be applied to state-of-the-art simulations and will be essential to automate the data exploration and analysis of the next-generation exascale simulations and the extreme data challenges they will present. Our concept is based on applying the latest advances in unsupervised deep learning algorithms to efficiently represent the multidimensional datasets produced by Astrophysics simulations and to learn compact but accurate representations of the data in a low-dimensional manifold that naturally describes the data in an optimal feature space. The data can seemingly be projected onto this latent space for interactive inspection, visual interpretation, and quantitative analysis, including the option of deriving symbolic expressions to build interpretable models. We present a working prototype of the pipeline using an autoencoder trained on galaxy images from SDSS (or equivalently simulated galaxies) as well as the Illustris simulations, to produce a natural ‘Hubble tuning fork’ similarity space that can be visualized interactively on the surface of a sphere by exploiting the power of HiPS tilings in AladinLite. Besides Spherinator and HIPSter, which perform these spherical projections, we are working on Jasmine, a tool to explore the rich data from simulations in detail.	pdf
Panel + audience (Yihan Tao, Kai Polstererk, Rafael Martinez Galarza, Alberto Accomazzi)	Panel-led discussion	25'	Seeding topics for discussion 1. How can state-of-the-art AI technologies, such as LLMs, foundation models, and agents, enhance the VO? 2. What are the potential applications of these AI technologies within the VO framework? 3. What are the best practices and strategies for integrating AI agents and models with VO tools and science platforms that can help users efficiently access and analyse astronomical data? What are the challenges?

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link


Revision 7 - 2024-05-21 - YihanTao

META TOPICPARENT	name="WebHome"

Knowledge Discovery

Time: Wednesday May 22, 2024 11:00-12:30 Australian Eastern Standard Time

Speaker	Title	Time	Abstract	Material

Changed:

<
<

Yihan Tao

Greetings and Introduction

5'

pdf

>
>

Yihan Tao

Greetings and Introduction

5'

pdf

Alberto Accomazzi	BiblioPile: Building a Dataset to Support AI-enabled Bibliography Curation efforts	15'+5'	A well-established way to assess the scientific impact of an observational facility in astronomy is the quantitative analysis of the studies published in the literature which have made use of the data taken by the facility. A requirement of such analysis is the creation of bibliographies which annotate and link data products with the literature, thus providing a way to use bibliometrics as an impact measure for the underlying data. An automated assistant able to emulate some of the associated activities would provide a valuable contribution to the human effort involved. LLMs have shown flexibility in interpreting and classifying scientific articles which are the basis for this curation activity. They have also been successfully used for information extraction tasks, which would help identify the specific datasets mentioned in the papers. In this talk I will describe our effort to create the BiblioPile, a contributed dataset consisting of open access fulltext papers and annotated bibliography from institutions that maintain them in order to help train AI/ML bibliographic annotation pipelines.	pdf
Yan Shao	Generative Named Entity Normalization for Astronomical Facilities	15'+5'	Named entity normalization for astronomical facilities is crucial in the related academic research. Unlike the majority of the previous work, we model named entity normalization as a sequence generation problem via utilizing large language models, without assuming a comprehensive set of predefined normalized forms for any entities. Four entity normalization scenarios that are likely to occur in real-world application are discussed specifically, depending on whether the explicit normalization rules as well as the corresponding annotated instances are available. Moreover, we propose respective generative normalization methods and evaluate on datasets compiled from the standard telescope name lists maintained by the American Astronomical Society (AAS) and the Astrophysics Data System (ADS). The empirical findings demonstrate that the analytical, inductive, and generative capabilities of LLMs empower generative entity normalization to achieve commendable performances, even under very stringent conditions. The generative normalization effectively remedies the shortcomings of the retrieval-based methods.	pdf
Kai Polsterer	Spherinator & HIPSter & Jasmine	15'+5'	Simulations are the best, and often the only, approximation to experimental laboratories in Astrophysics. However, the complexity and richness of their outputs severely limits the interpretability of their predictions. We describe a new conceptual approach to obtaining useful scientific insights from a broad range of astrophysical simulations. These methods can be applied to state-of-the-art simulations and will be essential to automate the data exploration and analysis of the next-generation exascale simulations and the extreme data challenges they will present. Our concept is based on applying the latest advances in unsupervised deep learning algorithms to efficiently represent the multidimensional datasets produced by Astrophysics simulations and to learn compact but accurate representations of the data in a low-dimensional manifold that naturally describes the data in an optimal feature space. The data can seemingly be projected onto this latent space for interactive inspection, visual interpretation, and quantitative analysis, including the option of deriving symbolic expressions to build interpretable models. We present a working prototype of the pipeline using an autoencoder trained on galaxy images from SDSS (or equivalently simulated galaxies) as well as the Illustris simulations, to produce a natural ‘Hubble tuning fork’ similarity space that can be visualized interactively on the surface of a sphere by exploiting the power of HiPS tilings in AladinLite. Besides Spherinator and HIPSter, which perform these spherical projections, we are working on Jasmine, a tool to explore the rich data from simulations in detail.	pdf
Panel + audience (Yihan Tao, Kai Polstererk, Rafael Martinez Galarza, Alberto Accomazzi)	Panel-led discussion	25'	Seeding topics for discussion 1. How can state-of-the-art AI technologies, such as LLMs, foundation models, and agents, enhance the VO? 2. What are the potential applications of these AI technologies within the VO framework? 3. What are the best practices and strategies for integrating AI agents and models with VO tools and science platforms that can help users efficiently access and analyse astronomical data? What are the challenges?

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link

Revision 6 - 2024-05-20 - KaiLarsPolsterer

META TOPICPARENT	name="WebHome"

Knowledge Discovery

Time: Wednesday May 22, 2024 11:00-12:30 Australian Eastern Standard Time

Speaker	Title	Time	Abstract	Material
Yihan Tao	Greetings and Introduction	5'		pdf
Alberto Accomazzi	BiblioPile: Building a Dataset to Support AI-enabled Bibliography Curation efforts	15'+5'	A well-established way to assess the scientific impact of an observational facility in astronomy is the quantitative analysis of the studies published in the literature which have made use of the data taken by the facility. A requirement of such analysis is the creation of bibliographies which annotate and link data products with the literature, thus providing a way to use bibliometrics as an impact measure for the underlying data. An automated assistant able to emulate some of the associated activities would provide a valuable contribution to the human effort involved. LLMs have shown flexibility in interpreting and classifying scientific articles which are the basis for this curation activity. They have also been successfully used for information extraction tasks, which would help identify the specific datasets mentioned in the papers. In this talk I will describe our effort to create the BiblioPile, a contributed dataset consisting of open access fulltext papers and annotated bibliography from institutions that maintain them in order to help train AI/ML bibliographic annotation pipelines.	pdf
Yan Shao	Generative Named Entity Normalization for Astronomical Facilities	15'+5'	Named entity normalization for astronomical facilities is crucial in the related academic research. Unlike the majority of the previous work, we model named entity normalization as a sequence generation problem via utilizing large language models, without assuming a comprehensive set of predefined normalized forms for any entities. Four entity normalization scenarios that are likely to occur in real-world application are discussed specifically, depending on whether the explicit normalization rules as well as the corresponding annotated instances are available. Moreover, we propose respective generative normalization methods and evaluate on datasets compiled from the standard telescope name lists maintained by the American Astronomical Society (AAS) and the Astrophysics Data System (ADS). The empirical findings demonstrate that the analytical, inductive, and generative capabilities of LLMs empower generative entity normalization to achieve commendable performances, even under very stringent conditions. The generative normalization effectively remedies the shortcomings of the retrieval-based methods.	pdf

Changed:

<
<

Kai Polsterer

Spherinator & HIPSter & Jasmine

15'+5'

Simulations are the best, and often the only, approximation to experimental laboratories in Astrophysics. However, the complexity and richness of their outputs severely limits the interpretability of their predictions. We describe a new conceptual approach to obtaining useful scientific insights from a broad range of astrophysical simulations. These methods can be applied to state-of-the-art simulations and will be essential to automate the data exploration and analysis of the next-generation exascale simulations and the extreme data challenges they will present. Our concept is based on applying the latest advances in unsupervised deep learning algorithms to efficiently represent the multidimensional datasets produced by Astrophysics simulations and to learn compact but accurate representations of the data in a low-dimensional manifold that naturally describes the data in an optimal feature space. The data can seemingly be projected onto this latent space for interactive inspection, visual interpretation, and quantitative analysis, including the option of deriving symbolic expressions to build interpretable models. We present a working prototype of the pipeline using an autoencoder trained on galaxy images from SDSS (or equivalently simulated galaxies) as well as the Illustris simulations, to produce a natural ‘Hubble tuning fork’ similarity space that can be visualized interactively on the surface of a sphere by exploiting the power of HiPS tilings in AladinLite.

pdf

>
>

Kai Polsterer

Spherinator & HIPSter & Jasmine

15'+5'

Simulations are the best, and often the only, approximation to experimental laboratories in Astrophysics. However, the complexity and richness of their outputs severely limits the interpretability of their predictions. We describe a new conceptual approach to obtaining useful scientific insights from a broad range of astrophysical simulations. These methods can be applied to state-of-the-art simulations and will be essential to automate the data exploration and analysis of the next-generation exascale simulations and the extreme data challenges they will present. Our concept is based on applying the latest advances in unsupervised deep learning algorithms to efficiently represent the multidimensional datasets produced by Astrophysics simulations and to learn compact but accurate representations of the data in a low-dimensional manifold that naturally describes the data in an optimal feature space. The data can seemingly be projected onto this latent space for interactive inspection, visual interpretation, and quantitative analysis, including the option of deriving symbolic expressions to build interpretable models. We present a working prototype of the pipeline using an autoencoder trained on galaxy images from SDSS (or equivalently simulated galaxies) as well as the Illustris simulations, to produce a natural ‘Hubble tuning fork’ similarity space that can be visualized interactively on the surface of a sphere by exploiting the power of HiPS tilings in AladinLite. Besides Spherinator and HIPSter, which perform these spherical projections, we are working on Jasmine, a tool to explore the rich data from simulations in detail.

pdf

Panel + audience
(Yihan Tao, Kai Polstererk, Rafael Martinez Galarza, Alberto Accomazzi)

Panel-led discussion

25'

Seeding topics for discussion

1. How can state-of-the-art AI technologies, such as LLMs, foundation models, and agents, enhance the VO?

2. What are the potential applications of these AI technologies within the VO framework?

3. What are the best practices and strategies for integrating AI agents and models with VO tools and science platforms that can help users efficiently access and analyse astronomical data? What are the challenges?

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link


Revision 5 - 2024-05-20 - RaffaeleDAbrusco

META TOPICPARENT	name="WebHome"

Knowledge Discovery

Time: Wednesday May 22, 2024 11:00-12:30 Australian Eastern Standard Time

Speaker	Title	Time	Abstract	Material
Yihan Tao	Greetings and Introduction	5'		pdf
Alberto Accomazzi	BiblioPile: Building a Dataset to Support AI-enabled Bibliography Curation efforts	15'+5'	A well-established way to assess the scientific impact of an observational facility in astronomy is the quantitative analysis of the studies published in the literature which have made use of the data taken by the facility. A requirement of such analysis is the creation of bibliographies which annotate and link data products with the literature, thus providing a way to use bibliometrics as an impact measure for the underlying data. An automated assistant able to emulate some of the associated activities would provide a valuable contribution to the human effort involved. LLMs have shown flexibility in interpreting and classifying scientific articles which are the basis for this curation activity. They have also been successfully used for information extraction tasks, which would help identify the specific datasets mentioned in the papers. In this talk I will describe our effort to create the BiblioPile, a contributed dataset consisting of open access fulltext papers and annotated bibliography from institutions that maintain them in order to help train AI/ML bibliographic annotation pipelines.	pdf
Yan Shao	Generative Named Entity Normalization for Astronomical Facilities	15'+5'	Named entity normalization for astronomical facilities is crucial in the related academic research. Unlike the majority of the previous work, we model named entity normalization as a sequence generation problem via utilizing large language models, without assuming a comprehensive set of predefined normalized forms for any entities. Four entity normalization scenarios that are likely to occur in real-world application are discussed specifically, depending on whether the explicit normalization rules as well as the corresponding annotated instances are available. Moreover, we propose respective generative normalization methods and evaluate on datasets compiled from the standard telescope name lists maintained by the American Astronomical Society (AAS) and the Astrophysics Data System (ADS). The empirical findings demonstrate that the analytical, inductive, and generative capabilities of LLMs empower generative entity normalization to achieve commendable performances, even under very stringent conditions. The generative normalization effectively remedies the shortcomings of the retrieval-based methods.	pdf
Kai Polsterer	Spherinator & HIPSter & Jasmine	15'+5'	Simulations are the best, and often the only, approximation to experimental laboratories in Astrophysics. However, the complexity and richness of their outputs severely limits the interpretability of their predictions. We describe a new conceptual approach to obtaining useful scientific insights from a broad range of astrophysical simulations. These methods can be applied to state-of-the-art simulations and will be essential to automate the data exploration and analysis of the next-generation exascale simulations and the extreme data challenges they will present. Our concept is based on applying the latest advances in unsupervised deep learning algorithms to efficiently represent the multidimensional datasets produced by Astrophysics simulations and to learn compact but accurate representations of the data in a low-dimensional manifold that naturally describes the data in an optimal feature space. The data can seemingly be projected onto this latent space for interactive inspection, visual interpretation, and quantitative analysis, including the option of deriving symbolic expressions to build interpretable models. We present a working prototype of the pipeline using an autoencoder trained on galaxy images from SDSS (or equivalently simulated galaxies) as well as the Illustris simulations, to produce a natural ‘Hubble tuning fork’ similarity space that can be visualized interactively on the surface of a sphere by exploiting the power of HiPS tilings in AladinLite.	pdf

Changed:

<
<

Panel + audience
(Yihan Tao, Kai Polsterer, Rafael Martinez Galarza, Alberto Accomazzi)

Panel-led discussion

30'

Seeding topics for discussion

1. How can state-of-the-art AI technologies, such as LLMs, foundation models, and agents, enhance the VO?

2. What are the potential applications of these AI technologies within the VO framework?

3. What are the best practices and strategies for integrating AI agents and models with VO tools and science platforms that can help users efficiently access and analyse astronomical data? What are the challenges?

>
>

Panel + audience
(Yihan Tao, Kai Polsterer, Rafael Martinez Galarza, Alberto Accomazzi)

Panel-led discussion

25'

Seeding topics for discussion

1. How can state-of-the-art AI technologies, such as LLMs, foundation models, and agents, enhance the VO?

2. What are the potential applications of these AI technologies within the VO framework?

3. What are the best practices and strategies for integrating AI agents and models with VO tools and science platforms that can help users efficiently access and analyse astronomical data? What are the challenges?

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link

Revision 4 2024-05-20 - KaiLarsPolsterer

META TOPICPARENT	name="WebHome"

Knowledge Discovery

Time: Wednesday May 22, 2024 11:00-12:30 Australian Eastern Standard Time

Speaker	Title	Time	Abstract	Material
Yihan Tao	Greetings and Introduction	5'		pdf
Alberto Accomazzi	BiblioPile: Building a Dataset to Support AI-enabled Bibliography Curation efforts	15'+5'	A well-established way to assess the scientific impact of an observational facility in astronomy is the quantitative analysis of the studies published in the literature which have made use of the data taken by the facility. A requirement of such analysis is the creation of bibliographies which annotate and link data products with the literature, thus providing a way to use bibliometrics as an impact measure for the underlying data. An automated assistant able to emulate some of the associated activities would provide a valuable contribution to the human effort involved. LLMs have shown flexibility in interpreting and classifying scientific articles which are the basis for this curation activity. They have also been successfully used for information extraction tasks, which would help identify the specific datasets mentioned in the papers. In this talk I will describe our effort to create the BiblioPile, a contributed dataset consisting of open access fulltext papers and annotated bibliography from institutions that maintain them in order to help train AI/ML bibliographic annotation pipelines.	pdf
Yan Shao	Generative Named Entity Normalization for Astronomical Facilities	15'+5'	Named entity normalization for astronomical facilities is crucial for related academic research. Unlike most previous work, we model named entity normalization as a sequence generation problem using large language models, without assuming a comprehensive set of predefined normalized forms for any entities. Four entity normalization scenarios that are likely to occur in real-world applications are discussed specifically, depending on whether the explicit normalization rules as well as the corresponding annotated instances are available. Moreover, we propose respective generative normalization methods and evaluate them on datasets compiled from the standard telescope name lists maintained by the American Astronomical Society (AAS) and the Astrophysics Data System (ADS). The empirical findings demonstrate that the analytical, inductive, and generative capabilities of LLMs empower generative entity normalization to achieve commendable performance, even under very stringent conditions. The generative normalization effectively remedies the shortcomings of the retrieval-based methods.	pdf

Changed:

<
<

Kai Polsterer

15'+5'

pdf

>
>

Kai Polsterer

Spherinator & HIPSter & Jasmine

15'+5'

Simulations are the best, and often the only, approximation to experimental laboratories in Astrophysics. However, the complexity and richness of their outputs severely limit the interpretability of their predictions. We describe a new conceptual approach to obtaining useful scientific insights from a broad range of astrophysical simulations. These methods can be applied to state-of-the-art simulations and will be essential to automate the data exploration and analysis of the next-generation exascale simulations and the extreme data challenges they will present. Our concept is based on applying the latest advances in unsupervised deep learning algorithms to efficiently represent the multidimensional datasets produced by Astrophysics simulations and to learn compact but accurate representations of the data in a low-dimensional manifold that naturally describes the data in an optimal feature space. The data can seamlessly be projected onto this latent space for interactive inspection, visual interpretation, and quantitative analysis, including the option of deriving symbolic expressions to build interpretable models. We present a working prototype of the pipeline using an autoencoder trained on galaxy images from SDSS (or equivalently simulated galaxies) as well as the Illustris simulations, to produce a natural ‘Hubble tuning fork’ similarity space that can be visualized interactively on the surface of a sphere by exploiting the power of HiPS tilings in AladinLite.

pdf

Panel + audience
(Yihan Tao, Kai Polsterer, Rafael Martinez Galarza, Alberto Accomazzi)

Panel-led discussion

30'

Seeding topics for discussion

1. How can state-of-the-art AI technologies, such as LLMs, foundation models, and agents, enhance the VO?

2. What are the potential applications of these AI technologies within the VO framework?

3. What are the best practices and strategies for integrating AI agents and models with VO tools and science platforms that can help users efficiently access and analyse astronomical data? What are the challenges?

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link

Revision 3 2024-05-16 - RaffaeleDAbrusco

META TOPICPARENT	name="WebHome"

Knowledge Discovery

Time: Wednesday May 22, 2024 11:00-12:30 Australian Eastern Standard Time

Speaker	Title	Time	Abstract	Material
Yihan Tao	Greetings and Introduction	5'		pdf
Alberto Accomazzi	BiblioPile: Building a Dataset to Support AI-enabled Bibliography Curation efforts	15'+5'	A well-established way to assess the scientific impact of an observational facility in astronomy is the quantitative analysis of the studies published in the literature which have made use of the data taken by the facility. A requirement of such analysis is the creation of bibliographies which annotate and link data products with the literature, thus providing a way to use bibliometrics as an impact measure for the underlying data. An automated assistant able to emulate some of the associated activities would provide a valuable contribution to the human effort involved. LLMs have shown flexibility in interpreting and classifying scientific articles which are the basis for this curation activity. They have also been successfully used for information extraction tasks, which would help identify the specific datasets mentioned in the papers. In this talk I will describe our effort to create the BiblioPile, a contributed dataset consisting of open access fulltext papers and annotated bibliography from institutions that maintain them in order to help train AI/ML bibliographic annotation pipelines.	pdf
Yan Shao	Generative Named Entity Normalization for Astronomical Facilities	15'+5'	Named entity normalization for astronomical facilities is crucial for related academic research. Unlike most previous work, we model named entity normalization as a sequence generation problem using large language models, without assuming a comprehensive set of predefined normalized forms for any entities. Four entity normalization scenarios that are likely to occur in real-world applications are discussed specifically, depending on whether the explicit normalization rules as well as the corresponding annotated instances are available. Moreover, we propose respective generative normalization methods and evaluate them on datasets compiled from the standard telescope name lists maintained by the American Astronomical Society (AAS) and the Astrophysics Data System (ADS). The empirical findings demonstrate that the analytical, inductive, and generative capabilities of LLMs empower generative entity normalization to achieve commendable performance, even under very stringent conditions. The generative normalization effectively remedies the shortcomings of the retrieval-based methods.	pdf
Kai Polsterer		15'+5'		pdf

Changed:

<
<

Panel + audience
(Yihan Tao, Kai Polsterer, Rafael Martinez Galarza)

Panel-led discussion

Seeding topics for discussion

1. How can state-of-the-art AI technologies, such as LLMs, foundation models, and agents, enhance the VO?

2. What are the potential applications of these AI technologies within the VO framework?

3. What are the best practices and strategies for integrating AI agents and models with VO tools and science platforms that can help users efficiently access and analyse astronomical data? What are the challenges?

>
>

Panel + audience
(Yihan Tao, Kai Polsterer, Rafael Martinez Galarza, Alberto Accomazzi)

Panel-led discussion

30'

Seeding topics for discussion

1. How can state-of-the-art AI technologies, such as LLMs, foundation models, and agents, enhance the VO?

2. What are the potential applications of these AI technologies within the VO framework?

3. What are the best practices and strategies for integrating AI agents and models with VO tools and science platforms that can help users efficiently access and analyse astronomical data? What are the challenges?

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link

Deleted:

<
<

Revision 2 2024-05-13 - RaffaeleDAbrusco

META TOPICPARENT	name="WebHome"

Knowledge Discovery

Time: Wednesday May 22, 2024 11:00-12:30 Australian Eastern Standard Time

Speaker	Title	Time	Abstract	Material
Yihan Tao	Greetings and Introduction	5'		pdf
Alberto Accomazzi	BiblioPile: Building a Dataset to Support AI-enabled Bibliography Curation efforts	15'+5'	A well-established way to assess the scientific impact of an observational facility in astronomy is the quantitative analysis of the studies published in the literature which have made use of the data taken by the facility. A requirement of such analysis is the creation of bibliographies which annotate and link data products with the literature, thus providing a way to use bibliometrics as an impact measure for the underlying data. An automated assistant able to emulate some of the associated activities would provide a valuable contribution to the human effort involved. LLMs have shown flexibility in interpreting and classifying scientific articles which are the basis for this curation activity. They have also been successfully used for information extraction tasks, which would help identify the specific datasets mentioned in the papers. In this talk I will describe our effort to create the BiblioPile, a contributed dataset consisting of open access fulltext papers and annotated bibliography from institutions that maintain them in order to help train AI/ML bibliographic annotation pipelines.	pdf

Added:

>
>

Yan Shao	Generative Named Entity Normalization for Astronomical Facilities	15'+5'	Named entity normalization for astronomical facilities is crucial for related academic research. Unlike most previous work, we model named entity normalization as a sequence generation problem using large language models, without assuming a comprehensive set of predefined normalized forms for any entities. Four entity normalization scenarios that are likely to occur in real-world applications are discussed specifically, depending on whether the explicit normalization rules as well as the corresponding annotated instances are available. Moreover, we propose respective generative normalization methods and evaluate them on datasets compiled from the standard telescope name lists maintained by the American Astronomical Society (AAS) and the Astrophysics Data System (ADS). The empirical findings demonstrate that the analytical, inductive, and generative capabilities of LLMs empower generative entity normalization to achieve commendable performance, even under very stringent conditions. The generative normalization effectively remedies the shortcomings of the retrieval-based methods.	pdf
Kai Polsterer		15'+5'		pdf

Panel + audience
(Yihan Tao, Kai Polsterer, Rafael Martinez Galarza)

Panel-led discussion

Seeding topics for discussion

1. How can state-of-the-art AI technologies, such as LLMs, foundation models, and agents, enhance the VO?

2. What are the potential applications of these AI technologies within the VO framework?

3. What are the best practices and strategies for integrating AI agents and models with VO tools and science platforms that can help users efficiently access and analyse astronomical data? What are the challenges?

Deleted:

<
<

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link

Revision 1 2024-05-09 - RaffaeleDAbrusco

META TOPICPARENT	name="WebHome"

Knowledge Discovery

Time: Wednesday May 22, 2024 11:00-12:30 Australian Eastern Standard Time

Speaker	Title	Time	Abstract	Material
Yihan Tao	Greetings and Introduction	5'		pdf
Alberto Accomazzi	BiblioPile: Building a Dataset to Support AI-enabled Bibliography Curation efforts	15'+5'	A well-established way to assess the scientific impact of an observational facility in astronomy is the quantitative analysis of the studies published in the literature which have made use of the data taken by the facility. A requirement of such analysis is the creation of bibliographies which annotate and link data products with the literature, thus providing a way to use bibliometrics as an impact measure for the underlying data. An automated assistant able to emulate some of the associated activities would provide a valuable contribution to the human effort involved. LLMs have shown flexibility in interpreting and classifying scientific articles which are the basis for this curation activity. They have also been successfully used for information extraction tasks, which would help identify the specific datasets mentioned in the papers. In this talk I will describe our effort to create the BiblioPile, a contributed dataset consisting of open access fulltext papers and annotated bibliography from institutions that maintain them in order to help train AI/ML bibliographic annotation pipelines.	pdf
Panel + audience (Yihan Tao, Kai Polsterer, Rafael Martinez Galarza)	Panel-led discussion		Seeding topics for discussion 1. How can state-of-the-art AI technologies, such as LLMs, foundation models, and agents, enhance the VO? 2. What are the potential applications of these AI technologies within the VO framework? 3. What are the best practices and strategies for integrating AI agents and models with VO tools and science platforms that can help users efficiently access and analyse astronomical data? What are the challenges?

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link
