NASA’s Interagency Implementation and Advanced Concepts Team (IMPACT) collaborates with private, non-federal partners through Space Act Agreements. One of these collaborations, with International Business Machines (IBM), has produced INDUS, a suite of large language models (LLMs) tailored for several scientific domains and intended to support scientific research.
The INDUS suite comprises encoders and sentence transformers, which convert natural language text into numeric representations that the models can process. These models were trained on a large corpus spanning astrophysics, planetary science, Earth science, heliophysics, and biological and physical sciences. The custom tokenizer developed by the IMPACT-IBM team improves the models’ efficiency by recognizing scientific terms and vocabulary specific to the domains used for training.
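To make the tokenizer point concrete, here is a minimal sketch of why a domain-aware vocabulary can shorten token sequences. It is not the INDUS tokenizer: the `tokenize` function, the greedy longest-match strategy, and both toy vocabularies are assumptions chosen purely for illustration.

```python
def tokenize(word, vocab):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is found in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # Nothing matched: fall back to a single character.
            end = start + 1
        pieces.append(word[start:end])
        start = end
    return pieces


# Hypothetical vocabularies: the second one also knows whole scientific terms.
GENERAL_VOCAB = {"helio", "physics", "magnet", "o", "sphere"}
DOMAIN_VOCAB = GENERAL_VOCAB | {"heliophysics", "magnetosphere"}

for term in ("heliophysics", "magnetosphere"):
    print(term, tokenize(term, GENERAL_VOCAB), tokenize(term, DOMAIN_VOCAB))
```

With the general vocabulary each term breaks into several fragments, while the domain vocabulary keeps it as one token, so the same text reaches the model as a shorter sequence.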
The IMPACT-IBM team has shown that INDUS outperforms open, non-domain-specific LLMs on several benchmarks, including biomedical tasks, scientific question answering, and Earth science entity recognition. Because it was trained on domain-specific vocabulary and a range of language tasks, INDUS can process researcher questions, retrieve relevant documents, and support answer generation. The team also developed smaller, faster versions of the models, which makes INDUS suitable for latency-sensitive applications.
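The retrieval step mentioned here generally works by embedding the question and each document as vectors and ranking documents by similarity. The sketch below shows only that ranking logic. The `embed` function is a hypothetical stand-in that counts words; a real system would call a trained sentence transformer instead, and the sample documents are invented for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: bag-of-words counts (NOT a trained model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, documents):
    """Return documents ordered from most to least similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)


# Invented example documents, one per science area.
DOCS = [
    "solar wind interaction with the magnetosphere",
    "soil moisture retrieval from satellite radar",
    "rodent bone loss during spaceflight",
]

print(rank("magnetosphere response to solar wind", DOCS)[0])
```

Swapping `embed` for a domain-trained encoder changes which texts count as similar, which is where a model trained on scientific vocabulary would be expected to help.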
INDUS has been integrated into several NASA projects. It is used to improve search in the Open Science Data Repository (OSDR) and to categorize publications that cite data from the NASA Goddard Earth Sciences Data and Information Services Center (GES-DISC). In these projects INDUS supports data retrieval, knowledge graph integration, and dataset recommendation systems.
Incorporating INDUS into existing applications, such as NASA’s Science Discovery Engine (SDE), has improved the accuracy and relevance of search results. By giving researchers better access to specialized knowledge, INDUS helps them understand complex scientific concepts, extract relevant information, and explore new research directions. The models are available on Hugging Face, in keeping with NASA and IBM’s commitment to open and transparent artificial intelligence.
NASA’s collaboration with IBM on INDUS marks a milestone in advancing scientific research capabilities. The suite of LLMs performs well across scientific domains and makes information more efficient to find, more accurate, and more accessible for researchers. As INDUS continues to be adapted to more science domain applications, it is expected to influence scientific communication and knowledge discovery.