With the release of PubMiner AI, Graphwise aims to help researchers in the biomedical domain overcome the challenges of automatic knowledge extraction from large volumes of scientific publications.
Introduction
Research published in academic journals plays a crucial role in improving drug discovery by revealing new biological targets, mechanisms, and treatment strategies. AI technologies can help tap into this wealth of information by sifting through large amounts of literature to uncover key insights. This makes it easier for researchers to identify potential drug targets and innovative solutions, fostering collaboration and speeding up the drug development process.
However, many biomedical researchers lack the expertise to use these advanced data processing techniques. Instead, they often depend on skilled data scientists and engineers who can build automated systems to interpret complex scientific data. This is where PubMiner AI comes in: it supports such interdisciplinary teams of biomedical researchers and data scientists on their journey toward knowledge extraction.
What is PubMiner AI?
PubMiner AI is a scalable, AI-enabled workflow designed to facilitate the extraction of valuable information from large collections of scientific publications. It uses the Retrieval Augmented Generation (RAG) approach, with a structured knowledge graph in the retrieval step, and it is hosted on the Databricks platform, which provides smooth integration of cloud processing resources. By pulling data from interlinked sources such as PubMed, SemMedDB, and Ontotext’s LinkedLifeData (LLD) Inventory datasets, PubMiner AI can focus only on the most relevant biomedical information.
This data is then processed by a large language model (LLM) and the results are interlinked with the LLD Inventory datasets to create a knowledge graph that represents the potentially new findings of scientific interest. This method follows principles that ensure data is Findable, Accessible, Interoperable, and Reusable (FAIR), thus allowing researchers to easily integrate this knowledge into other information systems. Through AI-powered insights, graphical representations, and customizable queries, they can leverage this approach to transform vast amounts of medical data into actionable knowledge.
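To make the knowledge-graph-guided retrieval step more concrete, here is a minimal Python sketch of narrowing the literature to a targeted subset of articles before anything is sent to an LLM. It assumes SemMedDB-style predication records (PMID, subject, predicate, object) already loaded in memory; the `Predication` class, the `select_pmids` and `build_prompt` helpers, and all identifiers and abstracts are hypothetical placeholders, not PubMiner AI's actual code or data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predication:
    # Hypothetical record layout modeled on a SemMedDB predication:
    # one subject-predicate-object assertion extracted from one PubMed article.
    pmid: str
    subject: str
    predicate: str
    obj: str


def select_pmids(predications, concept, predicates):
    """Return the sorted PMIDs whose predications link `concept` through one of `predicates`."""
    hits = {
        p.pmid
        for p in predications
        if p.predicate in predicates and concept in (p.subject, p.obj)
    }
    return sorted(hits)


def build_prompt(question, abstracts, pmids):
    """Build an LLM prompt whose context is restricted to the selected articles."""
    context = "\n\n".join(
        f"[PMID {pmid}] {abstracts[pmid]}" for pmid in pmids if pmid in abstracts
    )
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"


# Placeholder data for illustration only (not real PubMed records).
predications = [
    Predication("1001", "gene:GENE_A", "ASSOCIATED_WITH", "disease:ALS"),
    Predication("1002", "gene:GENE_B", "INTERACTS_WITH", "gene:GENE_C"),
    Predication("1003", "disease:ALS", "COEXISTS_WITH", "gene:GENE_D"),
]
abstracts = {
    "1001": "Abstract text for article 1001.",
    "1002": "Abstract text for article 1002.",
    "1003": "Abstract text for article 1003.",
}

pmids = select_pmids(predications, "disease:ALS", {"ASSOCIATED_WITH", "COEXISTS_WITH"})
prompt = build_prompt("Which genes are linked to ALS?", abstracts, pmids)
print(pmids)  # ['1001', '1003']
```

The point of filtering on structured predications first is that only articles already known to connect the concept of interest reach the LLM, which keeps the prompt context small and on topic. The predicate names here are only illustrative; a real workflow would choose them for its own use case.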
Using PubMiner AI as a blueprint enables any skilled data scientist with basic knowledge of semantic technologies (knowledge graphs, RDF, SPARQL, ontologies, and semantic models) to build a targeted workflow that can extract intricate relations between biomedical entities from scientific literature. The workflow allows the identification of biomedical concepts and the relations between them. It also facilitates retrieval from a highly targeted subset of documents containing relevant information for the particular use case. Finally, it enables building a subgraph that represents the extracted knowledge, normalized to reference datasets.
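The final step, turning LLM output into a subgraph normalized to reference data, can be sketched in a few lines of Python. This assumes the LLM returns a JSON list of `{subject, relation, object}` objects and that a small in-memory lookup stands in for the reference datasets; the `REFERENCE_IRIS` and `RELATION_IRIS` tables, the `http://example.org/` IRIs, and the `normalize` and `to_ntriples` helpers are all hypothetical, not part of PubMiner AI.

```python
import json

# Hypothetical stand-in for a reference dataset: lower-cased surface form -> concept IRI.
# Synonyms map to the same IRI so that different mentions collapse into one node.
REFERENCE_IRIS = {
    "als": "http://example.org/disease/ALS",
    "amyotrophic lateral sclerosis": "http://example.org/disease/ALS",
    "gene a": "http://example.org/gene/GENE_A",
}
# Hypothetical relation vocabulary.
RELATION_IRIS = {
    "associated with": "http://example.org/relation/associatedWith",
}


def normalize(llm_json):
    """Map LLM-extracted labels to IRIs, skipping any triple with an unresolved term."""
    triples = []
    for item in json.loads(llm_json):
        s = REFERENCE_IRIS.get(item["subject"].strip().lower())
        p = RELATION_IRIS.get(item["relation"].strip().lower())
        o = REFERENCE_IRIS.get(item["object"].strip().lower())
        if s and p and o:
            triples.append((s, p, o))
    return triples


def to_ntriples(triples):
    """Serialize IRI-only triples as de-duplicated, sorted N-Triples lines."""
    return "\n".join(sorted({f"<{s}> <{p}> <{o}> ." for s, p, o in triples}))


# Placeholder LLM output for illustration only.
llm_output = json.dumps([
    {"subject": "Gene A", "relation": "associated with", "object": "Amyotrophic Lateral Sclerosis"},
    {"subject": "gene a", "relation": "associated with", "object": "ALS"},
    {"subject": "Gene Z", "relation": "associated with", "object": "ALS"},
])
triples = normalize(llm_output)
print(to_ntriples(triples))
```

Here the two synonymous mentions resolve to one statement, and the mention of "Gene Z" is dropped because it has no match in the reference lookup. In a real deployment the lookup would more plausibly be a SPARQL query against reference data such as the LLD Inventory in GraphDB, and an RDF library would handle escaping and serialization; those details are outside this sketch.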
PubMiner AI key features
PubMiner AI is aimed at biomedical researchers, pharmaceutical companies, healthcare professionals, and data scientists who want to integrate AI with knowledge graphs for enhanced biomedical literature analysis and knowledge discovery. It offers a comprehensive suite of features designed to streamline research and discovery. Some of its key features include:
- Integrated Data Sources: Combines data from PubMed, SemMedDB, and Ontotext’s LinkedLifeData Inventory powered by GraphDB into a highly scalable biomedical knowledge base.
- AI-Powered Knowledge Creation: Uses large language models to analyze cross-referenced biomedical data and generate in-depth insights.
- Knowledge Graph Generation: Automatically creates knowledge graphs to visualize and explore relationships between the biomedical entities in scope: diseases and genes.
- GraphDB Integration: Leverages GraphDB to enhance data structuring and querying, facilitating more complex insights from interconnected data.
- Customizable Data Subsets: Allows users to filter and focus on specific areas of interest for personalized knowledge discovery.
- Automated Report Generation: Summarizes research findings and trends into comprehensive, digestible reports.
- Enhanced Query Capabilities: Natural language queries enable intuitive, AI-driven exploration of large biomedical datasets.
Main use cases
PubMiner AI empowers users across multiple domains to make faster, data-informed decisions by delivering targeted insights from complex biomedical data. It is ideally suited for a wide range of use cases such as:
- Medical Research: Conduct thorough literature reviews, discover new relationships, and visualize findings with knowledge graphs.
- Pharmaceutical Development: Identify emerging biomarkers, drug interactions, and treatment trends to accelerate drug discovery and clinical research.
- Healthcare Analytics: Use AI-powered insights and knowledge graphs to uncover new relationships between diseases, treatments, and outcomes.
PubMiner AI has already been employed to discover novel biomarkers and susceptibility genes for neurodegenerative diseases such as Amyotrophic Lateral Sclerosis (ALS). This case study is aligned with the goals of the HEREDITARY project, which aims to substantially change how diseases are detected, treatment outcomes are predicted, and medical insights are gained.
To wrap it up
PubMiner AI transforms complex biomedical data into actionable insights, making it easier for researchers, healthcare providers, and pharmaceutical companies to extract valuable information from scientific literature. By integrating diverse data sources and enabling intuitive, natural language queries, it provides users with a customizable and streamlined approach to knowledge discovery.
Its automated knowledge graph generation as well as exploration and visualization tools facilitate a deeper understanding of relationships between diseases, genes, and biomarkers, supporting advances in research, personalized medicine, and drug development.
Does this resemble your use case? Visit Databricks Marketplace to install PubMiner AI and access the knowledge hidden in scientific publications!
PubMiner AI has been partially funded by the European Union’s Horizon Europe research and innovation programme under grant agreement No 101137074 — HEREDITARY (HetERogeneous sEmantic Data integratIon for the guT-bRain interplay) project. Views and opinions expressed are however those of the author only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.
Originally published at https://www.ontotext.com on November 22, 2024.