12/17/2023

Large Language Models with Retrieval-Augmented Generation for Zero-Shot Disease Phenotyping

NeurIPS 2023 PRESENTATION
Authors: Will E. Thompson, David M. Vidmar, Jessica K. De Freitas, Gabriel Altay, Kabir Manghnani, Andrew C. Nelsen*, Kellie Morland*, John M. Pfeifer, Brandon K. Fornwalt, Ruijun Chen, Martin C. Stumpe, Riccardo Miotto

Abstract

Identifying disease phenotypes from electronic health records (EHRs) is critical for numerous secondary uses. Manually encoding physician knowledge into rules is particularly challenging for rare diseases because they are often inadequately coded in EHRs, which necessitates review of clinical notes. Large language models (LLMs) offer promise in text understanding but may not efficiently handle real-world clinical documentation. We propose a zero-shot LLM-based method enriched by retrieval-augmented generation and MapReduce, which pre-identifies disease-related text snippets and uses them in parallel as queries for the LLM to establish a diagnosis. We show that this method, when applied to pulmonary hypertension (PH), a rare disease characterized by elevated arterial pressures in the lungs, significantly outperforms physician logic rules (F1 score of 0.75 vs. 0.62 for the rules). This method has the potential to enhance rare disease cohort identification, expanding the scope of robust clinical research and care gap identification.
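The retrieve, map, and reduce flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword regex `PH_TERMS` is a hypothetical stand-in for the retrieval step (a real retrieval-augmented system would more likely rank snippets by embedding similarity), the prompt text and the `ask_llm` callable are hypothetical placeholders for a real LLM client, and the reduce step labels a patient positive if any snippet-level answer is "yes", which may differ from the aggregation used in the work.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

# Hypothetical PH-related search terms (assumption): a simple stand-in for the
# retrieval step; a production RAG system would more likely use embeddings.
PH_TERMS = re.compile(
    r"pulmonary hypertension|\bPH\b|RVSP|pulmonary artery pressure", re.IGNORECASE
)

# Hypothetical zero-shot prompt (assumption); no examples are included.
PROMPT = (
    "Based only on the following excerpt from a clinical note, does the patient "
    "have pulmonary hypertension? Answer yes or no.\n\nExcerpt: {snippet}"
)


def split_into_snippets(note: str) -> list[str]:
    """Split a note into sentences, used here as the snippet unit."""
    return [s for s in re.split(r"(?<=[.!?])\s+", note.strip()) if s]


def retrieve(notes: Iterable[str]) -> list[str]:
    """Retrieval: keep only snippets that mention a disease-related term."""
    return [s for note in notes for s in split_into_snippets(note) if PH_TERMS.search(s)]


def map_snippet(snippet: str, ask_llm: Callable[[str], str]) -> bool:
    """Map: ask the LLM about one snippet and parse a yes/no answer."""
    reply = ask_llm(PROMPT.format(snippet=snippet))
    return reply.strip().lower().startswith("yes")


def phenotype_patient(
    notes: Iterable[str], ask_llm: Callable[[str], str], max_workers: int = 4
) -> bool:
    """Run retrieval, map over snippets in parallel, then reduce to one label."""
    snippets = retrieve(notes)
    if not snippets:
        return False  # no disease-related text retrieved, so no evidence of PH
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        answers = list(pool.map(lambda s: map_snippet(s, ask_llm), snippets))
    # Reduce (assumption): positive if any snippet-level answer is "yes".
    return any(answers)


def fake_llm(prompt: str) -> str:
    """Hypothetical stub standing in for a real LLM client, for demonstration only."""
    return "yes" if "RVSP 55" in prompt else "no"


notes = [
    "Echo today. RVSP 55 mmHg, concerning for pulmonary hypertension. Follow up in 3 months.",
    "Patient reports knee pain. No cardiac complaints.",
]
print(retrieve(notes))  # ['RVSP 55 mmHg, concerning for pulmonary hypertension.']
print(phenotype_patient(notes, fake_llm))  # True
```

Because each retrieved snippet is an independent query in this sketch, the map step can issue LLM calls concurrently through a thread pool, and only the short, disease-related snippets (rather than entire notes) are sent to the model.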
