Simplifying complex queries with Lens Cohort Builder

Jan 06, 2025
Generative AI
Arpita Saha, Vice President, Generative AI
Nick Pojman, Group Product Manager
Samuel Heilbroner, Staff Machine Learning Scientist
Matthew West, Machine Learning Scientist, GenAI
Sai Prabhakar, Senior Machine Learning Scientist
Jesus Pedrosa, Engineering Lead

The Problem: Querying a Large Oncology Database with Generative AI

 

Querying large biomedical databases is difficult because of the complexity of their ontologies and schemas. Tempus maintains a vast cohort of multimodal oncology records for research purposes, but navigating this data can be daunting, even for experienced users. Tempus Lens, a software-as-a-service platform, simplifies this process through a drag-and-drop interface in which biomedical concepts are represented by filters grouped into pills that users can easily manipulate. However, scaling this solution to less experienced users has been a challenge. Generative AI shows substantial promise here because it supports natural language querying across multiple domains. Lens Cohort Builder extends Tempus Lens by letting users interact with the data using only natural language prompts, with large language models (LLMs) abstracting away the complexities of biomedical ontologies.

 

Lens Cohort Builder: Methods and Architecture

 

Each filter in Lens Cohort Builder is tied to a specific LLM call with a custom-designed prompt. These prompts ensure that the most relevant matches are returned for each associated filter concept. Filters are processed in parallel, and a subsequent LLM call groups these filters into logical relationships based on the user’s input. The resulting query is populated in the user interface, where users can choose to apply or modify the suggested query pills.
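
The flow described above (one prompt per filter, parallel calls, then a grouping call) can be sketched roughly as follows. This is a minimal illustration and not Tempus' implementation: the prompt texts, the `call_llm` stub, and the names `match_filter`, `group_filters`, and `build_query` are all assumptions made for the example, and the grouping step simply ANDs the matched filters instead of calling a model.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class FilterMatch:
    filter_name: str
    values: list[str]


# Hypothetical per-filter prompts; the real prompts are not published.
FILTER_PROMPTS = {
    "Primary Diagnosis": "Extract primary diagnoses mentioned in: {text}",
    "Line of Therapy": "Extract lines of therapy mentioned in: {text}",
}


async def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM client; returns canned text for the demo."""
    await asyncio.sleep(0)  # yield control, as a network call would
    if prompt.startswith("Extract primary diagnoses"):
        return "Lung Cancer"
    if prompt.startswith("Extract lines of therapy"):
        return "Chemotherapy (First Line)"
    return ""


async def match_filter(name: str, template: str, text: str) -> FilterMatch:
    """Run the prompt for a single filter and parse its matches."""
    raw = await call_llm(template.format(text=text))
    values = [v.strip() for v in raw.split(";") if v.strip()]
    return FilterMatch(name, values)


async def group_filters(matches: list[FilterMatch], text: str) -> dict:
    """Stand-in for the grouping LLM call: AND together all non-empty filters."""
    return {"op": "AND", "filters": [m for m in matches if m.values]}


async def build_query(text: str) -> dict:
    # One task per filter; gather runs them concurrently and keeps their order.
    tasks = [match_filter(n, t, text) for n, t in FILTER_PROMPTS.items()]
    matches = await asyncio.gather(*tasks)
    return await group_filters(list(matches), text)


def run_demo() -> dict:
    return asyncio.run(build_query(
        "Find patients with lung cancer who received chemotherapy "
        "as a first-line treatment."
    ))
```

Running the per-filter calls concurrently means total latency is bounded by the slowest filter call plus the grouping call, not by the sum of all calls.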

 

Example Workflow

 

  • User Input: Researchers input natural language text, such as “Find patients with lung cancer who received chemotherapy as a first-line treatment.”
  • LLM Mapping: The system maps this text to various filters, such as "Primary Diagnosis: Lung Cancer" and "Line of Therapy: Chemotherapy (First Line)."
  • Query Assembly: Filters are grouped into logical relationships and displayed as pills in the UI for user approval or refinement.
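
One way to picture the assembled query is as a small tree of pills joined by logical operators. The sketch below is an assumption for illustration only: the `Pill` and `Group` classes and the rendered string format are hypothetical and do not reflect the actual Lens data model. The second query shows a hypothetical user refinement that adds a "Breast Cancer" pill.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Pill:
    """A single filter/value pair, e.g. Primary Diagnosis: Lung Cancer."""
    filter_name: str
    value: str

    def render(self) -> str:
        return f"{self.filter_name}: {self.value}"


@dataclass
class Group:
    """Pills (or nested groups) joined by one logical operator."""
    op: str  # "AND" or "OR"
    members: List[Union["Pill", "Group"]] = field(default_factory=list)

    def render(self) -> str:
        inner = f" {self.op} ".join(m.render() for m in self.members)
        return f"({inner})"


# Suggested query for the example prompt above.
suggested = Group("AND", [
    Pill("Primary Diagnosis", "Lung Cancer"),
    Pill("Line of Therapy", "Chemotherapy (First Line)"),
])

# Hypothetical refinement: the user broadens the diagnosis before applying.
refined = Group("AND", [
    Group("OR", [
        Pill("Primary Diagnosis", "Lung Cancer"),
        Pill("Primary Diagnosis", "Breast Cancer"),
    ]),
    Pill("Line of Therapy", "Chemotherapy (First Line)"),
])

print(suggested.render())
print(refined.render())
```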

 

 

Testing and Results

 

Beta testing was conducted with internal users to evaluate the tool’s accuracy and usability. Researchers and product managers generated queries and assessed Lens Cohort Builder’s responses based on their subject matter expertise. Additionally, users provided qualitative feedback through surveys.

 

Key Findings:

 

  • Utility: A total of 33 users evaluated 1,916 queries, with 63.3% deemed accurate or mostly accurate and 36.7% rated as inaccurate or mostly inaccurate.
  • Unknown Scope: Approximately 320 queries were identified as outside the tool’s intended scope.
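
For a sense of scale, the percentages above can be converted into approximate query counts. This is a back-of-the-envelope sketch that assumes the percentages are taken over all 1,916 evaluated queries; the source does not state the denominator or whether the out-of-scope queries are included in it, so the counts are estimates only. The helper name `implied_count` is made up for the example.

```python
def implied_count(total: int, percent: float) -> int:
    """Approximate number of items behind a reported percentage."""
    return round(total * percent / 100)


TOTAL_QUERIES = 1916  # reported number of evaluated queries

# Assumption: both percentages use TOTAL_QUERIES as the denominator.
accurate = implied_count(TOTAL_QUERIES, 63.3)
inaccurate = implied_count(TOTAL_QUERIES, 36.7)

print(accurate, inaccurate, accurate + inaccurate)
```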

 

Filter-Specific Performance:

 


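| Filter Name | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|
| Overall | 0.775 | 0.82 | 0.797 | 0.663 |
| Primary Diagnosis | 0.886 | 0.986 | 0.933 | 0.875 |
| Somatic Variant Genes | 0.774 | 0.96 | 0.857 | 0.75 |
| DNA Modality | 0.458 | 0.88 | 0.603 | 0.431 |
| Biopsy Modality | 0.625 | 0.909 | 0.741 | 0.588 |
| RNA Modality | 0.826 | 0.864 | 0.844 | 0.731 |
| Line of Therapy | 1 | 0.714 | 0.833 | 0.714 |
| Drug Class | 1 | 0.75 | 0.857 | 0.75 |
| Tumor Stage | 0.667 | 0.857 | 0.75 | 0.6 |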
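
The F1 column can be cross-checked against the precision and recall columns, assuming F1 here is the standard harmonic mean of the two (the post does not define it explicitly). The sketch below recomputes F1 from the rounded table values and reports the largest deviation from the reported figure; small differences are expected because the inputs are already rounded to three decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the table above.
ROWS = {
    "Overall": (0.775, 0.82, 0.797),
    "Primary Diagnosis": (0.886, 0.986, 0.933),
    "Somatic Variant Genes": (0.774, 0.96, 0.857),
    "DNA Modality": (0.458, 0.88, 0.603),
    "Biopsy Modality": (0.625, 0.909, 0.741),
    "RNA Modality": (0.826, 0.864, 0.844),
    "Line of Therapy": (1.0, 0.714, 0.833),
    "Drug Class": (1.0, 0.75, 0.857),
    "Tumor Stage": (0.667, 0.857, 0.75),
}

# Absolute gap between recomputed and reported F1 for each filter.
deviations = {
    name: abs(f1_score(p, r) - reported)
    for name, (p, r, reported) in ROWS.items()
}
max_deviation = max(deviations.values())
print(f"largest gap between recomputed and reported F1: {max_deviation:.4f}")
```

Under that assumption, every recomputed value agrees with the reported F1 to within rounding.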

 

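The results indicate strong performance for several commonly used filters, particularly "Primary Diagnosis" (F1 0.933), "Somatic Variant Genes" (F1 0.857), and "Drug Class" (F1 0.857). "DNA Modality" was the weakest filter, with a precision of 0.458 and an F1 of 0.603.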

 

Impact

 

New users who may be unfamiliar with Tempus’ complex data model can now make use of generative AI for more accessible and efficient cohort development.

 

Next Steps

 

By continuing to refine accuracy and usability, Tempus aims to unlock even greater value from its data, driving innovation in biomedical research and personalized medicine.