01/08/2026

AI-enabled data abstraction transforms Tempus data

Maria Berezina, PhD, VP of Real-World Data Analytics, explains how Tempus integrates advanced AI to augment data abstraction, increasing the speed and scale of data structuring to unlock novel insights from one of the world’s largest multimodal libraries.
Author: Maria A. Berezina, PhD
VP, Real-World Data Analytics, Tempus



The widespread adoption of electronic health records (EHRs) unlocked the potential of real-world data (RWD), offering researchers the possibility to analyze routine clinical care at a massive scale. However, key clinical concepts are often buried in unstructured data, and a time-consuming data abstraction process is required to make them useful for research. At Tempus, we are pioneering an approach that builds on years of human-led data abstraction expertise, millions of labeled data points, and the power of artificial intelligence (AI) to dramatically accelerate this process.

This hybrid approach is doing more than just increasing speed and scale; it’s unlocking the ability to answer novel research questions that were previously out of reach. Our AI-enabled approach is transforming the landscape for life science researchers, and I’m excited to share how we are making it a reality.

Our human-centric foundation for AI-enabled abstraction

Historically, data abstraction within the industry has been a manual and human-centric endeavor. Trained human abstractors extract clinical data points from unstructured notes by following a process that includes comprehensive abstraction guidelines, standardization against medical guidelines and terminologies, and robust quality control. This approach, while ensuring rigor, is also time-intensive, has limitations of scale, and is subject to human error.

When we bring AI into the loop, it serves to augment and enable this human-led process. The AI capabilities we have developed are fundamentally rooted in the years of meticulous human abstraction and the expertise our abstraction personnel have built. This allows us to create large and comprehensive validation datasets. We firmly believe in quality, and so all of our AI tooling is validated against these expert-created labels.

This philosophy guides our entire AI development and deployment lifecycle. Initially, our expert human abstractors create the high-quality, labeled data that our AI models are trained on, forming the foundation of our model’s intelligence. During development, we benchmark the model’s performance directly against these human experts. The model’s output is rigorously evaluated to ensure its precision and recall are comparable to the established performance of our trained abstractors. Finally, once an AI model is released, our teams continue to assess its performance on a regular cadence to ensure there is no drift and that the quality of the output remains consistently high. This “human-in-the-loop” framework ensures that even as we scale, our commitment to data quality and accuracy remains our central focus.
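As an illustration only, the benchmarking step described above can be sketched for a single yes/no clinical field. This is a minimal sketch, not Tempus' implementation: the function names, the binary-label format, and the threshold parameters are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def precision_recall(expert: Sequence[bool], model: Sequence[bool]) -> Tuple[float, float]:
    """Score model labels against expert labels for one binary field.

    Expert labels are treated as the reference standard (an assumption).
    """
    if len(expert) != len(model):
        raise ValueError("expert and model label lists must be the same length")
    pairs = list(zip(expert, model))
    tp = sum(1 for e, m in pairs if e and m)        # both say "present"
    fp = sum(1 for e, m in pairs if not e and m)    # model says "present", expert does not
    fn = sum(1 for e, m in pairs if e and not m)    # model misses an expert-labeled case
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def meets_benchmark(expert: Sequence[bool], model: Sequence[bool],
                    min_precision: float, min_recall: float) -> bool:
    """Return True if the model reaches both hypothetical thresholds."""
    precision, recall = precision_recall(expert, model)
    return precision >= min_precision and recall >= min_recall
```

Running the same comparison on a fresh sample of expert-labeled records at a regular cadence is one simple way to watch for drift after release: a drop below the thresholds would flag the field for review.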

From human expertise to high-fidelity AI-enabled abstraction

The quality of our AI-enabled abstraction approach is a direct result of our years of experience in manual data abstraction. From the start of our abstraction program, we have leveraged a team with deep, relevant expertise, including personnel with extensive experience in data abstraction and management for clinical trials and clinical registries, to develop and refine robust guidelines for capturing hundreds of clinical fields. We refine our abstraction through a feedback loop: abstractors surface questions about nuances and anomalies in real-world cases, and a panel of oncology nurses, data managers, and other clinical experts resolves them as necessary.

Over time, our guidelines have become incredibly robust, with the various nuances of real-world clinical notes and patterns integrated directly into them. Our AI-enabled abstraction is rooted in these guidelines and is measured against quality benchmarks we’ve established. The reason our solution can be so robust is that we’ve built our models on the foundation of years of human work and abstraction of hundreds of thousands of real-world clinical cases. Our prompts and models lean into that expertise, which is why they can capture clinical nuance. This foundational expertise gives us a dramatic head start. We know what language clinicians use, what values to expect, and how to optimize our AI prompts to extract data with the highest possible fidelity.

Unlocking novel research by finding the needle in the haystack

While our standard abstraction process captures the vast majority of data elements pertinent to a patient’s cancer journey, some research questions require digging deeper for rare or highly specific clinical features. This is where AI-enabled abstraction becomes a game-changer.

A powerful example is a project that required us to identify patients with hemophagocytic lymphohistiocytosis (HLH), a rare and serious secondary condition that can occur after certain cancer treatments. Because this condition was only recently recognized, locating this patient population would have required a substantial investment of time and manual effort, making it effectively inaccessible. Manually searching for these patients across our entire database would have been like finding a needle in a haystack.

Using AI, we were able to scan thousands of de-identified records in a fraction of the time required for human screening, allowing us to identify a smaller, enriched cohort of patients who were more likely to have this rare condition. This narrowed-down population could then be efficiently reviewed using our traditional human abstraction methods.
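The two-stage pattern described above (an automated first pass that narrows the pool, followed by human review) can be sketched as follows. This is a toy illustration under stated assumptions: `keyword_score` is a hypothetical stand-in for a real model, and the record IDs, notes, and threshold are synthetic. It does not represent how Tempus scores records.

```python
from typing import Callable, Dict, List


def keyword_score(note: str, terms: List[str]) -> float:
    """Toy stand-in for a model score: the fraction of terms found in the note."""
    text = note.lower()
    return sum(term in text for term in terms) / len(terms)


def enrich_cohort(notes: Dict[str, str],
                  score: Callable[[str], float],
                  threshold: float) -> List[str]:
    """Return IDs of records scoring at or above the threshold, highest first.

    The returned list is the smaller, enriched cohort handed to human abstractors.
    """
    scored = [(score(text), record_id) for record_id, text in notes.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [record_id for value, record_id in ranked if value >= threshold]


# Synthetic, de-identified-style example notes (invented for illustration).
notes = {
    "r1": "Suspected HLH; hemophagocytic lymphohistiocytosis workup ordered.",
    "r2": "Routine follow-up, no new complaints.",
    "r3": "Rule out HLH.",
}
terms = ["hemophagocytic lymphohistiocytosis", "hlh"]
review_queue = enrich_cohort(notes, lambda text: keyword_score(text, terms), 0.5)
```

The threshold controls the trade-off: lowering it sends more records to human review and reduces the chance of missing a true case, while raising it shrinks the review queue.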

For me, this was a pivotal moment that truly demonstrated the power of AI. Finding these patients allows us to conduct research to better understand their characteristics and work to prevent the condition from emerging. This capability unlocks new possibilities for research, making it feasible to study populations that were once too difficult to isolate and, ultimately, to improve patient outcomes.

Accelerating research at an unprecedented scale

For our life science partners, the most direct benefit of AI-enabled abstraction is the dramatic increase in project velocity. The speed and scale can transform the pace of life science R&D.

In a recent pilot, our AI-enabled abstraction processed 60,000 patient records in just a few days—a task that would have previously taken a large team of our abstractors months to complete. This unprecedented speed means that researchers can get to insights faster than ever before.

This acceleration does not replace our human experts. Instead, it allows us to refocus their expertise on the most complex and critical data elements, creating a hybrid model that is more powerful than either component could be alone. This fit-for-purpose approach ensures we use the right tool for the right task. For foundational real-world evidence variables like line of therapy or complex patient eligibility criteria, AI can be used to accelerate the pre-processing of simpler, related features. For example, we can speed through 90% of features that are easily extractable and focus our human experts on the 10% that are most critical and complex. This allows us to devote our human capital where it matters most, ensuring that even for high-stakes regulatory use cases, the process is both accelerated and maintains the highest level of confidence and reliability.
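The fit-for-purpose split described above can be pictured as a simple routing step. The sketch below is illustrative only: the field names other than line of therapy are placeholders, and treating routing as a fixed per-field list is an assumption, not a description of the production workflow.

```python
from typing import Iterable, List, Set, Tuple


def route_fields(fields: Iterable[str], expert_only: Set[str]) -> Tuple[List[str], List[str]]:
    """Split fields into an AI-accelerated queue and an expert-review queue."""
    ai_queue: List[str] = []
    expert_queue: List[str] = []
    for name in fields:
        # Fields flagged as critical or complex go to human experts.
        (expert_queue if name in expert_only else ai_queue).append(name)
    return ai_queue, expert_queue


# Nine placeholder "simple" fields plus one complex field from the text above.
fields = [f"simple_field_{i}" for i in range(9)] + ["line_of_therapy"]
ai_queue, expert_queue = route_fields(fields, {"line_of_therapy"})
ai_share = len(ai_queue) / len(fields)  # share of fields handled by AI
```

With these placeholder inputs, nine of ten fields land in the AI queue, mirroring the 90%/10% example given above.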

A look ahead: The future of curated data

Looking ahead, our investment in AI-enabled abstraction will continue to expand the scope and availability of data for our partners. As the number of de-identified patient records across Tempus grows, currently exceeding 45 million across all our products and services, we will leverage these capabilities to make these records usable for research. This approach will substantially increase the volume of high-quality, de-identified data available through the Tempus Lens platform. As we continue to refine these tools, the time from data acquisition to research-ready insights will only get shorter, empowering our partners to accelerate the next wave of therapeutic breakthroughs.

 

To learn more about Tempus’ data solutions, click here or contact us.
Forward-looking statements
This article may contain forward-looking statements within the meaning of Section 27A of the Securities Act of 1933, as amended (the “Securities Act”), and Section 21E of the Securities Exchange Act of 1934, as amended, about Tempus and Tempus’ industry that involve substantial risks and uncertainties. All statements other than statements of historical facts contained in this article are forward-looking statements, including, but not limited to, statements regarding the expected outcomes and benefits of AI-enabled abstraction. In some cases, you can identify forward-looking statements because they contain words such as “anticipate,” “believe,” “contemplate,” “continue,” “could,” “estimate,” “expect,” “going to,” “intend,” “may,” “plan,” “potential,” “predict,” “project,” “should,” “target,” “will,” or “would” or the negative of these words or other similar terms or expressions. Tempus cautions you that the foregoing may not include all of the forward-looking statements made in this article. 
You should not rely on forward-looking statements as predictions of future events. Tempus has based the forward-looking statements contained in this article primarily on its current expectations and projections about future events and trends that it believes may affect Tempus’ business, financial condition, results of operations and prospects. These forward-looking statements are subject to risks and uncertainties related to: Tempus’ financial performance; the ability to attract and retain customers and partners; managing Tempus’ growth and future expenses; competition and new market entrants; compliance with new laws, regulations and executive actions, including any evolving regulations in the artificial intelligence space; the ability to maintain, protect and enhance Tempus’ intellectual property; the ability to attract and retain qualified team members and key personnel; the ability to repay or refinance outstanding debt, or to access additional financing; future acquisitions, divestitures or investments; the potential adverse impact of climate change, natural disasters, health epidemics, macroeconomic conditions, and war or other armed conflict, as well as risks, uncertainties, and other factors described in the section titled “Risk Factors” in Tempus’ Annual Report on Form 10-K for the fiscal year ended December 31, 2024 filed with the Securities and Exchange Commission (“SEC”) as well as in other filings Tempus may make with the SEC in the future. In addition, any forward-looking statements which may be contained in this article are based on assumptions that Tempus believes to be reasonable as of this date. Tempus undertakes no obligation to update any forward-looking statements to reflect events or circumstances after the date of this article or to reflect new information or the occurrence of unanticipated events, except as required by law.
