The transition of oncology research from an empirical discipline to a data-driven computational science represents the most significant shift in precision medicine since the completion of the Human Genome Project. Cancer biology, with its tumor microenvironment (TME) heterogeneity, genomic instability, and clonal evolution, is simply too complex for traditional analytical methods. The primary bottleneck to therapeutic innovation is no longer the scarcity of data but the computational power required to extract biological meaning from it, and this constraint will only tighten as the volume of clinical and biological information continues to grow.
At Tempus, we’ve engineered a vertically integrated ecosystem that pairs our library of over 450 petabytes of multimodal clinical and molecular data with state-of-the-art NVIDIA GPU and CPU architectures. This integration is not merely a logistical convenience; it is a fundamental requirement for the development of next-generation biological foundation models.
The factorial problem of oncology research
Modern drug discovery is hindered by a combinatorial explosion. To identify a successful lead compound, researchers must navigate the intersection of approximately 20,000 protein-coding genes, hundreds of distinct cancer subtypes, and an astronomical number of potential drug-to-target interactions. Accounting for disease variability requires robust datasets and computational power.
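To make that scale concrete, a back-of-envelope calculation helps. The subtype count below is an assumed stand-in for "hundreds," and unordered gene pairs are just one simple way to lower-bound potential interactions; both numbers are illustrative, not Tempus figures:

```python
from math import comb

# Back-of-envelope sizing of the oncology search space.
# n_subtypes is an illustrative stand-in for "hundreds of subtypes".
n_genes = 20_000      # approximate human protein-coding genes
n_subtypes = 300      # assumed subtype count, for illustration only

gene_subtype_contexts = n_genes * n_subtypes   # 6,000,000 contexts
pairwise_interactions = comb(n_genes, 2)       # 199,990,000 gene pairs

print(f"gene-by-subtype contexts:   {gene_subtype_contexts:,}")
print(f"pairwise gene interactions: {pairwise_interactions:,}")
```

Even before drugs, dosing, or higher-order interactions enter the picture, the candidate space runs to hundreds of millions of combinations, which is why exhaustive wet-lab screening is infeasible.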
However, dry lab environments often lack the scale to process these variables simultaneously. Consequently, many promising candidates fall into the translational gap: biological models are frequently trained on datasets that lack sufficient multimodal depth, or without the computational headroom needed to account for real-world heterogeneity.
To increase the probability of technical success for an asset, scientists need data that is comprehensive, representative, and multimodal. The challenge of large datasets is best addressed by bringing the computational infrastructure to the data source rather than moving the data to the compute. This paradigm shift is essential for training deep learning models, particularly foundation models that require constant access to the entire dataset for iterative weight updates. Traditional cloud-based analytics services that adhere to the ETL (Extract, Transform, Load) pipeline or rely on predictive APIs often fail at this scale because the architecture assumes data can be easily ingested and transformed on the fly.
Computational power for the new age of medicine
Foundation models are a driving force behind how AI is transforming healthcare. Unlike traditional machine learning (ML) models that are trained for specific tasks, foundation models are large-scale, deep learning models pre-trained on extensive, diverse datasets and can be fine-tuned for a wide variety of downstream tasks, ranging from disease and biomarker detection to therapeutic response prediction.
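The pre-train/fine-tune split can be sketched in miniature: a frozen "backbone" stands in for weights learned during large-scale pre-training, and only a small task head is trained for the downstream problem. Everything here (the dimensions, the random backbone, the toy labels) is an illustrative assumption, not the architecture of any actual Tempus model:

```python
import math
import random

random.seed(0)

DIM_IN, DIM_EMB = 8, 4

# "Pre-trained" backbone: a fixed random projection standing in for
# weights learned during pre-training. It stays frozen throughout.
backbone = [[random.gauss(0, 1) for _ in range(DIM_IN)] for _ in range(DIM_EMB)]

def embed(x):
    """Frozen forward pass; backbone weights are never updated."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in backbone]

# Trainable task head: logistic regression on top of frozen embeddings.
head, bias = [0.0] * DIM_EMB, 0.0

def predict(x):
    z = sum(w * e for w, e in zip(head, embed(x))) + bias
    z = max(-30.0, min(30.0, z))        # clamp for numerical safety
    return 1.0 / (1.0 + math.exp(-z))

# Toy downstream task: classify whether the features sum to a positive value.
data = [[random.gauss(0, 1) for _ in range(DIM_IN)] for _ in range(200)]
labels = [1.0 if sum(x) > 0 else 0.0 for x in data]

LR = 0.1
for _ in range(50):
    for x, y in zip(data, labels):
        e, g = embed(x), predict(x) - y   # g: d(log-loss)/d(logit)
        for j in range(DIM_EMB):
            head[j] -= LR * g * e[j]      # only the head is updated
        bias -= LR * g

acc = sum((predict(x) > 0.5) == (y > 0.5) for x, y in zip(data, labels)) / len(data)
print(f"frozen backbone params: {DIM_EMB * DIM_IN}, trainable head params: {DIM_EMB + 1}")
print(f"training accuracy of the fine-tuned head: {acc:.2f}")
```

The design point is the parameter ratio: fine-tuning touches only the small head, so each new downstream task reuses the expensive pre-trained representation rather than relearning it.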
The efficacy of these models scales with both the diversity of the training data and the floating-point operations (FLOPs) available during training.1 Larger and more diverse datasets improve the predictive power of downstream tasks, and performance improves further with additional compute. Access to both enables novel analyses, such as digital twins and in silico trial simulations, that would otherwise be computationally intractable.
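This data-plus-compute relationship is commonly summarized by neural scaling laws, in which predicted loss falls as a power law in model parameters and training data. As a hedged illustration, the sketch below uses the Chinchilla-style functional form and constants fit to language models by Hoffmann et al. (2022); the constants are borrowed purely as a generic example and are not a fit to biological data:

```python
def scaling_loss(n_params, n_tokens,
                 E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Chinchilla-style scaling law: an irreducible loss E plus
    power-law terms that shrink as model size and data volume grow."""
    return E + A / n_params**alpha + B / n_tokens**beta

small = scaling_loss(1e8, 1e10)    # ~100M params, ~10B training tokens
large = scaling_loss(1e10, 1e12)   # ~10B params, ~1T training tokens
print(f"predicted loss, small run: {small:.3f}")
print(f"predicted loss, large run: {large:.3f}")
```

The qualitative lesson carries over even if the constants do not: scaling parameters and data together drives loss steadily toward its irreducible floor, which is why both the data library and the compute budget matter.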
Putting it into practice: AI-enabled research applications
Tempus is at the forefront of this new wave of development, leveraging robust models to derive valuable insights from standard modalities such as H&E slides. Paige Predict does just that, predicting the status of over 1,600 biomarkers to inform testing prioritization, improve diagnostic yield, and accelerate R&D by identifying novel digital biomarkers.2 Paige Predict builds upon Virchow 2, a state-of-the-art pathology foundation model pre-trained on three million digital pathology slides to establish a comprehensive representation of histopathologic phenotypes and enable generalization across cancer types in downstream models. Models like these set a new standard for computational biology.
Tempus’ approach is to treat a patient’s cancer journey as a temporal and multimodal story, building a unified representation, or “biological engine,” that moves beyond single-purpose traditional AI models. This engine integrates core signals from tumor tissue biology (pathology and genomics) and the patient journey (longitudinal EHR data and clinical notes). By starting with this pre-trained understanding of biology, researchers can avoid relearning the fundamentals from scratch for every new trial or question, leading to higher model performance, faster development, and lower overall costs. The proprietary combination of data scale, multimodality, and longitudinality, matched with the necessary compute power, is critical for capturing predictive signals and the full complexity of cancer.
Researchers can leverage Tempus’ compute infrastructure and multimodal data repository via the Lens Platform. Workspaces, a dedicated cloud-based analytical environment within Lens, provides multiple machine specifications to support diverse research aims, from small-scale analyses backed by high-performance CPUs to the massive GPU parallelism required for foundation model training. This allows biopharma partners to operate within a secure sandbox where massive-scale computation occurs inches from the raw data rather than miles away.
Integrated intelligence for the future of healthcare
The value of Tempus’ massive computing power is not found in the raw numbers of petabytes or teraFLOPs alone, but in the seamless integration of these assets into a unified biological intelligence platform. By resolving the physical constraints of data gravity and the mathematical complexity of the factorial problem in oncology, Tempus has created a unique ecosystem where AI can be trained, validated, and deployed at the speed of clinical need. Intelligent outputs are accessible at every stage of development, from designing better clinical trials to deploying AI across EHR systems to streamline workflows.
The future of oncology does not belong to those who simply possess the most data, but to those who can process it at the speed of biological discovery. By integrating 450+ petabytes of multimodal data with some of the world’s most advanced computing architectures, Tempus is providing the industry with the operating system required to solve the factorial problem of cancer and deliver on the promise of truly personalized medicine.
References
- Chen, H., Venkatesh, M.S., Gómez Ortega, J. et al. Scaling and quantization of large-scale foundation model enables resource-efficient predictions in network biology. Nat Comput Sci (2026). https://doi.org/10.1038/s43588-026-00972-4
- All Paige Predict features other than those available for clinical use with xT CDx are for Research Use Only (RUO) and are not for use in diagnostic procedures. Clinical use of Paige Predict at the Tempus Chicago lab has been validated as a laboratory developed test and is performed for eligible xT QNS cases.
To learn more about Tempus’ multimodal data and computing infrastructure, explore the Lens Library.