Background – Laboratory data in electronic health records (EHRs) are a valuable source of information for characterizing patient populations, informing accurate diagnostic and treatment decisions, and fueling research studies. However, despite their value, laboratory values are underutilized due to high levels of missingness. Existing imputation methods fall short, as they do not fully leverage patient clinical histories and often do not scale to the large number of tests available in real-world data (RWD).
Methods – To address these shortcomings, we present Laboratory Imputation Framework for EHRs (LIFE), a self-supervised learning framework based on multi-head attention that is trained to impute any laboratory test value at any point in time in the patient’s journey using their complete EHR. This architecture (1) eliminates the need to train a different model for each laboratory test by jointly modeling all laboratory data of interest; and (2) better clinically contextualizes the predictions by leveraging additional EHR variables, such as diagnoses, medications, and discrete laboratory results.
Results – We validate our framework using a large-scale, real-world dataset encompassing over 1 million oncology patients. Our results demonstrate that LIFE obtains superior or equivalent results compared to state-of-the-art baseline methods in 23 out of 25 evaluated laboratory tests and better enhances a downstream adverse event detection task in 7 out of 9 cases.
Conclusions – LIFE shows promise in accurately estimating missing laboratory values and enhancing the utilization of large-scale RWD in healthcare. This advancement could lead to better clinical models, more informed decision-making and improved patient outcomes.
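The self-supervised setup described in the Methods can be illustrated with a small sketch. The code below is not the LIFE implementation; it only shows one common way to build self-supervised training examples, assuming that observed numeric laboratory values are randomly hidden and the model is asked to recover them from the rest of the record. The `Event` structure, the `mask_lab_values` helper, the event codes, and the numeric values are all hypothetical and chosen for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    """One entry in a patient timeline (hypothetical structure)."""
    day: int                 # days since the first record
    code: str                # lab test, diagnosis, or medication identifier
    value: Optional[float]   # numeric lab value; None for non-numeric events
    masked: bool = False     # True when a lab value was hidden for training


def mask_lab_values(
    events: List[Event], mask_prob: float, rng: random.Random
) -> Tuple[List[Event], List[Tuple[int, float]]]:
    """Hide a random subset of numeric lab values.

    Returns the masked timeline plus (index, true_value) targets that a
    model would be trained to predict from the remaining context.
    """
    masked_events: List[Event] = []
    targets: List[Tuple[int, float]] = []
    for i, ev in enumerate(events):
        if ev.value is not None and rng.random() < mask_prob:
            # Keep the time and test identity, hide only the value.
            masked_events.append(Event(ev.day, ev.code, None, masked=True))
            targets.append((i, ev.value))
        else:
            masked_events.append(ev)
    return masked_events, targets


# Illustrative timeline; codes and values are made up.
history = [
    Event(0, "DX:example_diagnosis", None),
    Event(0, "LAB:hemoglobin", 13.2),
    Event(14, "RX:example_medication", None),
    Event(21, "LAB:hemoglobin", 11.8),
    Event(21, "LAB:creatinine", 0.9),
]

masked_history, targets = mask_lab_values(history, 0.5, random.Random(0))
for idx, true_value in targets:
    print(masked_history[idx].code, "hidden, target =", true_value)
```

Because the diagnosis and medication events stay visible while values are hidden, a single model trained this way sees every laboratory test in its clinical context, which matches the joint-modeling idea in the abstract. The attention-based model itself is not sketched here.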
Plain language summary
Electronic health records (EHRs) contain laboratory test results that are crucial for modeling and analyzing patient health and outcomes. However, laboratory test results are often missing, which limits their usefulness. Current automated methods to fill these gaps are not very effective because they generally do not use all available patient clinical information and cannot handle a wide variety of tests. To address this issue, we present Laboratory Imputation Framework for EHRs (LIFE), a model that predicts missing laboratory results at any point in time by analyzing a patient’s entire health record, including diagnoses and medications. Tested on data from over a million cancer patients, LIFE outperformed other methods in predicting laboratory results and improved the detection of several clinical adverse events. This tool could lead to better clinical models, potentially enhancing healthcare decisions and improving patient outcomes.