Background – Identifying the set of patients with a particular disease diagnosis across electronic health records (EHRs), a set referred to as a phenotype, is an important step in clinical research and applications. However, this task is often challenging because incomplete data can render definitive classifications impossible. We propose a probabilistic approach to phenotyping that is based on Bayesian inference and does not require gold-standard labels. In this paper, we develop multiple heuristic “labeling functions” (LFs) for 4 diseases across de-identified EHR data and aggregate their votes using three methods: majority vote (MV), a popular open-source approach (Snorkel OSS), and our proposed probabilistic approach (LEVI). We compare the resulting phenotypes to those built using expert-curated logic from the literature and to those from an off-the-shelf natural language processing pipeline (Medspacy), using a curated sample of physician-reviewed labels for evaluation.
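To make the simplest of these aggregation strategies concrete, the following is a minimal sketch of majority voting over LF outputs. The abstain code `-1`, the function name `majority_vote`, and the toy vote matrix are assumptions made for illustration and are not taken from the paper; Snorkel OSS and LEVI replace this step with probabilistic models that are not reproduced here.

```python
import numpy as np

ABSTAIN = -1  # assumed convention: an LF that does not vote returns -1


def majority_vote(votes):
    """Aggregate labeling-function votes for each patient.

    votes: array of shape (n_patients, n_lfs) with entries 1 (has disease),
    0 (does not), or ABSTAIN. Returns the fraction of non-abstaining LFs
    that voted positive, or 0.5 when every LF abstained.
    """
    votes = np.asarray(votes)
    pos = (votes == 1).sum(axis=1)
    neg = (votes == 0).sum(axis=1)
    total = pos + neg
    # np.maximum avoids division by zero for rows where all LFs abstain.
    return np.where(total > 0, pos / np.maximum(total, 1), 0.5)


# Toy vote matrix (hypothetical): 3 patients, 3 labeling functions.
votes = np.array([
    [1, 1, ABSTAIN],
    [0, 1, 0],
    [ABSTAIN, ABSTAIN, ABSTAIN],
])
scores = majority_vote(votes)
labels = (scores > 0.5).astype(int)  # binary phenotype from the vote share
```

The vote share is only a crude score. It is not a calibrated probability, which is the gap the probabilistic approaches are meant to address.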
Results – Phenotypes built using LFs outperform off-the-shelf alternatives in classification (F1 scores of 0.79–0.82 vs. 0.68 for expert logic and 0.55 for Medspacy). Compared to output scores from Snorkel OSS, LEVI provides better probabilistic performance (expected calibration error of 0.04 vs. 0.12), ROC AUC estimates (interval score [loss] of 0.03 vs. 0.10), and operating point selection (equal-cost net benefit of 0.18 vs. 0.15).
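For readers unfamiliar with two of these metrics, the sketch below shows one common way to compute expected calibration error (with equal-width bins) and net benefit at a decision threshold. The bin count, the reading of “equal-cost” as a threshold of 0.5, the function names, and the toy data are all assumptions; the paper's exact definitions may differ.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted mean gap between predicted probability and observed
    positive rate, computed over equal-width probability bins."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Map each probability to a bin index; p == 1.0 goes in the last bin.
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(y_true[mask].mean() - y_prob[mask].mean())
            ece += mask.mean() * gap  # mask.mean() is the bin's share of samples
    return ece


def net_benefit(y_true, y_prob, threshold=0.5):
    """Net benefit from decision-curve analysis: TP/N - FP/N * t / (1 - t).
    At threshold 0.5 a false positive costs as much as a true positive gains."""
    y_true = np.asarray(y_true).astype(bool)
    flagged = np.asarray(y_prob) >= threshold
    n = y_true.size
    tp = np.sum(flagged & y_true)
    fp = np.sum(flagged & ~y_true)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data (hypothetical), not results from the paper.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.1, 0.8, 0.3]
ece = expected_calibration_error(y_true, y_prob)
nb = net_benefit(y_true, y_prob, threshold=0.5)
```

Lower calibration error is better and higher net benefit is better, which matches the direction of the comparisons reported above.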
Conclusions – For challenging disease states, phenotyping using probabilities rather than binary classification can lead to improved and more personalized downstream decision-making. Probabilistic phenotypes built using LEVI exhibit low calibration error without the need for labels, allowing for better risk-benefit tradeoffs.