INTRODUCING TEMPUS NEXT: AI-ENABLED CARE PATHWAY INTELLIGENCE /// EXPLORE NOW INTRODUCING TEMPUS NEXT: AI-ENABLED CARE PATHWAY INTELLIGENCE ///
03/25/2024

H&E-Based Deep Learning Model Predicts Immune Phenotypes in NSCLC

USCAP 2024 Annual Meeting PRESENTATION
Authors Josh Och, M.S., Bolesław Osinski, Ph.D., Sina Zomorrodian, Rohan Joshi, Nike Beaubier, Martin C. Stumpe

Background – Tumor-infiltrating lymphocytes (TIL) are a biomarker for response to immune checkpoint inhibitor (ICI) therapy. However, visually identifying TILs from hematoxylin and eosin (H&E) stained whole-slide images (WSIs) is expensive and time-consuming. Further, a lack of consensus on TIL scoring criteria can result in high levels of pathologist discordance. Here, we use an imaging-based deep learning model to classify H&E WSIs of non-small cell lung cancer (NSCLC) tumors into inflamed (TIL infiltrated) versus non-inflamed immune phenotypes. 

Design – H&E stained WSIs (scanned on Leica Aperio GT450) from primary site, resected NSCLC tumors enriched for class balance between inflamed and non-inflamed (n=169, Table I) were assessed by 4 independent pathologists who assigned one of the following labels: inflamed, excluded (stromal TIL infiltrated) or desert (no infiltration). WSIs were randomly split (50:50) into training and testing sets. For model training, one pathologist was held out; each WSI was assigned a ground truth label if 2/3 of remaining pathologists agreed. Previously developed deep learning models were used to count the number of lymphocytes in tumor and stroma regions in 1 mm2 tiles. WSIs were called “inflamed” if enough tiles (slide threshold) had a sufficient number of lymphocytes within tumor regions (tile threshold). Tile and slide-level thresholds were tuned to maximize the F1 score for separation of inflamed and non-inflamed (either excluded or desert) cases in the training set. Model performance was assessed on the independent held-out test set and compared to calls from the held-out pathologist. This process was repeated 4 times, holding out each pathologist.

Results – Overall, our deep learning model achieved a mean of 77.5% accuracy (95% CI 73.3% – 81.7%), which was indistinguishable from that of the held-out pathologists (mean 80.3%, 95% CI 69.0% – 91.6%). Cohen’s kappa coefficient for the model (mean 45.4%, 95% CI 34.9% – 55.9%) was also indistinguishable from that of the held-out pathologists (mean 49.2%, 95% CI 25.3% – 73.0%). A representative comparison of the model to a held out pathologist is shown in Fig.1.

Conclusions – Our deep learning model performed as well as a held-out pathologist for separating inflamed vs. non-inflamed samples. Future work includes extending our model to predicting excluded vs. desert phenotypes, identifying tertiary lymphoid structures, and validating that model predictions are relevant for therapy response. 

Figure 1

Table 1

Characteristic Value Samples Percent Inflamed (defined by majority of three pathologists) Chi-square p-value
Stage Stage 1 48 77% 0.018
Stage 2 54 85%
Stage 3 50 64%
Stage 4 17 53%
Grade Rollup Low Grade 105 70% 0.204
High Grade 64 80%
Histology Adenocarcinoma 86 71% 0.681
Acinar Cell Carcinoma 50 74%
Mucinous Adenocarcinoma 33 79%

VIEW THE PUBLICATION

VIEW THE POSTER