Authors
Bo Osinski, Qiyuan Hu, Sun Hae Hong, Kunal Nagpal, Ben Terdich, Yoni Muller, Arlen Brickman, Nike Beaubier
Background
Accurate quantification of tumor-infiltrating lymphocytes (TIL) can predict patient response to immune checkpoint inhibitor (ICI) therapy in non-small cell lung cancer (NSCLC). We developed an AI model, consisting of 2 sub-modules (lymphocyte detection, tumor & stromal region segmentation), to predict the density of lymphocytes within NSCLC tumors. Because manual TIL scoring methods are inconsistent and subjective, we validated each module against high-quality ground truth (GT): IHC-derived labels for lymphocytes and a consensus of 4 pathologists for regions.
Design
For the lymphocyte model, GT labels were derived from IHC. To achieve this, 280 slides were stained for HE, scanned, de-stained with xylene, re-stained for CD3/CD20 (T/B lymphocytes), and scanned again. In QuPath, pathologists drew 5-10 fields of view (FOVs, 4096 μm²) per slide and detected stained cells as point labels. These labels were registered to the corresponding HE image, where pathologists edited them to fix errors (Fig1B). After dropping slides with poor IHC staining or failed registration, the remaining slides were split into train (110) / tuning (56) / test (48) sets. We used a UNet with a customized cross-entropy loss function to train a segmentation model on the point labels. To evaluate predictions, points within 3 μm of each other were assumed to label the same cell. For the tumor and stroma region model, 5 tissue classes were annotated directly on 260 HE slides within FOVs (1 mm²) by 4 pathologists per FOV (Fig1D). The resulting data were split into train (140) / tuning (60) / test (60) sets, and a UNet model was trained. Following initial evaluation, the training set was supplemented with annotations on metastatic breast cancer (N=308), which improved performance.
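The 3 μm matching rule used for evaluation can be illustrated with a minimal sketch. It assumes greedy one-to-one matching of predicted to GT points by increasing distance, which is an assumption made here for illustration and not necessarily the matching procedure used in the study; the function names `match_points` and `detection_f1` are hypothetical.

```python
import numpy as np


def match_points(pred_um, gt_um, radius_um=3.0):
    """Greedily pair predicted and GT points (coordinates in micrometers).

    Assumption: each point is matched at most once, closest pairs first,
    and only pairs no farther apart than radius_um are allowed.
    Returns a list of (pred_index, gt_index) tuples.
    """
    pred = np.asarray(pred_um, dtype=float).reshape(-1, 2)
    gt = np.asarray(gt_um, dtype=float).reshape(-1, 2)
    if len(pred) == 0 or len(gt) == 0:
        return []
    # Pairwise Euclidean distances, shape (n_pred, n_gt).
    dist = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=2)
    cand_p, cand_g = np.nonzero(dist <= radius_um)
    order = np.argsort(dist[cand_p, cand_g], kind="stable")
    used_p, used_g, pairs = set(), set(), []
    for k in order:
        p, g = int(cand_p[k]), int(cand_g[k])
        if p not in used_p and g not in used_g:
            used_p.add(p)
            used_g.add(g)
            pairs.append((p, g))
    return pairs


def detection_f1(pred_um, gt_um, radius_um=3.0):
    """F1 score for point detection under the matching rule above."""
    n_pred = len(np.asarray(pred_um, dtype=float).reshape(-1, 2))
    n_gt = len(np.asarray(gt_um, dtype=float).reshape(-1, 2))
    tp = len(match_points(pred_um, gt_um, radius_um))
    fp, fn = n_pred - tp, n_gt - tp
    denom = 2 * tp + fp + fn
    # Convention chosen here: no predictions and no GT points counts as perfect.
    return 2 * tp / denom if denom else 1.0
```

With three predictions and three GT points of which two pairs fall within 3 μm, this yields 2 true positives, 1 false positive, and 1 false negative. An optimal assignment (for example the Hungarian algorithm) could replace the greedy step if ties in dense regions matter.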
Results
Predicted lymphocyte counts correlate strongly with IHC-derived GT counts (CCC 0.91, Fig1C). The cell detection F1 score, which measures detection regardless of cell class, exceeds 0.8 across all tissue sites in the test set, while the lymphocyte classification F1 ranges from 0.66 in some metastatic sites to 0.80 in lung (Table 1). Across tissue sites, tumor region F1 ranges from 0.79 to 0.90 and stroma region F1 ranges from 0.75 to 0.81 (Table 1).
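For readers who want to reproduce a count-agreement metric of this kind, the sketch below assumes that CCC refers to Lin's concordance correlation coefficient, computed with population (biased) variances over per-FOV counts. Both points are assumptions, since the abstract does not define the metric, and the function name is hypothetical.

```python
import numpy as np


def concordance_correlation(pred_counts, gt_counts):
    """Lin's concordance correlation coefficient between two count vectors.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (divide-by-n) variance and covariance.
    """
    x = np.asarray(pred_counts, dtype=float)
    y = np.asarray(gt_counts, dtype=float)
    if x.shape != y.shape or x.ndim != 1 or x.size < 2:
        raise ValueError("inputs must be 1-D, equal length, with >= 2 values")
    mean_x, mean_y = x.mean(), y.mean()
    cov = ((x - mean_x) * (y - mean_y)).mean()
    denom = x.var() + y.var() + (mean_x - mean_y) ** 2
    # Undefined when both vectors are constant and identical.
    return float(2 * cov / denom) if denom > 0 else float("nan")
```

Unlike Pearson correlation, this coefficient penalizes systematic offsets: predictions that are perfectly linear in the GT counts but shifted by a constant score below 1.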


Conclusions
Together, these models accurately and reproducibly measure TIL density (Fig1A) by counting lymphocytes within tumor & stromal regions. Future validation of TIL density against clinical outcomes could support more personalized and effective treatment plans for NSCLC patients by guiding decisions on ICI therapy.
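How the two module outputs might combine into a density value can be sketched as follows. The class ids, the microns-per-pixel parameter `mpp`, the choice of cells per mm² as the unit, and the function name `til_density` are all assumptions for illustration; the abstract does not specify the exact density definition.

```python
import numpy as np

# Hypothetical class ids in the region segmentation mask.
TUMOR, STROMA = 1, 2


def til_density(region_mask, lymph_xy_px, mpp, classes=(TUMOR, STROMA)):
    """Lymphocytes per mm^2 inside the selected region classes.

    region_mask: 2-D integer array of per-pixel class ids.
    lymph_xy_px: iterable of (x, y) integer pixel coordinates of detected lymphocytes.
    mpp: microns per pixel of the mask (assumed isotropic).
    """
    mask = np.asarray(region_mask)
    region = np.isin(mask, classes)
    # Each pixel covers (mpp / 1000)^2 square millimeters.
    area_mm2 = region.sum() * (mpp / 1000.0) ** 2
    if area_mm2 == 0:
        return float("nan")
    pts = np.asarray(lymph_xy_px, dtype=int).reshape(-1, 2)
    height, width = mask.shape
    in_bounds = (
        (pts[:, 0] >= 0) & (pts[:, 0] < width)
        & (pts[:, 1] >= 0) & (pts[:, 1] < height)
    )
    pts = pts[in_bounds]
    # Index as [row, col] = [y, x] and count points that land in the region.
    count = int(region[pts[:, 1], pts[:, 0]].sum())
    return count / area_mm2
```

Passing `classes=(TUMOR,)` or `classes=(STROMA,)` would give intratumoral or stromal density separately under the same assumptions.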