TY - JOUR
T1 - Tumour purity assessment with deep learning in colorectal cancer and impact on molecular analysis
AU - Schoenpflug, Lydia A
AU - Chatzipli, Aikaterini
AU - Sirinukunwattana, Korsuk
AU - Richman, Susan
AU - Blake, Andrew
AU - Robineau, James
AU - Mertz, Kirsten D
AU - Verrill, Clare
AU - Leedham, Simon J
AU - Hardy, Claire
AU - Whalley, Celina
AU - Redmond, Keara
AU - Dunne, Philip
AU - Walker, Steven
AU - Beggs, Andrew D
AU - McDermott, Ultan
AU - Murray, Graeme
AU - Samuel, Leslie M
AU - Seymour, Matthew
AU - Tomlinson, Ian
AU - Quirke, Philip
AU - Rittscher, Jens
AU - Maughan, Tim
AU - Domingo, Enric
AU - Koelzer, Viktor
AU - Adams, Richard
AU - Youdell, Michael
AU - Bach, Simon
AU - Brown, Louise
AU - Buffa, Francesca
AU - Campbell, Peter
AU - Cazier, Jean‐Baptiste
AU - Wu, Chieh‐His
AU - Higgins, Geoff
AU - Kennedy, Richard
AU - Lawler, Mark
AU - Wilson, Richard
AU - S-CORT Consortium
PY - 2025/2
Y1 - 2025/2
N2 - Tumour content plays a pivotal role in directing the bioinformatic analysis of molecular profiles such as copy number variation (CNV). In clinical application, tumour purity estimation (TPE) is achieved either through visual pathological review [conventional pathology (CP)] or the deconvolution of molecular data. While CP provides a direct measurement, it demonstrates modest reproducibility and lacks standardisation. Conversely, deconvolution methods offer an indirect assessment with uncertain accuracy, underscoring the necessity for innovative approaches. SoftCTM is an open‐source, multiorgan deep‐learning (DL) model for the detection of tumour and non‐tumour cells in H&E‐stained slides, developed within the Overlapped Cell on Tissue Dataset for Histopathology (OCELOT) Challenge 2023. Here, using three large multicentre colorectal cancer (CRC) cohorts (N = 1,097 patients) with digital pathology and multi‐omic data, we compare the utility and accuracy of TPE with SoftCTM versus CP and bioinformatic deconvolution methods (RNA expression, DNA methylation) for downstream molecular analysis, including CNV profiling. SoftCTM showed technical repeatability when applied twice on the same slide (r = 1.0) and excellent correlations in paired H&E slides (r > 0.9). TPEs profiled by SoftCTM correlated highly with RNA expression (r = 0.59) and DNA methylation (r = 0.40), while TPEs by CP showed a lower correlation with RNA expression (r = 0.41) and DNA methylation (r = 0.29). We show that CP and deconvolution methods respectively underestimate and overestimate tumour content compared to SoftCTM, resulting in 6–13% differing CNV calls. In summary, TPE with SoftCTM enables reproducibility, automation, and standardisation at single‐cell resolution. SoftCTM estimates (M = 58.9%, SD ±16.3%) reconcile the overestimation by molecular data extrapolation (RNA expression: M = 79.2%, SD ±10.5%; DNA methylation: M = 62.7%, SD ±11.8%) and underestimation by CP (M = 35.9%, SD ±13.1%), providing a more reliable middle ground. A fully integrated computational pathology solution could therefore be used to improve downstream molecular analyses for research and clinics.
KW - artificial intelligence
KW - personalised medicine
KW - pathology
KW - diagnostic molecular pathology
KW - colorectal cancer
U2 - 10.1002/path.6376
DO - 10.1002/path.6376
M3 - Article
SN - 0022-3417
VL - 265
SP - 184
EP - 197
JO - Journal of Pathology
JF - Journal of Pathology
IS - 2
ER -