Prior knowledge transfer across transcriptional data sets and technologies using compositional statistics yields new mislabelled ovarian cell line

Jaine K Blayney, Timothy Davison, Nuala McCabe, Steven Walker, Karen Keating, Thomas Delaney, Caroline Greenan, Alistair R Williams, W Glenn McCluggage, Amanda Capes-Davis, D Paul Harkin, Charlie Gourley, Richard D Kennedy

Research output: Contribution to journal › Article › peer-review

Abstract

Here, we describe gene expression compositional assignment (GECA), a powerful yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains, including bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package.

Original language: English
Number of pages: 10
Journal: Nucleic Acids Research
Early online date: 28 Jun 2016
Publication status: Early online date - 28 Jun 2016
