Model selection for prognostic time-to-event gene signature discovery with applications in early breast cancer data

Miika Ahdesmaeki*, Lee Lancashire, Vitali Proutski, Claire Wilson, Timothy S. Davison, D. Paul Harkin, Richard D. Kennedy

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Model selection between competing models is a key consideration in the discovery of prognostic multigene signatures. Both the use of appropriate statistical performance measures and verification of the biological significance of the signatures are imperative to maximise the chance of external validation of the generated signatures. Current approaches in time-to-event studies often use only a single measure of performance in model selection, such as logrank test p-values, or dichotomise the follow-up times at some phase of the study to facilitate signature discovery. In this study we improve the prognostic signature discovery process by applying the multivariate partial Cox model together with four measures of signature performance: the concordance index, the hazard ratio of predictions, independence from available clinical covariates, and biological enrichment. The proposed framework was applied to discover prognostic multigene signatures from early breast cancer data. The partial Cox model, combined with the multiple performance measures, was used both to guide the selection of the optimal panel of prognostic genes and to predict risk within cross-validation, without dichotomising the follow-up times at any stage. The signatures were successfully externally cross-validated in independent breast cancer datasets, yielding a hazard ratio of 2.55 [1.44, 4.51] for the top-ranking signature.
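
To illustrate one of the performance measures named in the abstract, the concordance index, the following is a minimal Python sketch of Harrell's C for right-censored data. It is not the authors' implementation: the function name `concordance_index` and the toy follow-up values are hypothetical, and pairs with tied follow-up times are simply skipped. The point it shows is that the measure works directly on follow-up times and censoring indicators, so no dichotomisation is needed.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (simplified sketch).

    times:       follow-up times
    events:      1 if the event was observed, 0 if censored
    risk_scores: predicted risk; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b]:
            continue  # assumption: tied times are skipped in this sketch
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk had the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration only (not from the study).
times = [5, 10, 12, 20]
events = [1, 0, 1, 1]
risk = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(times, events, risk))  # 1.0
```

A value of 0.5 corresponds to uninformative predictions and 1.0 to a perfect ranking of risk against observed event order.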

Original language: English
Pages (from-to): 619-635
Number of pages: 17
Journal: Statistical Applications in Genetics and Molecular Biology
Volume: 12
Issue number: 5
Early online date: 27 Sep 2013
Publication status: Published - Oct 2013

Keywords

  • gene signature
  • feature selection
  • model selection
  • prognostic biomarker
  • time to event analysis
  • COX REGRESSION-ANALYSIS
  • EXPRESSION DATA
  • SAMPLE SIZE
  • VALIDATION

ASJC Scopus subject areas

  • Genetics
  • Molecular Biology
  • Statistics and Probability
  • Computational Mathematics
