Harnessing the complexity of gene expression data from cancer: from single gene to structural pathway methods

Frank Emmert-Streib*, Shailesh Tripathi, Ricardo de Matos Simoes

*Corresponding author for this work

Research output: Contribution to journal › Literature review

Abstract

High-dimensional gene expression data provide a rich source of information because they capture the expression levels of genes in dynamic states that reflect the biological functioning of a cell. For this reason, such data are suitable for revealing systems-related properties inside a cell, e.g., in order to elucidate molecular mechanisms of complex diseases like breast or prostate cancer. However, what such an analysis can reveal depends strongly not only on the sample size and the correlation structure of a data set, but also on the statistical hypotheses tested. Many different approaches have been developed over the years to analyze gene expression data in order to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure in pathways. In this paper, we review statistical methods for all three types of approaches, including subtypes, in the context of cancer data, provide links to software implementations and tools, and also address the general problem of multiple hypothesis testing. Further, we provide recommendations for the selection of such analysis methods.
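
As an illustration of the multiple hypothesis testing problem mentioned in the abstract, the following is a minimal sketch of the Benjamini-Hochberg false discovery rate adjustment applied to per-gene p-values. It is not taken from the paper: the function names `benjamini_hochberg` and `significant_genes`, the gene labels, and the p-values are hypothetical, and the p-values are assumed to come from some single-gene test run beforehand.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # stay monotone: each one is the minimum of p * m / rank over all
    # ranks at or above the current one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(pvalues_by_gene, alpha=0.05):
    """Return the genes whose adjusted p-value is at most alpha."""
    genes = list(pvalues_by_gene)
    adjusted = benjamini_hochberg([pvalues_by_gene[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q <= alpha]


if __name__ == "__main__":
    # Hypothetical p-values for four genes (illustration only).
    example = {"geneA": 0.01, "geneB": 0.04, "geneC": 0.03, "geneD": 0.005}
    print(significant_genes(example, alpha=0.03))
```

With many thousands of genes tested at once, thresholding the adjusted values rather than the raw p-values controls the expected proportion of false discoveries among the genes reported as changed.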

Original language: English
Article number: 44
Number of pages: 25
Journal: Biology Direct
Volume: 7
DOI: 10.1186/1745-6150-7-44
Publication status: Published - 10 Dec 2012

Keywords

  • Gene expression data
  • Cancer data
  • Statistical analysis methods
  • Pathway methods
  • Correlation structure
  • Cancer genomics
  • SET ENRICHMENT ANALYSIS
  • TRANSCRIPTIONAL REGULATORY NETWORKS
  • CHRONIC-FATIGUE-SYNDROME
  • FALSE DISCOVERY RATES
  • B-CELL LYMPHOMA
  • MICROARRAY DATA
  • SYSTEMS BIOLOGY
  • COVARIANCE-MATRIX
  • DIFFERENTIAL COEXPRESSION
  • GRAPHICAL LASSO

Cite this

Emmert-Streib, Frank ; Tripathi, Shailesh ; Simoes, Ricardo de Matos. / Harnessing the complexity of gene expression data from cancer: from single gene to structural pathway methods. In: Biology Direct. 2012 ; Vol. 7.
@article{d41c3f2f639e4fd29af198f1f09a1b36,
title = "Harnessing the complexity of gene expression data from cancer: from single gene to structural pathway methods",
keywords = "Gene expression data, Cancer data, Statistical analysis methods, Pathway methods, Correlation structure, Cancer genomics, SET ENRICHMENT ANALYSIS, TRANSCRIPTIONAL REGULATORY NETWORKS, CHRONIC-FATIGUE-SYNDROME, FALSE DISCOVERY RATES, B-CELL LYMPHOMA, MICROARRAY DATA, SYSTEMS BIOLOGY, COVARIANCE-MATRIX, DIFFERENTIAL COEXPRESSION, GRAPHICAL LASSO",
author = "Frank Emmert-Streib and Shailesh Tripathi and Simoes, {Ricardo de Matos}",
year = "2012",
month = "12",
day = "10",
doi = "10.1186/1745-6150-7-44",
language = "English",
volume = "7",
journal = "Biology Direct",
issn = "1745-6150",
publisher = "BioMed Central"
}
