RMA with quantile normalization mixes biological signals between different sample groups in microarray data analysis

Chang Sik Kim, Seungwoo Hwang, Shu-Dong Zhang

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)


Abstract

Quantile normalization (QN) is a microarray data processing technique and the default normalization method in the Robust Multi-array Average (RMA) procedure, which was designed primarily for analysing gene expression data from Affymetrix arrays. Given how widely Affymetrix microarrays and the RMA method are used, it is crucially important that the normalization procedure is applied appropriately. In this study we carried out simulation experiments and analysed real microarray data to investigate the suitability of RMA when it is applied to datasets containing different groups of biological samples. Our experiments showed that RMA with QN does not preserve the biological signal within each group; instead, it mixes the signals between the groups. We also showed that Median Polish, used in the summarization step of RMA, has a similar mixing effect. RMA is one of the most widely used methods in microarray data processing and has been applied to a vast volume of data in biomedical research. The problematic behaviour of this method suggests that previous studies employing RMA could have been misled or adversely affected. We therefore think it is crucially important that the research community recognizes the issue and starts to address it. The two core elements of the RMA method, quantile normalization and Median Polish, both have the undesirable effect of mixing biological signals between different sample groups, which can be detrimental to drawing valid biological conclusions and to any subsequent analyses. Based on the evidence presented here and in the literature, we recommend exercising caution when using RMA to process microarray gene expression data, particularly where there are likely to be unknown subgroups of samples.
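To make the two RMA steps named above concrete, here is a minimal, dependency-free Python sketch of quantile normalization and Tukey's median polish, plus a toy input showing how QN can move one group's signal into another. It is an illustration under stated assumptions, not the authors' code and not the Bioconductor `affy`/`preprocessCore` implementation: the function names and toy values are hypothetical, ties in QN are broken by position (a simplification), and QN is applied here to raw values, whereas RMA applies it to background-corrected probe intensities and then runs median polish on log2 values within each probe set. The median polish sweep order is modelled loosely on R's `stats::medpolish`.

```python
from statistics import median


def quantile_normalize(columns):
    """Force every array (one list of intensities per array) onto a common distribution.

    The reference distribution is the mean of the k-th smallest value across arrays.
    Ties are broken by position (a simplification, assumed for this sketch).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of probes")
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        # Rank probes within this array, then replace each value by the reference at that rank.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


def median_polish(y, max_iter=10, eps=0.01):
    """Tukey's median polish on a probes x arrays matrix (rows = probes, columns = arrays).

    Fits y[i][j] ~ overall + row[i] + col[j] + resid[i][j]; an RMA-style
    summary for array j is overall + col[j].
    """
    nr, nc = len(y), len(y[0])
    z = [[float(v) for v in r] for r in y]
    overall, row, col = 0.0, [0.0] * nr, [0.0] * nc
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row (probe) medians out of the residuals.
        for i in range(nr):
            m = median(z[i])
            z[i] = [v - m for v in z[i]]
            row[i] += m
        m = median(col)
        col = [c - m for c in col]
        overall += m
        # Sweep column (array) medians out of the residuals.
        for j in range(nc):
            m = median([z[i][j] for i in range(nr)])
            for i in range(nr):
                z[i][j] -= m
            col[j] += m
        m = median(row)
        row = [r - m for r in row]
        overall += m
        # Stop when the total absolute residual no longer changes appreciably.
        new_sum = sum(abs(v) for r in z for v in r)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row, col, z


if __name__ == "__main__":
    # Hypothetical toy data: genes 3 and 4 are strongly up in group B only.
    group_a = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
    group_b = [[1.0, 2.0, 30.0, 40.0], [1.0, 2.0, 30.0, 40.0]]
    qn = quantile_normalize(group_a + group_b)
    print(qn[0])  # a group A array after QN
    print(qn[2])  # a group B array after QN

    # Purely additive probe x array data: median polish recovers the array effects.
    overall, _, col, _ = median_polish([[5, 8], [6, 9], [7, 10]])
    print([overall + c for c in col])
```

In this deliberately extreme toy input, both printed arrays become `[1.0, 2.0, 16.5, 22.0]`: the group B up-regulation of genes 3 and 4 disappears as a between-group difference, and the group A arrays now carry values pulled up by group B. The median polish call prints `[6.0, 9.0]`, i.e. the array effects are recovered exactly when the data have no probe-by-array interaction.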
Original language: English
Title of host publication: 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 139-143
Number of pages: 5
DOI: 10.1109/BIBM.2014.6999142
Publication status: Published - Nov 2014
Event: The IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2014, Belfast, United Kingdom
Duration: 02 Nov 2014 to 05 Nov 2014



Cite this

Kim, C. S., Hwang, S., & Zhang, S-D. (2014). RMA with quantile normalization mixes biological signals between different sample groups in microarray data analysis. In 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 139-143). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/BIBM.2014.6999142
