Comparison of machine learning methods for the classification of cardiovascular disease

Rachael Hagan*, Charles J. Gillan, Fiona Mallet

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review



Researchers are devoting significant effort to using machine learning algorithms, a subset of the wider field of artificial intelligence, to detect disease in individual patients. There is extensive research on the application of machine learning methods in health care and, more specifically, to cardiovascular disease. We have chosen to focus this initial investigation on cardiac disease in order to examine the methods in as much detail as possible.

In this paper we explore the uncertainty that arises when applying machine learning methods, namely Support Vector Machines (SVM), Multi-Layer Perceptron (MLP) neural networks and ensemble methods, to the classification of cardiovascular disease. Our work uses two public datasets with significantly different characteristics in order to assess the potential differences in the uncertainty of the methods. The cardiac arrhythmia dataset from the University of California, Irvine (UCI) Machine Learning Repository has almost three hundred physiological data points per patient, gathered from analysis of electrocardiogram (ECG) signals on several hundred patients, although the distribution of cases is severely imbalanced. By contrast, the cardiovascular disease dataset from the Kaggle collection has nearly seventy thousand patient records. However, this Kaggle dataset reports only a small number of parameters per patient record: values such as serum cholesterol level, diastolic and systolic blood pressure, relative blood glucose levels and the presence or absence of angina.

Models built for the UCI dataset have an order of magnitude more dimensions or, in the case of neural network models, a much larger number of input nodes than the models developed for the Kaggle dataset. On the other hand, the Kaggle dataset has an order of magnitude more records for training and validation than the UCI dataset. Our results compare and contrast the uncertainty in models built using support vector machines, multilayer perceptron neural networks and decision trees for these two datasets. The work suggests that it will be instructive to extend our analysis to datasets covering other pathophysiologies.
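
The abstract does not state how uncertainty is quantified, so the following is only an illustrative sketch and not the authors' method. It assumes a percentile bootstrap over per-record outcomes (1 for a correct classification, 0 for an incorrect one) and uses simulated outcomes with hypothetical record counts and a hypothetical accuracy of 0.8, simply to show how the number of validation records can affect the width of an accuracy interval. The function names `bootstrap_accuracy_interval` and `simulated_outcomes` are invented for this example.

```python
import random
import statistics


def bootstrap_accuracy_interval(correct, n_resamples=200, alpha=0.05, seed=0):
    """Return (accuracy, lower, upper) for a list of 0/1 per-record outcomes.

    Percentile bootstrap: resample the outcomes with replacement,
    recompute the accuracy each time, and read off the quantiles.
    """
    rng = random.Random(seed)
    n = len(correct)
    accuracies = sorted(
        sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples)
    )
    lower = accuracies[int((alpha / 2) * n_resamples)]
    upper = accuracies[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.mean(correct), lower, upper


def simulated_outcomes(n_records, accuracy, seed):
    """Hypothetical stand-in for real predictions: 1 = correct, 0 = wrong."""
    rng = random.Random(seed)
    return [1 if rng.random() < accuracy else 0 for _ in range(n_records)]


# Assumed sizes, chosen only to mimic a small and a large dataset.
small = simulated_outcomes(n_records=400, accuracy=0.8, seed=1)
large = simulated_outcomes(n_records=40_000, accuracy=0.8, seed=2)

for name, outcomes in (("small", small), ("large", large)):
    acc, lo, hi = bootstrap_accuracy_interval(outcomes)
    print(f"{name}: accuracy={acc:.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

Under these assumptions the interval for the smaller set of records comes out noticeably wider than for the larger one, which is one reason a comparison across datasets of very different sizes may need to report spread as well as point accuracy.
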
Original language: English
Article number: 100606
Number of pages: 10
Journal: Informatics in Medicine Unlocked
Early online date: 20 May 2021
Publication status: Early online date - 20 May 2021

