Statistical data analysis of industrial systems

  • Caoimhe Carbery

Student thesis: Doctoral Thesis (Doctor of Philosophy)


Sensor technologies within large manufacturing systems make it possible to perform Big Data analytics on an abundance of production data. The major challenge is identifying suitable techniques for preprocessing the data so that it can be analysed further to detect dependencies or trends.

Firstly, a new data analytics framework is presented that ensures more effective engagement between statisticians and manufacturing staff. It identifies key preprocessing challenges and the need to transform the data for machine learning. A series of interactions is developed to enhance the shared understanding of the data between the two groups. Visualisation tools embedded within the framework highlight relevant statistical information and allow a more efficient process for performing data analytics within industry.

For validation, data provided by Seagate on their disk drive production and three further datasets are used to demonstrate the impact of the approach.

A new strategy for analysing incomplete data is introduced following the identification of missingness. Incomplete, or missing, data is a major issue affecting the quality of any subsequent analysis. This strategy, integrated within the framework, allows the user to diagnose the quality of the data, identify patterns and determine the profile of missingness through clustering, providing new insights and information on the potential cause of missingness. An imputation case study considering seven missing data handling methods is presented for each dataset, identifying the most statistically sound approach.
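
As a rough illustration of what diagnosing missingness can involve, the following Python sketch computes the fraction of missing values per variable and counts the distinct missingness patterns across records. It is not the method used in the thesis: the function name, the column names, the sample readings and the convention that `None` marks a missing value are all assumptions made for this example, and grouping identical patterns is only a simple stand-in for the clustering described above.

```python
from collections import Counter


def missingness_profile(rows, columns):
    """Summarise missing values in a list of dict records.

    Returns (fractions, patterns): the fraction of missing values per
    column, and a Counter of missingness patterns, where each pattern is
    a tuple of booleans (True = value missing) in column order.
    """
    n = len(rows)
    # Assumption: a value is missing when the key is absent or it is None.
    masks = [tuple(row.get(c) is None for c in columns) for row in rows]
    fractions = {
        c: (sum(mask[i] for mask in masks) / n) if n else 0.0
        for i, c in enumerate(columns)
    }
    # Records sharing the same mask share the same missingness pattern.
    patterns = Counter(masks)
    return fractions, patterns


# Hypothetical sensor readings; None marks a missing value.
records = [
    {"temp": 21.0, "pressure": 1.2, "flow": 3.4},
    {"temp": None, "pressure": 1.1, "flow": None},
    {"temp": 20.5, "pressure": None, "flow": 3.1},
    {"temp": None, "pressure": 1.3, "flow": None},
]
fractions, patterns = missingness_profile(records, ["temp", "pressure", "flow"])
```

In this toy data, `temp` and `flow` are always missing together, which is the kind of co-occurrence that might point to a shared cause such as a common sensor outage.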

The final contribution is a Bayesian network model which provides clearer insights into the connectivity of sensors and direct dependencies in manufacturing systems. To improve the predictive power of the model and reduce variability, an ensemble of the model is introduced for the first time, demonstrating an improved prediction accuracy above 95% for each dataset.
Date of Award: Dec 2019
Original language: English
Awarding Institution
  • Queen's University Belfast
Supervisors: Roger Woods & Adele Marshall
