Abstract
Topological Data Analysis (TDA) is a rapidly growing field that uses concepts from topology to uncover the structure of high-dimensional data. By examining the "shape" of data, TDA provides unique insights into complex datasets that traditional methods may overlook. Among its techniques, the Mapper algorithm provides a graph-based visualisation in which nodes represent clusters of similar samples and edges denote relationships between these clusters. Mapper is particularly useful for exploratory data analysis, with structural features such as node flares often indicating significant patterns or outliers. By colouring nodes based on specific functions, researchers can highlight distinct regions and subsets within the data.

Mapper has proven useful across a wide range of domains. Notably, in biomedicine, it was used to discover a subgroup of breast cancer patients with a 100% 5-year survival rate. However, users in any domain face two challenges in applying the method effectively. First, the algorithm is sensitive to its parameter settings: the output can vary significantly depending on the user-selected parameters. Second, Mapper graphs are difficult to interpret, and it is hard to explain what their structural features reveal about the underlying data. These challenges reduce Mapper’s accessibility, particularly for non-expert users.
This thesis addresses both challenges, parameter selection and the explanation of the output, to make the Mapper algorithm more accessible to users. First, an Ensemble Learning (EL) approach is proposed to address the difficulty and infeasibility of selecting a single clustering-algorithm hyperparameter that suits the variation in data structure across all subsets of the data. This approach also investigates the quality of the graphs produced by the different cluster ensembles in order to suggest the most appropriate clustering algorithm. Second, the thesis addresses the choice of lens function and covering parameters, using structural signals from the data to select stable graphs that are representative of the underlying data and combining them into a more robust ensembled graph. Third, the thesis proposes a novel framework that integrates eXplainable Artificial Intelligence into Mapper to help practitioners interpret the structural qualities of the Mapper graph. Each of the proposed techniques is evaluated against several baselines using datasets of varied characteristics. These contributions advance the utility of the Mapper algorithm, making it a more practical tool for exploring and analysing high-dimensional data.
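For readers unfamiliar with the basic Mapper construction summarised above (lens function, overlapping cover, per-interval clustering, edges between clusters that share samples), the following is a minimal sketch only. It is not the method developed in the thesis: the function names `cover_intervals`, `cluster` and `mapper`, the parameters `n_intervals`, `overlap` and `eps`, and the choice of a simple single-linkage clustering are all illustrative assumptions.

```python
import math
from itertools import combinations


def cover_intervals(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with equal-length intervals; neighbours overlap by the given fraction."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def cluster(indices, points, eps):
    """Illustrative single-linkage clustering: connected components of the
    graph that joins two points whenever they are at most eps apart."""
    unvisited = set(indices)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            p = frontier.pop()
            near = {q for q in unvisited if math.dist(points[p], points[q]) <= eps}
            unvisited -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper(points, lens, n_intervals=4, overlap=0.5, eps=0.3):
    """Return (nodes, edges): nodes are clusters of point indices found in each
    cover interval; an edge joins two nodes that share at least one point."""
    values = [lens(p) for p in points]
    tol = 1e-9  # guards against rounding at the interval end points
    nodes = []
    for a, b in cover_intervals(min(values), max(values), n_intervals, overlap):
        members = [i for i, v in enumerate(values) if a - tol <= v <= b + tol]
        nodes.extend(cluster(members, points, eps))
    edges = {(i, j) for i, j in combinations(range(len(nodes)), 2)
             if nodes[i] & nodes[j]}
    return nodes, edges


# Toy data: 40 points on the unit circle, with the x-coordinate as the lens.
points = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40))
          for k in range(40)]
nodes, edges = mapper(points, lens=lambda p: p[0])
print(len(nodes), "nodes,", len(edges), "edges")
```

On this toy circle the resulting graph is a single loop, which reflects the shape of the data. Changing `n_intervals`, `overlap` or `eps` can change the graph noticeably, which illustrates the parameter sensitivity the abstract describes.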
Thesis embargoed until 31st July 2026.
| Date of Award | Jul 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Sponsors | Northern Ireland Department for the Economy & Dioscuri Program |
| Supervisor | Jesus Martinez-del-Rincon (Supervisor), Anna Jurek-Loughrey (Supervisor) & Nick Orr (Supervisor) |
Keywords
- topological data analysis
- ensemble learning
- explainability
- visualisation
- mapper algorithm