Prostate cancer evolution employing artificial intelligence to enhance biomarker discovery

  • Ross Murphy

Student thesis: Doctoral Thesis (Doctor of Philosophy)


Prostate cancer is highly heterogeneous and can be multifocal, with phenotypically and genetically distinct tumour foci. At the cellular level, this heterogeneity means that a tumour can harbour phenotypically diverse cancer cell types. To improve our understanding of disease progression, prostate cancer therefore needs to be explored at several levels: first at the cellular level, then through the genomics of individual patients, and finally across larger patient cohorts. Technological advances in genomics are changing how we understand prostate tumours and their heterogeneous nature, and in turn have the potential to improve the delivery of personalised medicine. This thesis asks whether bioinformatics tools can assess intra-tumour heterogeneity via alignment-free methods and identify novel prostate cancer biomarkers associated with advanced disease across these levels.

Exploring isolated prostate cellular sub-populations allows novel biomarkers and targetable pathways related to cancer stem-cell (CSC) biology to be identified. At the patient level, intra-patient heterogeneity can be assessed by whole exome sequencing of multi-regional samples. At the cohort level, particle swarm optimization (PSO), an AI optimization algorithm, can explore transcriptomic datasets and visualize, in real time, the most informative and succinct gene signatures.
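To illustrate how PSO can search for a compact gene signature, the following is a minimal sketch of a standard binary PSO, not the enhanced variant developed in this thesis. Each particle is a 0/1 mask over genes, and a sigmoid of the velocity gives the probability that a gene is selected. The function names, parameter defaults and the toy fitness function (which rewards a made-up set of "informative" gene indices and penalizes signature size) are assumptions for illustration only; a real fitness function would score a classifier trained on the selected genes.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Standard binary PSO: maximize fitness(mask) over 0/1 masks."""
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_features for _ in range(n_particles)]

    # Personal bests start at the initial positions.
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(p) for p in positions]
    best_i = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best_i][:], pbest_fit[best_i]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * velocities[i][j]
                     + c1 * r1 * (pbest[i][j] - positions[i][j])
                     + c2 * r2 * (gbest[j] - positions[i][j]))
                # Clamp the velocity so the sigmoid never saturates.
                v = max(-v_max, min(v_max, v))
                velocities[i][j] = v
                # The velocity sets the probability that bit j is 1.
                positions[i][j] = 1 if rng.random() < sigmoid(v) else 0
            fit = fitness(positions[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = positions[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = positions[i][:], fit
    return gbest, gbest_fit


# Hypothetical toy problem: genes 0, 3 and 7 are "informative".
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(1 for j, bit in enumerate(mask) if bit and j in INFORMATIVE)
    return hits - 0.1 * sum(mask)  # size penalty favours short signatures


best_mask, best_score = binary_pso(toy_fitness, n_features=10)
```

The size penalty in the toy fitness reflects the stated goal of finding signatures that are both informative and succinct; how the thesis balances accuracy against signature length is not specified here.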

Differentially expressed genes (DEGs) from the isolated prostate side-populations (SPs) showed these SPs to be genetically distinct from terminally differentiated cells. DEGs were also used as input gene signatures to identify FDA-approved drug targets for potential repurposing in hormone-manipulated distal SP (DSP) samples, compared with DSP samples not exposed to androgen deprivation therapy. Using novel alignment-free phylogenetic analysis, we assessed intra-patient heterogeneity across multi-regional samples from different patients and identified cases where divergent branching in the phylogenetic trees confirmed distinct lesions within a patient’s prostate, with possible implications for drug targeting. Compared with standard PSO algorithms, our enhanced binary PSO (EBPSO) Flask mini-framework and Python module achieves similar or better accuracy for gene signature discovery on clinical cancer transcriptomics cohorts, using fewer genes and a much shorter runtime. The key genes within these signatures have functions that correlate well with their cancer type. Fibroblast activation protein (FAP), a key gene within the candidate signature produced from the FASTMAN prostate cancer dataset, has been associated with metastatic prostate cancer and is expressed in prostate cancer stroma.
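The alignment-free method used in the thesis is not described in this abstract. As a general illustration of the idea, the sketch below compares sequences by their k-mer frequency profiles, a common alignment-free approach, and builds the pairwise distances that a tree-building method could then use. The function names, the choice of k, the Euclidean distance and the three short sequences are all assumptions made up for the example; they are not data or code from the thesis.

```python
import math
from collections import Counter
from itertools import combinations


def kmer_profile(seq, k=3):
    """Relative frequency of every overlapping k-mer in seq."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def kmer_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def distance_matrix(samples, k=3):
    """Pairwise distances keyed by (name_a, name_b), names sorted."""
    profiles = {name: kmer_profile(s, k) for name, s in samples.items()}
    return {(a, b): kmer_distance(profiles[a], profiles[b])
            for a, b in combinations(sorted(profiles), 2)}


# Made-up toy sequences standing in for multi-regional samples.
samples = {
    "region_A": "ACGTACGTACGT",
    "region_B": "ACGTACGTACGA",  # differs from region_A by one base
    "region_C": "TTTTGGGGCCCC",  # shares no 3-mers with region_A
}
distances = distance_matrix(samples)
```

In this toy case, region_A and region_B sit closer to each other than either does to region_C, which is the kind of divergence a phylogenetic tree built from such distances would display as separate branches.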

We propose that recurring lesions originate from DSP cells following hormone manipulation and will therefore need different therapeutic targets from the primary tumour. Multi-regional sampling combined with novel alignment-free bioinformatics software for assessing intra-patient heterogeneity is a valuable way to help deliver personalised medicine, through more robust discovery of targetable pathways and therapeutic strategies. Our EBPSO Flask mini-framework can support medical data research by enabling novel discoveries at an efficient pace with lower computational needs, while showing real-time visualizations and analysis statistics.

Date of Award: Jul 2022
Original language: English
Awarding Institution
  • Queen's University Belfast
Sponsors: Friends of the Cancer Centre
Supervisors: Suneil Jain (Supervisor), Darragh McArt (Supervisor) & Melissa LaBonte Wilson (Supervisor)


  • Bioinformatics
  • prostate cancer
  • gene expression signature
  • big data
  • multi-omics
  • prognostic factors
  • prognostic signature
  • machine learning
  • particle swarm optimization
  • alignment-free
  • genomics
  • tumour heterogeneity
  • biomarker
