Measuring diagnostic heterogeneity using text-mining of the lived experiences of patients

Chandril Ghosh*, Duncan McVicar, Gavin Davidson, Ciaran Shannon

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review



Background: The diagnostic system is fundamental to any health discipline, including mental health, because it defines mental illness and helps inform possible treatment and prognosis. The procedure used to estimate the reliability of such a system is therefore of utmost importance. Current ways of measuring the reliability of the diagnostic system have limitations. In this study, we propose an alternative approach for verifying and measuring the reliability of the existing system.

Methods: We perform a Jaccard’s similarity index analysis between first-person accounts of patients with the same disorder (in this case Major Depressive Disorder) and between those who received a diagnosis of a different disorder (in this case Bulimia Nervosa) to demonstrate that narratives, when suitably processed, are a rich source of data for this purpose. We then analyse 228 narratives of lived experiences from patients with mental disorders, using a Python script, to demonstrate that patients with the same diagnosis have very different illness experiences.

Results: The results demonstrate that narratives are a statistically viable data resource which can distinguish between patients who receive different diagnostic labels. However, the similarity coefficients between 99.98% of narrative pairs, including those with similar diagnoses, are low (< 0.3), indicating diagnostic heterogeneity.

Conclusions: The current study proposes an alternative approach to measuring the diagnostic heterogeneity of categorical taxonomic systems (e.g. the Diagnostic and Statistical Manual, DSM). In doing so, we demonstrate the high heterogeneity and limited reliability of the existing system using patients’ written narratives of their illness experiences as the only data source. Potential applications of these outputs are discussed in the context of healthcare management and mental health research.
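
The Jaccard comparison described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' script: the tokenizer, the tiny stopword list, and the function names (`tokenize`, `jaccard`, `pairwise_jaccard`, `share_below`) are assumptions made for illustration, and the abstract does not specify how the narratives were preprocessed. Only the 0.3 threshold is taken from the Results.

```python
import re
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (an assumption for illustration;
# the abstract does not say which preprocessing the authors applied).
STOPWORDS = {"a", "an", "and", "the", "i", "to", "of", "in", "my", "was", "it", "that"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword word tokens."""
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union.

    Returns 0.0 when both sets are empty, to avoid dividing by zero.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(narratives):
    """Map every unordered pair of narrative ids to its Jaccard index.

    `narratives` is a dict of {narrative_id: narrative_text}.
    """
    token_sets = {name: tokenize(text) for name, text in narratives.items()}
    return {
        (x, y): jaccard(token_sets[x], token_sets[y])
        for x, y in combinations(token_sets, 2)
    }


def share_below(scores, threshold=0.3):
    """Fraction of pairs whose similarity is strictly below `threshold`."""
    values = list(scores.values())
    if not values:
        return 0.0
    return sum(v < threshold for v in values) / len(values)
```

Given a dictionary of narrative texts keyed by an identifier, `share_below(pairwise_jaccard(narratives))` would give the proportion of narrative pairs under the 0.3 cut-off, which is the kind of quantity the Results report. A set-based Jaccard index ignores word frequency and word order, so the outcome depends heavily on the preprocessing choices assumed here.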
Original language: English
Pages (from-to): 1-12
Number of pages: 12
Journal: BMC Psychiatry
Issue number: 60
Publication status: Published - 28 Jan 2021


Keywords: Diagnosis, Taxonomy, Heterogeneity, Lived experiences, Reliability

