Depression detection in read and spontaneous speech: a multimodal approach for lesser-resourced languages

Klara Daly, Oluwafemi Olukoya*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The global prevalence of depression highlights the need for innovative early detection methods. Technological advances have driven novel speech-analysis strategies for depression identification, yet existing methods often overlook the differences between read and spontaneous speech and the variability of real-world data. This research introduces a multimodal approach, employing a hybrid model that combines Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) layers for detailed audio analysis and a pre-trained BERT model for textual insights. Using a comprehensive corpus of 228 recordings, including 64 cases of depression clinically diagnosed by psychiatrists, this study refines a speech-based detection technique reflective of real-world scenarios. Through decision-level fusion, the methodology achieves an accuracy of 94.30% and an F-score of 94.51%, outperforming existing benchmarks for depression detection in both read and spontaneous speech. An in-depth feature correlation analysis reveals that spontaneous speech from depressed individuals exhibits pronounced spectral patterns, particularly in the interrelationships among Mel-frequency cepstral coefficients (MFCCs), whereas read speech retains a subtler but still significant diagnostic value. These findings demonstrate both the effectiveness of the multimodal approach and its improved diagnostic precision in identifying depression. By emphasising the critical role of multimodal data analysis, this research offers valuable insights for medical practitioners, scholars, and technologists, and represents a considerable step towards more accurate and accessible mental health diagnostics. Our code is publicly available at https://github.com/56kd/MulitmodalDepressionDetection.
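
To make the described pipeline concrete, the sketch below outlines one plausible realisation of the hybrid architecture: a CNN-LSTM branch over MFCC frames for audio, a pre-trained BERT branch for transcripts, and decision-level fusion of the two branch probabilities. The layer sizes, BERT checkpoint, and equal fusion weight are illustrative assumptions, not the configuration published in the paper; see the linked repository for the authors' implementation.

```python
# Minimal sketch of the described multimodal pipeline (illustrative, not the
# authors' published configuration): CNN-LSTM over MFCCs + BERT over text,
# combined by decision-level fusion of the two branch probabilities.
import torch
import torch.nn as nn
from transformers import BertModel

class AudioBranch(nn.Module):
    """CNN over MFCC frames followed by an LSTM, ending in P(depressed)."""
    def __init__(self, n_mfcc=40, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mfcc, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, mfcc):                      # mfcc: (batch, n_mfcc, frames)
        x = self.conv(mfcc)                       # (batch, 64, frames // 2)
        x, _ = self.lstm(x.transpose(1, 2))       # (batch, frames // 2, hidden)
        return torch.sigmoid(self.head(x[:, -1])) # probability from last step

class TextBranch(nn.Module):
    """Pre-trained BERT encoder with a binary classification head."""
    def __init__(self, checkpoint="bert-base-multilingual-cased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(checkpoint)
        self.head = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return torch.sigmoid(self.head(out.pooler_output))

def fuse_decisions(p_audio, p_text, w_audio=0.5):
    """Decision-level fusion: weighted average of branch probabilities."""
    return w_audio * p_audio + (1.0 - w_audio) * p_text

# Example with dummy inputs (the text probability stands in for a tokenised
# transcript passed through TextBranch):
audio = AudioBranch()
p_a = audio(torch.randn(2, 40, 200))   # 2 utterances, 40 MFCCs, 200 frames
p_t = torch.rand(2, 1)
print(fuse_decisions(p_a, p_t))
```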
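
The feature correlation analysis mentioned above can be sketched in a similar spirit: extract MFCCs from a recording, compute the coefficient-by-coefficient Pearson correlation matrix, and compare its off-diagonal strength between read and spontaneous speech in the depressed and control groups. The parameter values and helper names below are placeholders, not the study's actual settings.

```python
# Sketch of an MFCC interrelationship analysis (illustrative parameters).
import librosa
import numpy as np

def mfcc_correlations(wav_path, n_mfcc=13, sr=16000):
    """Return the (n_mfcc x n_mfcc) correlation matrix across frames."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return np.corrcoef(mfcc)  # correlations between coefficient trajectories

def off_diagonal_strength(corr):
    """Mean absolute off-diagonal correlation, one scalar per utterance."""
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return np.abs(corr[mask]).mean()
```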

Original language: English
Article number: 107959
Number of pages: 18
Journal: Biomedical Signal Processing and Control
Volume: 108
Early online date: 27 Apr 2025
DOIs
Publication status: Early online date - 27 Apr 2025

Keywords

  • depression detection
  • spontaneous speech
  • lesser-resourced languages
