Abstract
The global prevalence of depression highlights the need for innovative early detection methods. Technological advances have driven novel speech analysis strategies for depression identification, yet existing methods often overlook the differences between read and spontaneous speech and the variability of real-world data. This research introduces a multimodal approach, employing a hybrid model that combines Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN) for detailed audio analysis and a pre-trained BERT model for textual insights. Using a comprehensive corpus of 228 recordings, including 64 cases of depression clinically diagnosed by psychiatrists, this study refines a speech-based detection technique reflective of real-world scenarios. Through decision-level fusion, this methodology achieves an accuracy of 94.30% and an F-score of 94.51%, outperforming existing benchmarks in detecting depression in both read and spontaneous speech. An in-depth feature correlation analysis reveals that spontaneous speech in depressed individuals exhibits pronounced spectral patterns, particularly in the interrelationships among Mel-frequency cepstral coefficients (MFCC), whereas read speech retains a subtle but significant diagnostic value. These findings not only demonstrate the effectiveness of the multimodal approach but also highlight its superior diagnostic precision in identifying depression, setting a new standard for depression diagnosis. By emphasising the critical role of multimodal data analysis, this research significantly advances mental health diagnostics, offering valuable insights for medical practitioners, scholars, and technologists, and represents a considerable step towards more accurate and accessible mental health diagnostics. Our code is publicly available at https://github.com/56kd/MulitmodalDepressionDetection.
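The abstract describes decision-level (late) fusion of the audio (LSTM-CNN) and text (BERT) branches, but does not specify how the two branches' outputs are combined. A minimal sketch, assuming weighted averaging of per-class probabilities (the weight `w_audio` and the function name are hypothetical, not taken from the paper):

```python
import numpy as np

def decision_level_fusion(p_audio, p_text, w_audio=0.5):
    """Fuse per-class probabilities from two branches by weighted averaging.

    p_audio, p_text: arrays of shape (n_samples, n_classes) holding class
    probabilities from the audio (LSTM-CNN) and text (BERT) branches.
    w_audio: weight for the audio branch (hypothetical; the paper's actual
    fusion rule and weights are not stated in the abstract).
    """
    p_audio = np.asarray(p_audio, dtype=float)
    p_text = np.asarray(p_text, dtype=float)
    fused = w_audio * p_audio + (1.0 - w_audio) * p_text
    return fused.argmax(axis=1)  # predicted class index per sample

# Toy example: two samples, classes = [non-depressed, depressed]
audio_probs = [[0.30, 0.70], [0.80, 0.20]]
text_probs = [[0.40, 0.60], [0.55, 0.45]]
labels = decision_level_fusion(audio_probs, text_probs, w_audio=0.6)
```

Late fusion of this kind keeps the two branches independent, so each can be trained and tuned separately before their decisions are combined.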
| Original language | English |
| --- | --- |
| Article number | 107959 |
| Number of pages | 18 |
| Journal | Biomedical Signal Processing and Control |
| Volume | 108 |
| Early online date | 27 Apr 2025 |
| DOIs | |
| Publication status | Early online date - 27 Apr 2025 |
Keywords
- depression detection
- spontaneous speech
- lesser-resourced languages