Baird, Alice and Triantafyllopoulos, Andreas and Zänkert, Sandra and Ottl, Sandra and Christ, Lukas and Stappen, Lukas and Konzok, Julian and Sturmbauer, Sarah and Meßner, Eva-Maria and Kudielka, Brigitte M. and Rohleder, Nicolas and Baumeister, Harald and Schuller, Björn W. (2021) An Evaluation of Speech-Based Recognition of Emotional and Physiological Markers of Stress. Frontiers in Computer Science, 3. ISSN 2624-9898
Abstract
Life in modern societies is fast-paced and full of stress-inducing demands. The development of stress monitoring methods is a growing area of research due to the personal and economic advantages that timely detection provides. Studies have shown that speech-based features can be utilised to robustly predict several markers of stress, including emotional state, continuous heart rate, and the stress hormone cortisol. In this contribution, we extend previous works by the authors, utilising three German-language corpora that together include more than 100 subjects undergoing a Trier Social Stress Test protocol. We present cross-corpus and transfer learning results which explore the efficacy of the speech signal for predicting three physiological markers of stress: sequentially measured saliva-based cortisol, continuous heart rate as beats per minute (BPM), and continuous respiration. For this, we extract several features from audio as well as video and apply various machine learning architectures, including a temporal context-based Long Short-Term Memory Recurrent Neural Network (LSTM-RNN). For the task of predicting cortisol levels from speech, deep learning improves on results obtained by conventional support vector regression, yielding a Spearman correlation coefficient (ρ) of 0.770 and 0.698 for cortisol measurements taken 10 and 20 min after the stress period for the two corpora applicable. This shows that audio features alone are sufficient for predicting cortisol, and that audiovisual fusion improves these results to some extent. We also obtain a Root Mean Square Error (RMSE) of 38 and 22 BPM for continuous heart rate prediction on the two corpora where this information is available, and a normalised RMSE (NRMSE) of 0.120 for respiration prediction (range: −10 to 10). Both of these continuous physiological signals prove to be highly effective markers of stress (based on cortisol grouping analysis), both when available as ground truth and when predicted using speech. This contribution opens up new avenues for future exploration of these signals as proxies for stress in naturalistic settings.
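The abstract reports three evaluation measures: Spearman's ρ, RMSE, and NRMSE. For readers unfamiliar with them, the following is a minimal sketch of how such measures are commonly computed, using only the Python standard library. It is not the authors' evaluation code. The function names are hypothetical, the toy numbers at the end are invented for illustration, and normalising the RMSE by the signal range (here −10 to 10) is an assumption about how the NRMSE is defined.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(y_true, y_pred):
    """Root mean square error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse(y_true, y_pred, low, high):
    """RMSE divided by the signal range (assumed normalisation)."""
    return rmse(y_true, y_pred) / (high - low)


if __name__ == "__main__":
    # Invented toy values, not data from the study.
    cortisol_true = [4.1, 7.9, 5.5, 12.3, 9.0]
    cortisol_pred = [5.0, 7.0, 6.1, 10.8, 9.9]
    print("Spearman rho:", spearman_rho(cortisol_true, cortisol_pred))

    respiration_true = [-2.0, 0.5, 3.0, -1.0]
    respiration_pred = [-1.0, 1.5, 2.0, 0.0]
    print("NRMSE:", nrmse(respiration_true, respiration_pred, -10.0, 10.0))
```

Because Spearman's ρ depends only on rank order, it rewards predictions that order subjects correctly by cortisol level even when the absolute values are off, whereas RMSE and NRMSE penalise the size of each error directly.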
Item Type: | Article |
---|---|
Subjects: | Institute Archives > Computer Science |
Depositing User: | Managing Editor |
Date Deposited: | 24 Feb 2023 03:38 |
Last Modified: | 02 Jul 2024 12:27 |
URI: | http://eprint.subtopublish.com/id/eprint/985 |