An Evaluation of Speech-Based Recognition of Emotional and Physiological Markers of Stress

Baird, Alice and Triantafyllopoulos, Andreas and Zänkert, Sandra and Ottl, Sandra and Christ, Lukas and Stappen, Lukas and Konzok, Julian and Sturmbauer, Sarah and Meßner, Eva-Maria and Kudielka, Brigitte M. and Rohleder, Nicolas and Baumeister, Harald and Schuller, Björn W. (2021) An Evaluation of Speech-Based Recognition of Emotional and Physiological Markers of Stress. Frontiers in Computer Science, 3. ISSN 2624-9898

Abstract

Life in modern societies is fast-paced and full of stress-inducing demands. The development of stress monitoring methods is a growing area of research due to the personal and economic advantages that timely detection provides. Studies have shown that speech-based features can be utilised to robustly predict several markers of stress, including emotional state, continuous heart rate, and the stress hormone cortisol. In this contribution, we extend previous works by the authors, utilising three German-language corpora comprising more than 100 subjects undergoing a Trier Social Stress Test protocol. We present cross-corpus and transfer learning results which explore the efficacy of the speech signal for predicting three physiological markers of stress: sequentially measured saliva-based cortisol, continuous heart rate as beats per minute (BPM), and continuous respiration. For this, we extract several features from audio as well as video and apply various machine learning architectures, including a temporal context-based Long Short-Term Memory Recurrent Neural Network (LSTM-RNN). For the task of predicting cortisol levels from speech, deep learning improves on results obtained by conventional support vector regression, yielding a Spearman correlation coefficient (ρ) of 0.770 and 0.698 for cortisol measurements taken 10 and 20 min after the stress period on the two corpora where this applies. This shows that audio features alone are sufficient for predicting cortisol, with audiovisual fusion improving these results to some extent. We also obtain a Root Mean Square Error (RMSE) of 38 and 22 BPM for continuous heart rate prediction on the two corpora where this information is available, and a normalised RMSE (NRMSE) of 0.120 for respiration prediction (range −10 to 10). Both of these continuous physiological signals prove to be highly effective markers of stress (based on cortisol grouping analysis), both when available as ground truth and when predicted using speech. This contribution opens up new avenues for future exploration of these signals as proxies for stress in naturalistic settings.

Item Type: Article
Subjects: Institute Archives > Computer Science
Depositing User: Managing Editor
Date Deposited: 24 Feb 2023 03:38
Last Modified: 02 Jul 2024 12:27
URI: http://eprint.subtopublish.com/id/eprint/985
