
Audio recordings/vocal indices of stress

Development of a Standard Protocol for the Assessment of Stress-Related Vocal Responding

PIs: Brian Baucom & Paula G. Williams, University of Utah

Co-I: Panayiotis Georgiou, University of Southern California


The purpose of this study is to examine the association between acoustic characteristics of speech and stress measures in the service of developing a standard voice analysis protocol for stress assessment. Physical properties of the human voice, particularly fundamental frequency (f0), have been linked to a variety of physiological, behavioral, and affective outcomes, suggesting that f0 may be a promising new method of stress measurement. F0 refers to the lowest frequency harmonic of the speech sound wave and is perceived as the pitch of someone’s voice. Prior research has found associations between f0 and affective expression (Baucom et al., 2015a; Lee et al., 2014), divorce (Kliem et al., 2015), response to therapy (e.g., Baucom et al., 2009, 2012), and negative communication patterns (e.g., Baucom et al., 2015b). F0 also follows a circadian pattern and is sensitive to sleep disturbance (Bouhuys et al., 1990). Importantly, because the same neural substrates control both characteristics of vocal expression and cardiovascular responses to stress, it is likely that physiological stress reactivity is encoded in the physical properties of speech (Benarroch, 2012). Indeed, recent research has linked aspects of f0 to cortisol responses to family conflict (Baucom et al., 2012a) and to heart rate, blood pressure, and cortisol responses to conflict discussion in couples (Weusthoff et al., 2013). Finally, a growing body of literature supports the potential of f0 as a measure of stress across the lifespan, including studies of pre-verbal infants (e.g., Porter et al., 1986), adolescents (e.g., Baucom et al., 2012a), and young, middle-aged, and older adults (e.g., Baucom et al., 2015a). In summary, vocal acoustic assessment is a promising method of psychosocial stress measurement that could be implemented on a large scale using existing technology.
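To make the definition of f0 above concrete, the following is a minimal sketch of one common way to estimate it: finding the lag at which a short voiced frame best matches a shifted copy of itself (autocorrelation). This is an illustration only and is not the study's analysis pipeline; the function name `estimate_f0`, the 75 to 400 Hz search range, and the synthetic 200 Hz tone are all assumptions chosen for the example.

```python
import math

def estimate_f0(samples, sample_rate, f_min=75.0, f_max=400.0):
    """Estimate f0 (Hz) of one voiced frame from its autocorrelation peak.

    The search range (f_min, f_max) is an assumed, illustrative default.
    """
    lag_min = int(sample_rate / f_max)  # shortest period considered
    lag_max = int(sample_rate / f_min)  # longest period considered
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Similarity between the frame and itself shifted by `lag` samples.
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # The best-matching lag is one period of the wave, in samples.
    return sample_rate / best_lag

# Synthetic stand-in for a voiced frame: a pure 200 Hz tone (not study data).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(400)]
f0 = estimate_f0(tone, sample_rate)
```

Real speech would additionally need voicing detection and framing, which dedicated acoustic toolkits handle; the sketch only shows why f0 corresponds to the repetition period of the waveform.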


Recent models of stress regulation suggest that psychosocial stress may be most comprehensively conceptualized as a set of component processes: stress exposure, stress reactivity, stress recovery, and restoration (Hawkley & Cacioppo, 2003; Uchino, Smith, Holt-Lunstad, Campo, & Reblin, 2007). An individual differences extension of this framework posits that these stress processes are moderated by key phenotypic and endophenotypic individual difference factors (Williams, Smith, Gunn, & Uchino, 2011; Williams, Suchy, & Rau, 2009). This organizing framework can be utilized in the comprehensive examination of f0 associations with stress, including other endophenotype-level individual differences (e.g., resting high-frequency heart rate variability [HF-HRV] or respiratory sinus arrhythmia [RSA]). The current study has three primary goals: 1) to examine methods of assessing f0 in relation to stress component processes and other endophenotype moderators; 2) to examine the possibility that combining f0 with additional vocal acoustic characteristics commonly used in affective computing applications, such as intensity, speech rate, HF-500, Mel Frequency Cepstral Coefficients, and Mel Filter Banks, will allow for a more robust means of assessing stress conveyed in the voice; and 3) to use the results of Aims 1 and 2 to inform a pilot study to validate specific assessment methods.

Specific Aims

Aims 1 and 2: To examine the association between summary indices that characterize f0 and standard measures of stress component processes. Multiple high-quality data sets will be examined to determine which indices of f0 (e.g., mean baseline, range, change in response to stress/distress) are reliably associated with standard measures of stress exposure (e.g., daily hassles, recent life events, childhood trauma), stress reactivity (psychophysiological; affective), stress recovery (post-stressor physiological; end-of-day arousal), and restoration (sleep duration and quality).
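As a rough illustration of the kinds of summary indices named above (mean baseline, range, and change in response to stress), the sketch below computes them from two f0 contours. The function name `f0_summary`, the dictionary keys, and the numeric values are hypothetical and made up for the example; they are not study data, and the study's actual index definitions may differ.

```python
from statistics import mean

def f0_summary(baseline_hz, stress_hz):
    """Summarize two f0 contours given as Hz values per voiced frame.

    Assumes unvoiced frames have already been removed from both lists.
    """
    base_mean = mean(baseline_hz)
    stress_mean = mean(stress_hz)
    return {
        "baseline_mean": base_mean,
        "baseline_range": max(baseline_hz) - min(baseline_hz),
        "stress_range": max(stress_hz) - min(stress_hz),
        # Change in mean f0 from baseline to the stress task.
        "mean_change": stress_mean - base_mean,
    }

# Made-up illustrative contours (Hz), not study data.
baseline = [110.0, 115.0, 120.0, 115.0]
stress = [120.0, 140.0, 160.0, 140.0]
indices = f0_summary(baseline, stress)
```

In this toy case both the mean and the range of f0 rise during the stress contour, which is the pattern such indices are meant to capture.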

Aim 3: To validate specific assessment methods for eliciting stress-related vocal acoustics.


Consistent with previous research (e.g., Busso et al., 2009), we hypothesize that f0 range will demonstrate the largest magnitude and most consistent associations with stress component processes relative to other indices of f0. In addition, we hypothesize that a composite acoustic index composed of f0 range and the additional acoustic variables mentioned above will demonstrate a significantly larger association with stress component processes than indices of f0 alone.


Aims 1 & 2: Across data sets, audio recordings are available during a variety of standard stress paradigms, including Trier Social Stressor, Social Competence Interview, Couples’ Conflict Discussion, and Life History Calendar. All studies have resting psychophysiology and most have reactivity and recovery (heart rate, blood pressure, heart rate variability, impedance cardiography, cortisol, as well as affect ratings). Several studies have behavioral coding. Most have additional stress measures (past events, global perceived stress), as well as cognitive testing, sleep assessments, well-being, and mental health. The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing will be used for analysis of audio samples.

Aim 3: Pilot testing will focus on the utility of a phone-based audio assessment using the Stroop paradigm. Fifty community participants will complete 14 days of experience sampling data collection followed by a laboratory assessment session. Experience sampling will include daily ratings of stressors, actigraphic sleep assessment, and sleep diaries. The laboratory assessment will include resting impedance cardiography and EEG assessment, as well as neuropsychological (executive functioning) assessment.


Associations between vocal acoustic characteristics during a well-validated stressor recall task (Social Competence Interview; SCI) and corresponding physiological and affective stress responses have been examined with promising results. Average R² across stress reactivity measures was .32 (SD = .10), with robust associations for negative affect reactivity (Model R² = .45 and associations with 14 vocal indices) and systolic blood pressure reactivity (R² = .33 and associations with 9 vocal indices). These results were replicated and extended in a second data set examining stress reactivity during couple conflict. For example, the set of 87 acoustic variables was able to correctly classify the top and bottom deciles of RSA and HR reactivity for 70% and 83% of samples, respectively. Finally, in both data sets and consistent with hypotheses, vocal features other than fundamental frequency were significant predictors of stress component processes, with loudness being the most robust of the additional predictors.
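The decile classification result above can be read as an extreme-groups comparison: keep only the samples with the lowest and highest 10% of reactivity scores, then check how often a classifier assigns them to the correct group. The sketch below shows that bookkeeping only; the helper names `extreme_deciles` and `accuracy`, the scores, and the predicted labels are hypothetical, and the classifier actually used in the study is not specified here.

```python
def extreme_deciles(scores):
    """Return (bottom, top): indices of the lowest and highest 10% of scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(1, len(scores) // 10)  # decile size, at least one sample
    return order[:k], order[-k:]

def accuracy(predicted, actual):
    """Proportion of samples whose predicted label matches the actual label."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

# Made-up reactivity scores for 20 samples (not study data).
reactivity = [5, 1, 9, 3, 7, 0, 8, 2, 6, 4,
              15, 11, 19, 13, 17, 10, 18, 12, 16, 14]
bottom, top = extreme_deciles(reactivity)

# Actual group labels for the retained samples: 0 = bottom decile, 1 = top.
actual = [0] * len(bottom) + [1] * len(top)
# Hypothetical output of some classifier trained on acoustic features.
predicted = [0, 1, 1, 1]
rate = accuracy(predicted, actual)
```

Because only the extremes are retained, accuracy from this design describes how well the groups can be separated and should not be read as accuracy across the full range of reactivity.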

Conclusion and Future Directions

Initial findings support the hypothesis that affective and physiological stress reactivity are encoded in the physical properties of speech, suggesting that vocal acoustic characteristics are a promising method of objective stress measurement. Future research will seek to replicate these findings across other existing data sets with a variety of stress induction paradigms. In addition, key individual difference moderators, including personality, resting high-frequency heart rate variability, and executive functioning, will be examined. In the interest of facilitating future large-scale, mobile stress measurement, pilot testing will focus on a brief, phone-based Stroop assessment. Pilot testing will extend findings from laboratory protocols (Aims 1 & 2) to examine circadian patterns and associations with corresponding daily stress experience and sleep patterns, reports of past life events and chronic stress, as well as laboratory-assessed resting physiology and cognitive functioning.

How can other researchers use these findings to inform their own work exploring the concept of ‘stress’?

Initial findings suggest that adding audio recordings of vocal responses to stress assessment may be fruitful. Examination of vocal indices may be a lower-cost, efficient method of capturing affective and physiological reactivity. Future research will determine the extent to which these physical properties of speech show significant associations with stress recovery and restoration (e.g., sleep), as well as with important individual differences known to moderate self-regulation more broadly.

How will this experience alter the way in which you approach studying and measuring ‘stress’?

The research team is particularly excited about the promise of phone-based assessment that may capture stress regulation outside of standard laboratory assessments. Once validated, this type of assessment can be implemented in large-scale, population-based studies of physical and mental health.

July 2017
