
MSP-Improv corpus:

An emotional audiovisual database of spontaneous improvisations

MSP-Improv is an acted, audiovisual emotional database that explores emotional behaviors during spontaneous dyadic improvisations. The scenarios are carefully designed to elicit realistic emotions. Currently, the corpus comprises data from six dyad sessions (12 actors). The participants are UTD students from the School of Arts and Humanities who have taken classes in theatre and drama and have acting experience.

Dyadic Setting

High Quality Recordings

The MSP-Improv corpus was recorded as part of our study on audiovisual emotion perception using data-driven computational modeling (NSF IIS: 1217104). The project involves creating stimuli with conflicting emotional content conveyed through speech and facial expression (e.g., happy speech, angry facial expression). The recombination process central to the creation of the stimuli requires semantically controlled, acted audiovisual utterances. In this paradigm, the same lexical content must be expressed across clips to ensure that the recombination is as natural and artifact-free as possible. This restriction necessitates an acted setting for the collection of standardized lexical content over multiple emotions.

We used a novel recording paradigm to achieve emotional expressions that approach the naturalness found in unsolicited human speech. We designed 20 target sentences of various lengths. For each of these sentences, we created scenarios that trigger emotional reactions (happiness, sadness, anger, and neutral state). The scenarios are carefully selected such that the actors can embed the target sentence, expressed with the intended emotion, within the improvisation. Thus, we capitalize on the emotional context provided by the stories while maintaining the fixed lexical content required by our experimental framework.

Beyond the target sentences, the corpus includes all the turns produced during the improvisation recordings. We also collected the actors' interactions between recordings (natural interactions). In total, we collected 8,438 speaking turns, of which 652 correspond to the target sentences.
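To put these counts in perspective, a quick calculation shows what fraction of the corpus the lexically constrained target sentences represent (the variable names here are illustrative, not part of any corpus release):

```python
# Corpus-level turn counts reported for MSP-Improv.
total_turns = 8438    # all collected speaking turns
target_turns = 652    # turns matching one of the 20 target sentences

# Share of the corpus made up of target-sentence turns.
target_share = target_turns / total_turns
print(f"Target-sentence turns: {target_share:.1%} of the corpus")
# The remaining ~92% are improvised and natural-interaction turns.
```

This illustrates that the fixed-lexicon material is a small slice of the data; most of the corpus consists of unconstrained improvised and natural speech.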

The details of the corpus are described in Busso et al. (2017):

  1. Carlos Busso, Srinivas Parthasarathy, Alec Burmania, Mohammed AbdelWahab, Najmeh Sadoughi, and Emily Mower Provost, "MSP-IMPROV: An acted corpus of dyadic interactions to study emotion perception," IEEE Transactions on Affective Computing, vol. 8, no. 1, pp. 119-130, January-March 2017. [pdf] [cited] [bib]

Release of the Corpus

The corpus is now available under an Academic License. Please download this PDF. The form needs to be signed by the director of the research group. Send the signed form to Prof. Carlos Busso -

Some of our Publications using this Corpus:

  1. Carlos Busso, Srinivas Parthasarathy, Alec Burmania, Mohammed AbdelWahab, Najmeh Sadoughi, and Emily Mower Provost, "MSP-IMPROV: An acted corpus of dyadic interactions to study emotion perception," IEEE Transactions on Affective Computing, vol. 8, no. 1, pp. 119-130, January-March 2017. [pdf] [cited] [bib]
  2. Reza Lotfian and Carlos Busso, "Over-sampling emotional speech data based on subjective evaluations provided by multiple individuals," IEEE Transactions on Affective Computing, vol. To appear, 2019. [soon cited] [pdf] [bib]
  3. Mohammed Abdelwahab and Carlos Busso, "Domain adversarial for acoustic emotion recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 12, pp. 2423-2435, December 2018. [pdf] [cited] [ArXiv] [bib]
  4. Emily Mower Provost, Yuan Shangguan, and Carlos Busso, "UMEME: University of Michigan emotional McGurk effect data set," IEEE Transactions on Affective Computing, vol. 6, no. 4, pp. 395-409, October-December 2015. [pdf] [cited] [bib]
  5. Reza Lotfian and Carlos Busso, "Retrieving categorical emotions using a probabilistic framework to define preference learning samples," in Interspeech 2016, San Francisco, CA, USA, September 2016, pp. 490-494. [pdf] [cited] [bib] [slides]
  6. Mohammed Abdelwahab and Carlos Busso, "Incremental adaptation using active learning for acoustic emotion recognition," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), New Orleans, LA, USA, March 2017, pp. 5160-5164. [pdf] [cited] [bib] [poster]
  7. Alec Burmania, Mohammed Abdelwahab, and Carlos Busso, "Tradeoff between quality and quantity of emotional annotations to characterize expressive behaviors," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), Shanghai, China, March 2016, pp. 5190-5194. [pdf] [cited] [bib] [slides]
  8. Reza Lotfian and Carlos Busso, "Building naturalistic emotionally balanced speech corpus by retrieving emotional speech from existing podcast recordings," IEEE Transactions on Affective Computing, vol. To appear, 2018. [pdf] [cited] [bib]
  9. Mohammed Abdelwahab and Carlos Busso, "Ensemble feature selection for domain adaptation in speech emotion recognition," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), New Orleans, LA, USA, March 2017, pp. 5000-5004. [pdf] [cited] [bib] [slides]
  10. Srinivas Parthasarathy, Reza Lotfian, and Carlos Busso, "Ranking emotional attributes with deep neural networks," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), New Orleans, LA, USA, March 2017, pp. 4995-4999. [pdf] [cited] [bib] [slides]
  11. Alec Burmania and Carlos Busso, "A stepwise analysis of aggregated crowdsourced labels describing multimodal emotional behaviors," in Interspeech 2017, Stockholm, Sweden, August 2017, pp. 152-157. [soon cited] [pdf] [bib] [slides]
  12. Alec Burmania, Srinivas Parthasarathy, and Carlos Busso, "Increasing the reliability of crowdsourcing evaluations using online quality assessment," IEEE Transactions on Affective Computing, vol. 7, no. 4, pp. 374-388, October-December 2016. [pdf] [cited] [bib]

This material is based upon work supported by the National Science Foundation under Grant IIS-1217104. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

(c) Copyright. All rights reserved.