MSP-Improv corpus:
An emotional audiovisual database of spontaneous improvisations
The MSP-Improv corpus is an acted audiovisual emotional database that explores emotional behaviors
during spontaneous dyadic improvisations. The scenarios are carefully designed to elicit realistic
emotions. Currently, the corpus comprises data from six dyadic sessions (12 actors). The participants
are UTD students from the School of Arts and Humanities, who have taken classes in Theatre and Drama
and have acting experience.
The MSP-Improv corpus was recorded as part of our study on audiovisual emotion
perception using data-driven computational modeling (NSF IIS: 1217104). The project involves
creating stimuli with conflicting emotional content conveyed through speech and facial
expression (e.g., happy speech, angry facial expression). The recombination process central
to the creation of the stimuli requires the use of semantically controlled and acted audio-visual
utterances. In this paradigm the same lexical content must be expressed across clips to ensure
that the recombination is as natural and artifact-free as possible. This restriction
necessitates the use of acted environments for the collection of standardized lexical
content over multiple emotions.
We used a novel recording paradigm to achieve emotional expressions that approach
the naturalness found in unsolicited human speech. We designed 20 target sentences with
various lengths. For each of these sentences, we created scenarios that triggered emotional
reactions (happiness, sadness, anger, and neutral state). The scenarios are carefully selected such
that the actor can embed the target emotion in the improvisation. Thus, we capitalize on the
emotional context provided by the stories while maintaining the fixed lexical content required
by our experimental framework.
In addition to the target sentences, the corpus includes all the other turns from the improvisation
recordings. We also collected the actors' interactions between recordings (natural interactions).
In total, we collected 8,438 speaking turns, of which 652 correspond to the target sentences.
The details of the corpus are described in Busso et al. (2017):
- Carlos Busso, Srinivas Parthasarathy, Alec Burmania, Mohammed AbdelWahab, Najmeh Sadoughi,
and Emily Mower Provost, "MSP-IMPROV: An acted corpus of dyadic
interactions to study emotion perception,"
IEEE Transactions on Affective Computing, vol. 8, no. 1, pp. 119-130, January-March 2017.
Release of the Corpus: Academic License
The corpus is now available under an Academic License. Please download this pdf. The
form needs to be signed by the director of the research group. Send the signed form to Prof. Carlos Busso.
- Please copy the group leader or laboratory director in your email.
- Add the group leader or laboratory director to the list at the end of the agreement. Include full name, signature, and title.
- Use your institution email to contact us.
Release of the Corpus: Commercial License
Companies interested in this corpus can obtain a commercial license from UT Dallas. The cost of the license is US$8,000. Please contact Carlos Busso if you are interested.
Some of our Publications using this Corpus:
- Ali N. Salman and Carlos Busso, "Style
extractor for facial expression recognition in the presence of speech,"
in IEEE International Conference on Image Processing (ICIP 2020), Abu Dhabi, United Arab Emirates (UAE), October 2020.
- Ali N. Salman and Carlos Busso, "Dynamic versus static facial
expressions in the presence of speech,"
in IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), Buenos Aires, Argentina, May 2020.
- Reza Lotfian and Carlos Busso, "Over-sampling emotional
speech data based on subjective evaluations provided by multiple individuals,"
IEEE Transactions on Affective Computing, to appear, 2020.
- Mohammed Abdelwahab and Carlos Busso, "Domain adversarial
for acoustic emotion recognition,"
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 12, pp. 2423-2435, December 2018.
- Emily Mower Provost, Yuan Shangguan, and Carlos Busso, "UMEME: University of Michigan
emotional McGurk effect data set,"
IEEE Transactions on Affective Computing, vol. 6, no. 4, pp. 395-409, October-December 2015.
- Reza Lotfian and Carlos Busso, "Retrieving categorical emotions using a
probabilistic framework to define preference learning samples,"
in Interspeech 2016, San Francisco, CA, USA, September 2016, pp. 490-494.
- Mohammed Abdelwahab and Carlos Busso, "Incremental adaptation
using active learning for acoustic emotion recognition,"
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017),
New Orleans, LA, USA, March 2017, pp. 5160-5164.
- Alec Burmania, Mohammed Abdelwahab, and Carlos Busso, "Tradeoff between quality and quantity of
emotional annotations to characterize expressive behaviors,"
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016),
Shanghai, China, March 2016, pp. 5190-5194.
- Reza Lotfian and Carlos Busso, "Building naturalistic emotionally
balanced speech corpus by retrieving emotional speech from existing podcast recordings,"
IEEE Transactions on Affective Computing, to appear, 2018.
- Mohammed Abdelwahab and Carlos Busso, "Ensemble feature selection for domain
adaptation in speech emotion recognition,"
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017),
New Orleans, LA, USA, March 2017, pp. 5000-5004.
- Srinivas Parthasarathy, Reza Lotfian, and Carlos Busso, "Ranking
emotional attributes with deep neural networks,"
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017),
New Orleans, LA, USA, March 2017, pp. 4995-4999.
- Alec Burmania, Srinivas Parthasarathy, and Carlos Busso,
"Increasing the reliability of crowdsourcing evaluations using
online quality assessment,"
IEEE Transactions on Affective Computing, vol. 7, no. 4, pp. 374-388, October-December 2016.
This material is based upon work supported by the National Science
Foundation under Grant IIS-1217104. Any opinions, findings, and conclusions or recommendations expressed in
this material are those of the author(s) and do not necessarily reflect the views of the
National Science Foundation.
