A large naturalistic speech database with emotional traces

The MSP-Conversation corpus contains interactions annotated with time-continuous emotional traces for arousal (calm to active), valence (negative to positive), and dominance (weak to strong). Time-continuous annotations offer the flexibility to explore emotional displays at different temporal resolutions while leveraging contextual information. Release 1.0 contains 74 conversations, each lasting between 10 and 20 minutes (more than 15 hours in total). Each conversation has been annotated by at least five workers. This is an ongoing effort, and we plan to increase the size of the corpus. We have already identified 52 new conversations, which we have started to annotate for the second release (28hrs 15min in total).
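As a rough illustration of what exploring a trace at different temporal resolutions can look like, the sketch below averages several annotators' traces into one consensus trace and then re-aggregates it over windows of different lengths. It is a minimal sketch under stated assumptions, not the corpus tooling: the function names, the 2 Hz sampling rate, the value range, and the toy data are all hypothetical, and the actual file format and sampling rate of the release are not described here.

```python
from statistics import mean


def average_annotators(traces):
    """Average several equally sampled traces, sample by sample.

    Traces are truncated to the shortest one so every sample has
    a value from every annotator.
    """
    length = min(len(t) for t in traces)
    return [mean(t[i] for t in traces) for i in range(length)]


def resample_trace(trace, sample_rate_hz, window_s):
    """Average a trace over consecutive, non-overlapping windows."""
    step = int(round(sample_rate_hz * window_s))
    if step < 1:
        raise ValueError("window is shorter than one sample")
    return [mean(trace[i:i + step]) for i in range(0, len(trace), step)]


# Hypothetical toy data: three annotators rating arousal,
# assumed to be sampled at 2 Hz for 4 seconds.
annotators = [
    [0.0, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 1.0],
    [0.2, 0.2, 0.4, 0.6, 0.6, 0.8, 0.8, 0.8],
    [0.1, 0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
]

consensus = average_annotators(annotators)

# The same consensus trace viewed at two temporal resolutions.
per_second = resample_trace(consensus, sample_rate_hz=2, window_s=1.0)
per_two_seconds = resample_trace(consensus, sample_rate_hz=2, window_s=2.0)

print(len(per_second), len(per_two_seconds))
```

A plain mean is only one way to combine annotators; it ignores differences in reaction delay between annotators, which a real analysis would likely need to handle.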

Spontaneous speech emotional data with emotional traces

A key feature of the corpus is that its recordings overlap with those included in the MSP-Podcast database, which contains sentence-level annotations of short segments retrieved from podcasts. The MSP-Podcast corpus is not well suited to studying contextual information, because its isolated turns are evaluated separately, which loses the temporal relationship between consecutive speaking turns. The MSP-Conversation corpus complements the MSP-Podcast corpus by providing a platform for exploring temporal information.

The proposed approach has the advantage that we can easily balance the emotional content and speaker demographics by choosing the right podcasts. The approach does not intentionally manipulate the speakers or induce emotions in them, resulting in a flexible and scalable way to collect emotional data. The MSP-Podcast corpus is being recorded as part of our NSF project "CCRI: New: Creating the largest speech emotional database by leveraging existing naturalistic recordings" (NSF CNS: 2016719). For further information on the corpus, please read:

We plan to share this corpus with the research community in the future.

