Human-Centered Research Lab

Welcome to the Multimodal Signal Processing (MSP) Laboratory

On this website, you will find an overview of the exciting activities happening at the Multimodal Signal Processing (MSP) laboratory. It also introduces the faculty and students involved in the lab.


We developed an audiovisual LVASR system that performs as well as or better than an audio-based ASR system, even when the visual features are not very discriminative. [pdf]
We created a probabilistic gaze region using upsampling CNNs, starting from the head pose of the driver. [pdf]
An elegant approach to derive relative labels from sentence-level annotations. QA-based labels provide reliable information to train preference learning models. [pdf]
We built a gaze detection algorithm that does not need calibration or cooperation from the user. Learn more in our paper. [pdf]
We noticed that higher regularization is needed to detect valence. Here is our study to understand the underlying reasons. [pdf]
Lip motion is not perfectly synchronized with speech (e.g., anticipatory movements). We compensate for this phase difference with our new AliNN framework. [pdf]
We did not have a large database for speech emotion recognition, until now! Learn more about the MSP-Podcast database in our paper. [pdf]
This study extends the temporal modeling for audiovisual speech activity detection. [pdf]
Our solution for recording head pose data in real driving recordings. Read more in our paper. [pdf]
Can we train deep models for speech emotion recognition? Read about our experience in our paper. [pdf]
If we know that the data is emotional, can we say anything about the reliability of a speaker ID system? Read more in our paper. [pdf]
We use lip motion features to improve our supervised audiovisual speech activity detection. Read the details in our paper. [pdf]
Do you believe that disagreement between evaluators is noise? We believe it is valuable information for speech emotion recognition. Read more in our paper. [pdf]
Where is the driver looking now? We address this question using a probabilistic framework. Read more in our paper. [pdf]

About the MSP laboratory

The MSP laboratory is dedicated to advancing technology in the area of human-centered multimodal signal processing. We study theoretical problems with practical applications. Our goal is to develop methods, algorithms, and models to recognize and synthesize human verbal and non-verbal communication behaviors to improve human-machine interaction.

Our current research includes:

  • Affective computing
  • Speech, video and multimodal processing
  • Multimodal human-machine interfaces
  • Analysis and modeling of verbal and non-verbal interaction
  • Human interaction analysis and modeling
  • Multimodal speaker identification
  • Meeting analysis and intelligent meeting spaces
  • Machine learning methods for multimodal processing

The MSP lab was established by Prof. Carlos Busso in August 2009. He is also the director of the group.

The MSP lab is part of the Erik Jonsson School of Engineering and Computer Science at The University of Texas at Dallas.
