This website summarizes the research activities at the
MSP lab. Please refer to the publication section for further details.
Our long-term goal is to understand, from a multimodal signal processing perspective, interpersonal expressive human communication and interaction. In addition to spoken language, non-verbal cues such as hand gestures, facial expressions, rigid head motions, and speech prosody (e.g., pitch and energy) play a crucial role in expressing feelings, giving feedback, and displaying human affective states or emotions. Multimedia speech communication interfaces that are socially aware, in both recognizing and responding to their human users, must therefore incorporate this rich fabric of verbal and non-verbal cues, expressed both auditorily and visually. This is the precise goal of our research. Such core enabling multimedia technology can have an impact on a wide range of applications, including games, simulation systems, and other education and entertainment systems.
There are four main research areas of interest:
The verbal and non-verbal channels of human communication are internally and intricately connected. As a result, gestures and speech exhibit high levels of correlation and coordination, and this relationship is strongly affected by the linguistic and emotional content of the message being communicated. Consequently, the emotional modulation observed across communicative channels is not uniformly distributed. In the face, the upper region presents more degrees of freedom to convey non-verbal information than the lower region, which is highly constrained by the underlying articulatory processes.

The same type of interplay is observed in various aspects of speech. In the spectral domain, some broad phonetic classes, such as front vowels, show stronger emotional variability than others (e.g., nasal sounds), suggesting that for some phonemes the vocal tract features do not have enough degrees of freedom to convey the affective goals. Likewise, gross statistics of the fundamental frequency (F0), such as the mean, maximum, minimum, and range, are more emotionally prominent than features describing the F0 shape, which are hypothesized to be closely related to the lexical content. Interestingly, a joint analysis reveals that when one modality is constrained by the articulatory speech process, other channels with more degrees of freedom are used to convey the emotions. For example, facial expressions and speech prosody tend to show stronger emotional modulation when the vocal tract is physically constrained by articulation to convey other linguistic communicative goals.
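As a concrete illustration, the gross F0 statistics mentioned above (mean, maximum, minimum, and range) can be computed at the sentence level from a per-frame pitch contour. The sketch below is a minimal NumPy example; the synthetic contour and the convention of marking unvoiced frames with 0 Hz are assumptions for illustration, not part of our published feature sets:

```python
import numpy as np

def gross_f0_statistics(f0):
    """Sentence-level gross statistics of a fundamental frequency (F0)
    contour. Unvoiced frames are assumed to be marked with 0 Hz and
    are excluded from the statistics."""
    voiced = f0[f0 > 0]
    return {
        "mean": float(np.mean(voiced)),
        "max": float(np.max(voiced)),
        "min": float(np.min(voiced)),
        "range": float(np.max(voiced) - np.min(voiced)),
    }

# Synthetic rising contour with interleaved unvoiced (0 Hz) frames.
f0 = np.array([0.0, 120.0, 150.0, 180.0, 0.0, 220.0, 0.0])
stats = gross_f0_statistics(f0)
```

In practice the contour would come from a pitch tracker; the point here is simply that these gross statistics summarize the contour's level and span, whereas shape features describe its temporal evolution.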
Building on this analysis and modeling, we present applications in the recognition and synthesis of expressive communication.
Even a small driver distraction can lead to life-threatening accidents that affect many lives. Monitoring distraction is a key component of any feedback system intended to keep the driver's attention on the road. Toward this goal, we are interested in modeling the behaviors observed when the driver performs common in-vehicle secondary tasks such as operating a cell phone, the radio, or a navigation system. The study employs the UTDrive platform, a car equipped with multiple sensors, including cameras, microphones, and Controller Area Network bus (CAN-Bus) data.
Emotion plays a crucial role in day-to-day interpersonal human interactions. Recent findings suggest that emotion is integral to our rational and intelligent decisions: it helps us relate to one another by expressing our feelings and providing feedback. This important aspect of human interaction needs to be considered in the design of human-machine interfaces (HMIs). To build interfaces that are more in tune with users' needs and preferences, it is essential to study how emotion modulates and enhances both the verbal and non-verbal channels of human communication.