We collected the MDM dataset to study driver attention and to facilitate the design of models for head pose and gaze estimation. The database was collected with multimodal sensors while drivers performed various tasks.
We present the multimodal driver monitoring (MDM) database, which is specifically designed to study the driver's visual attention. It addresses some of the challenges of existing driver monitoring datasets, including their limited size and narrow scope, focusing solely on either the driver's head pose or the driver's gaze zones. The MDM corpus is a new multimodal dataset that considers both head pose and gaze direction to facilitate learning of the complex interplay between head pose, eye movements, and the resulting gaze. We achieve this goal by providing frame-based head pose annotations, along with the spatial positions of target markers inside and outside the vehicle that the drivers glanced at. The corpus includes recordings from 59 subjects, with a gender-balanced pool spanning different ethnic groups.
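One way the released marker positions can be combined with the driver's 3D eye (or head) position to produce a gaze direction label is a simple vector difference. This is only an illustrative sketch; the function name and coordinate frames are our own assumptions, not part of the corpus tools.

```python
import numpy as np

def gaze_direction(eye_pos, marker_pos):
    """Return the unit gaze direction from the driver's eye position
    toward a glanced-at target marker, both given as 3D coordinates
    in a common (illustrative) vehicle frame."""
    v = np.asarray(marker_pos, dtype=float) - np.asarray(eye_pos, dtype=float)
    return v / np.linalg.norm(v)

# Driver's eyes at the origin, marker 2 m straight ahead along z
d = gaze_direction([0.0, 0.0, 0.0], [0.0, 0.0, 2.0])
```

A classifier for gaze zones could then be trained on these direction vectors, or they could serve directly as regression targets.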
The corpus is multimodal: we record the drivers with multiple RGB cameras and a depth camera.
Our solution for estimating frame-based head pose labels relies on fiducial markers attached to a rigid helmet, arranged in a predefined structure with highly contrasting black-and-white patterns. The positions and orientations of the fiducial points are easily tracked with simple algorithms, and are then used to estimate the head pose of the drivers.
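Marker trackers typically return a rotation matrix for the helmet relative to the camera; head pose is then read off as Euler angles. The sketch below assumes such a rotation matrix is available and uses the ZYX (yaw-pitch-roll) convention, which is one common choice; the actual annotation pipeline may differ.

```python
import numpy as np

def rotation_to_head_pose(R):
    """Convert a 3x3 rotation matrix (e.g., recovered from tracked
    fiducial markers) into head pose angles in degrees, using the
    ZYX (yaw-pitch-roll) Euler convention."""
    pitch = np.arcsin(-np.clip(R[2, 0], -1.0, 1.0))
    yaw = np.arctan2(R[1, 0], R[0, 0])
    roll = np.arctan2(R[2, 1], R[2, 2])
    return np.degrees([yaw, pitch, roll])

# Example: a pure 30-degree yaw (rotation about the z axis)
theta = np.radians(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
yaw, pitch, roll = rotation_to_head_pose(R)
```

The angle convention and axis labels depend on how the camera and helmet frames are defined, so the decomposition above should be adapted to the chosen frames.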
We use four GoPro Hero 6 Black cameras in our data collection. All the cameras are set to record in full HD (1,920 x 1,080) in linear mode. The purpose and placement of each of the four cameras are listed as follows:
We place a CamBoard pico flexx camera close to the face camera. This camera records point-cloud data using time-of-flight technology, providing a robust estimation of the depth map under varying illumination conditions, including reliable measurements at night. The camera also records grayscale images illuminated by infrared (IR) LEDs.
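A depth map from a time-of-flight sensor can be back-projected into a 3D point cloud with the standard pinhole model. The sketch below is a minimal illustration; the intrinsics (`fx`, `fy`, `cx`, `cy`) are placeholders that would come from the camera's calibration, not values from the corpus.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (in meters) into a 3D point cloud
    using the pinhole camera model. Returns an (H, W, 3) array of
    (x, y, z) coordinates in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

# Toy 2x2 depth map where every pixel is 1 m away
cloud = depth_to_point_cloud(np.ones((2, 2)), fx=200.0, fy=200.0, cx=1.0, cy=1.0)
```

The resulting point cloud can be cropped to the driver's head region before feeding it to a depth-based head pose model.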
The UTDrive vehicle records the CAN-Bus information during the recordings. From the CAN-Bus, we obtain information about the accelerator, brakes, steering, and speed of the vehicle. The UTDrive vehicle also has gas- and brake-pedal pressure sensors.
The UTDrive vehicle is also equipped with a microphone array with five microphones. While our protocol does not include any task that elicits speech, the audio information is useful for understanding potential auditory distractions.
The CAN-Bus data and the microphone array are connected to a Dewetron system, which stores and synchronizes the modalities.
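Once the modalities share a common clock, downstream analysis still has to align samples recorded at different rates (e.g., CAN-Bus signals versus 30 fps video). A minimal nearest-timestamp alignment, with illustrative names and toy values, could look like this:

```python
import numpy as np

def align_to_video(video_ts, signal_ts, signal_vals):
    """For each video frame timestamp, pick the nearest sample of an
    asynchronously recorded signal (e.g., CAN-Bus speed). Assumes
    both timestamp arrays are sorted and on the same clock."""
    idx = np.searchsorted(signal_ts, video_ts)
    idx = np.clip(idx, 1, len(signal_ts) - 1)
    left, right = signal_ts[idx - 1], signal_ts[idx]
    # Step back one index where the left neighbor is strictly closer
    idx -= (video_ts - left) < (right - video_ts)
    return signal_vals[idx]

# Toy example: 30 fps video frames versus 50 Hz CAN-Bus speed samples
video_ts = np.array([0.0, 0.033, 0.066])
can_ts = np.array([0.0, 0.02, 0.04, 0.06])
can_speed = np.array([10.0, 10.5, 11.0, 11.5])
aligned = align_to_video(video_ts, can_ts, can_speed)
```

Interpolation (rather than nearest-sample selection) may be preferable for smooth signals such as vehicle speed.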
We ask the drivers to perform multiple tasks while sitting in the driver's seat of the UTDrive car.
The multimodal driver monitoring dataset is licensed free of cost to academic institutions under a Federal Demonstration Partnership (FDP) Data Transfer and Use Agreement. We have also established a licensing program for commercial entities interested in our corpus. We will provide the details soon. If you need more information, please contact Prof. Carlos Busso -
This work was supported by SRC / Texas Analog Center of Excellence (TxACE), under task 2810.014. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsor agencies.
Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.