Vision-based human alertness and attention state modeling in driving scenarios
Author:
Supervisors: Marcos Nieto Doncel (Vicomtech), Luis Salgado Álvarez de Sotomayor (University)
University: Universidad Politécnica de Madrid
Date: 17.12.2024
Humans are constantly striving to improve their means of transportation. The invention of the combustion-engine car is considered a milestone in the evolution of modern transportation, as it has reduced distances and facilitated connections between people all around the world. Engineers and researchers nevertheless continue to optimize transportation and vehicles. In this context, the development of advanced driver assistance systems (ADAS) has provided vehicles with additional functions that improve both safety and comfort. In fact, the automotive industry has embraced the task of developing increasingly autonomous vehicles, which relieve the human driver of some of the tasks required when driving a car. This is reflected in the definition of standardized levels of vehicle autonomy (the SAE levels of driving automation).
The evolution of vehicles towards technologically advanced means of transportation requires the inclusion of driver monitoring systems (DMS), which can assess the driver's state. Notably, DMS are a key element in the Euro NCAP vehicle safety evaluation and in European safety regulations, and most vehicles are expected to be equipped with different types of DMS. Research on driver monitoring (DM) is crucial to find an adequate methodology for implementing these systems in real cars. The goal of a DMS is to assess the driver's capability to perform driving tasks, so observing the driver is necessary. Unobtrusive methods of assessing the driver's state seem to be the most suitable approach to building DMS. Among these, vision-based systems, which observe the driver using camera sensors, stand out because they can infer visible impairments of the driver's state.
Human state physiognomy is complex and depends on the individual characteristics of each person. However, some common phenomena can be identified, which allow signs of driver state impairment to be classified. DM typically evaluates the driver's capacities, which can be degraded by episodes of inattention that put the driver's safety and that of others in the car at risk. This Ph.D. thesis analyzes the most relevant DMS requirements, based on physiognomic signs of human inattention, to enable the development of unobtrusive camera-based components for DMS, such as DM datasets. The work in this thesis aims to provide researchers and industrial practitioners with an overview of the driver state characteristics that are useful for building DMS components, and with tools to develop such systems.
Camera-based methods for assessing the driver's state rely on computer vision (CV) and machine learning (ML) technologies. Deep learning (DL) has proven to be very useful in many image analysis problems. However, DL methods require large datasets to perform their intended task well. Therefore, this thesis presents a methodology for building DM datasets that takes into account the analysis of DMS requirements. Applying this methodology, the Driver Monitoring Dataset (DMD) was built and made available to the research community. The DMD is a multi-modal, multi-camera dataset that aims to cover the most relevant characteristics of DM regarding fatigue and distraction states. Moreover, the DMD is currently one of the most diverse datasets related to DM.
Additionally, the analysis of DMS requirements has led to the development of a real-time software framework based on DMS characteristics. This thesis elaborates on a classification of the types of DM algorithms and proposes a modular framework, DMSLib, for developing complex yet optimized real-time DMS architectures. DMSLib is based on the definition of abstract processing units called Analyzers, each of which performs a specific task. These units can be interconnected, sharing common information and the results of their algorithms through defined interfaces. By connecting different components, it is possible to build complex DMS pipelines that fulfill any monitoring requirements. A key benefit of the proposed framework is that DMS pipelines are optimized by reusing the results of low-level components, which reduces time-to-market effort for OEMs and Tier 1 suppliers.
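The idea of interconnected Analyzers that reuse low-level results can be illustrated with a minimal Python sketch. It is not the DMSLib API: the `requires` attribute, the `Pipeline` class, the concrete analyzer classes and their dummy return values are all hypothetical names chosen for illustration, assuming only that each unit declares which upstream results it consumes.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence, Tuple


class Analyzer(ABC):
    """Hypothetical processing unit; not the actual DMSLib interface."""

    name: str = "analyzer"
    requires: Tuple[str, ...] = ()  # names of upstream units whose results are read

    @abstractmethod
    def process(self, frame: Any, shared: Dict[str, Any]) -> Any:
        """Return this unit's result, reading upstream results from `shared`."""


class FaceDetector(Analyzer):
    name = "face"

    def __init__(self) -> None:
        self.calls = 0  # counts invocations to show the result is computed once

    def process(self, frame, shared):
        self.calls += 1
        return {"box": (0, 0, 64, 64)}  # dummy value, no real detection


class EyelidAperture(Analyzer):
    name = "eyelid"
    requires = ("face",)

    def process(self, frame, shared):
        # Placeholder: only forwards the shared face box, no real estimation.
        return {"face_box": shared["face"]["box"], "aperture": None}


class GazeEstimator(Analyzer):
    name = "gaze"
    requires = ("face",)

    def process(self, frame, shared):
        # Placeholder: only forwards the shared face box, no real estimation.
        return {"face_box": shared["face"]["box"], "direction": None}


class Pipeline:
    """Runs each unit once per frame, after the units it depends on."""

    def __init__(self, analyzers: Sequence[Analyzer]) -> None:
        self.order = self._sort(analyzers)

    @staticmethod
    def _sort(analyzers: Sequence[Analyzer]) -> List[Analyzer]:
        by_name = {a.name: a for a in analyzers}
        ordered: List[Analyzer] = []
        visiting, done = set(), set()

        def visit(unit: Analyzer) -> None:
            if unit.name in done:
                return
            if unit.name in visiting:
                raise ValueError(f"cyclic dependency at {unit.name!r}")
            visiting.add(unit.name)
            for dep in unit.requires:
                if dep not in by_name:
                    raise KeyError(f"{unit.name!r} requires missing {dep!r}")
                visit(by_name[dep])
            visiting.discard(unit.name)
            done.add(unit.name)
            ordered.append(unit)

        for unit in analyzers:
            visit(unit)
        return ordered

    def run(self, frame: Any) -> Dict[str, Any]:
        shared: Dict[str, Any] = {}
        for unit in self.order:
            shared[unit.name] = unit.process(frame, shared)
        return shared


face = FaceDetector()
pipeline = Pipeline([EyelidAperture(), GazeEstimator(), face])
results = pipeline.run(frame=None)  # no image needed for this dummy example
```

In this sketch the face detector runs once per frame even though two downstream units consume its result, which is the kind of reuse the paragraph above describes. A real framework would also need to handle timing, threading and typed interfaces, which are left out here.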
The resulting DMD and the proposed DMSLib framework have been used to develop and test different vision-based DM methods, which consolidate the contribution of this thesis. First, this thesis shows the application of DMSLib and the DMD to DM use cases such as eyelid aperture estimation, driver gaze estimation and driver action recognition. Second, a novel method to estimate the driver's eyelid aperture is presented. This method is relevant because it adapts to the individual's eye shape, which could otherwise affect the assessment of the driver's fatigue state. The method was tested on the DMD and on other publicly available datasets, and it was implemented using the DMSLib framework. Finally, as proof of industrial viability, some DM functions have been integrated into real vehicle systems to validate their integration with autonomous driving (AD) functions and communication (IoT) capabilities.