Boosting Road Event Detection with Adaptive Multi-Modal Models


Linkai Liu Xiaoyan Xiao Yijian Yang Yuchen Zhou Zipeng Guo Chao Gou
School of Intelligent Systems Engineering, Sun Yat-Sen University
ICME 2025

Abstract

Despite significant advancements in road event detection (RED), existing approaches face critical limitations. These include reliance on single-modal inputs and the joint optimization of detection and classification, which often leads to conflicting objectives and suboptimal performance. Moreover, their heavy dependence on large-scale annotated datasets restricts generalization in data-scarce scenarios. To address these challenges, we propose AdaRED, a novel framework that decouples agent detection from event classification, thereby mitigating optimization conflicts and enhancing task-specific performance. To build a comprehensive understanding of road events, AdaRED draws on diverse input modalities, including fine-grained local features, global contextual information, and spatial layout embeddings. Additionally, we introduce the Cross-modal Scene Adaptation Module (CSAM), which integrates lightweight adapters into a pretrained multimodal model (CLIP). This design enables efficient extraction of spatiotemporal features and the integration of visual priors, thereby improving generalization and robustness in challenging scenarios. Extensive experiments on the ROAD-R dataset show that AdaRED achieves state-of-the-art performance, validating its effectiveness in addressing the limitations of existing methods. The code and video examples can be found on our project page: https://liulinkai.github.io/AdaRED/.
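The abstract does not spell out the adapter architecture inside CSAM. For orientation only, the following plain-Python sketch shows a common bottleneck adapter design (down-projection, nonlinearity, up-projection, residual add) applied to a frozen backbone feature. The class name `BottleneckAdapter`, the dimensions, and the zero initialisation of the up-projection are illustrative assumptions, not AdaRED's implementation.

```python
import math
import random

# Hypothetical bottleneck adapter (an assumption, not AdaRED's code):
# down-project, apply ReLU, up-project, then add the result back to the
# frozen backbone feature as a residual.


def matvec(weight, vec):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def relu(vec):
    return [max(0.0, v) for v in vec]


class BottleneckAdapter:
    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(dim)
        # Trainable down-projection: dim -> bottleneck (bottleneck rows of length dim).
        self.down = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(bottleneck)]
        # Up-projection: bottleneck -> dim, zero-initialised so the adapter starts
        # as an identity map and the frozen features pass through unchanged.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def num_params(self):
        """Count the trainable adapter weights (the backbone stays frozen)."""
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up)

    def __call__(self, x):
        hidden = relu(matvec(self.down, x))
        delta = matvec(self.up, hidden)
        return [xi + di for xi, di in zip(x, delta)]


# Stand-in for one frozen CLIP feature vector (values are made up).
feature = [0.5, -1.0, 2.0, 0.25, 0.0, 1.5, -0.75, 3.0]
adapter = BottleneckAdapter(dim=8, bottleneck=2)
out = adapter(feature)
print(out == feature)        # True, because the up-projection starts at zero
print(adapter.num_params())  # 32 = 8*2 + 2*8
```

Only the two small projections are trained, so this toy adapter adds 32 parameters, compared with 64 for a full 8 × 8 linear layer. That parameter efficiency is the usual motivation for adapters when annotated data is scarce.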

Pipeline

The proposed AdaRED consists of two stages. First, the input video is processed to extract global context, local motion features, and spatial embeddings, which are fused into comprehensive representations. Second, the CSAM processes these fused features using visual priors from the multimodal model CLIP together with attention mechanisms, capturing cross-modal relationships and enabling precise recognition of complex behaviors such as "a car moving toward the incoming lane."
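To make the two-stage flow more concrete, here is a toy sketch in plain Python. It rests on assumptions the description above does not confirm: fusion is simple concatenation, the attention step is a single head of scaled dot-product self-attention with no learned projections, frame tokens are mean-pooled, and event classification is CLIP-style cosine similarity (temperature 0.07) against hypothetical text-prompt embeddings that already live in the same space as the fused features. All function names, inputs, and prompt vectors are made up for illustration.

```python
import math

# Toy sketch under stated assumptions; none of this is AdaRED's actual code.


def fuse(global_ctx, local_motion, spatial):
    """Stage 1 (assumed): concatenate the three modalities into one token."""
    return global_ctx + local_motion + spatial


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Stage 2 (assumed): single-head scaled dot-product attention, Q = K = V = tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def classify(token, prompt_embeddings, temperature=0.07):
    """Score a pooled token against each event prompt, CLIP-style."""
    logits = [cosine(token, p) / temperature for p in prompt_embeddings]
    return softmax(logits)


# Toy inputs for one agent tube over three frames (all numbers are made up).
frames = [
    fuse([0.2, 0.1], [0.9, 0.0], [0.3, 0.7]),
    fuse([0.2, 0.2], [1.0, 0.1], [0.4, 0.6]),
    fuse([0.1, 0.2], [1.1, 0.1], [0.5, 0.5]),
]
attended = self_attention(frames)
# Mean-pool the attended frame tokens into one clip-level representation.
clip_token = [sum(col) / len(attended) for col in zip(*attended)]

# Hypothetical prompt embeddings, assumed to share the 6-d fused feature space.
prompts = {
    "car moving toward incoming lane": [0.1, 0.1, 1.0, 0.1, 0.4, 0.6],
    "car stopped at traffic light": [0.5, 0.5, 0.0, 0.0, 0.2, 0.1],
}
probs = classify(clip_token, list(prompts.values()))
for label, p in zip(prompts, probs):
    print(f"{label}: {p:.3f}")
```

With these toy vectors, the "car moving toward incoming lane" prompt receives almost all of the probability mass. In a real CLIP-based model, the visual features and the text prompts would each pass through learned projections into a shared embedding space before the similarity is computed.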