ACM Multimedia 2022 Tutorial on

Multimedia Content Understanding in Harsh Environments


With the rapid development of multimedia technology, a large amount of multimedia data has emerged, enabling new approaches such as multimedia content understanding. Multimedia content understanding is key to effective and efficient search, retrieval, delivery, management and sharing of multimedia content. Existing work shows that media understanding performs well in favorable environments, i.e., in good light, good weather, and with sufficient training samples. Harsh environments (e.g., fog, rain, snow, darkness, low light, glare, blur, and low resolution) make visual data harder to see, analyze and understand in real applications such as autonomous cars and video surveillance systems. Despite advances in computing power and deep learning algorithms, current multimedia content understanding algorithms are still mainly benchmarked under high-quality conditions (good weather, favorable lighting). As a result, state-of-the-art methods often suffer severe performance degradation in harsh environments.

In this tutorial, we introduce some key directions in multimedia content understanding under harsh environments. The tutorial should be useful to the multimedia community, especially to those working on multimedia content understanding tasks in practical, open-set settings.

This tutorial covers several important components of multimedia content understanding in harsh environments. First, we will introduce some multimedia enhancement methods, including image deraining, dehazing and low-light enhancement, and demonstrate their performance on downstream vision tasks, such as object detection and segmentation. Second, we will present recent advances in 2D and 3D visual scene understanding, and describe how deep learning and visual big data are driving research and development in this domain. Third, we will introduce strategies that estimate prediction uncertainty during training in order to rectify pseudo-label learning for unsupervised semantic segmentation adaptation. Finally, we will give a brief summary and show some typical applications and trends in this area.


  • An opening of the tutorial
  • Image enhancement: Disentanglement - Kui Jiang slides
  • 2D and 3D Scene Understanding - Dan Xu slides
  • Domain Adaptation: Consistency and Uncertainty - Zhedong Zheng slides
  • Understanding and Detection in Harsh Environments - Zheng Wang slides
  • Organizers

    Dan Xu


    Kui Jiang