The world surrounding us is full of multi-modality data – we see objects, hear sounds, feel textures, and so on. With the development of digital devices and social networks, this data is growing rapidly, and we urgently need advanced techniques for understanding, translating, and retrieving multi-modality data. Human-centric cross-modal retrieval is mainly motivated by three aspects: 1) humans are the main understanding targets, since the retrieval of human faces, appearances, actions, and behaviors is usually a major function of multimedia systems; 2) the evaluation of retrieval results should essentially rely on human intelligence rather than machine intelligence alone, since retrieval models are usually designed by humans and the final decisions are made by humans; 3) retrieval matters not only for its efficiency and convenience, but also because the technology can improve human life. Hence, we believe that being human-centric is the key value of cross-modal retrieval tasks, and that maintaining the “human element” will greatly advance cross-modal retrieval. Human-centric cross-modal retrieval has started to attract increasing research efforts from both the academic and industrial research communities.
Human-centric cross-modal retrieval aims to take one arbitrary type of data as the query to retrieve relevant data of another type, where 1) the retrieval target is focused on “humans”, 2) the system collaborates with human intelligence, or 3) the results benefit human life. The challenges of the task mainly lie in the heterogeneity gap, i.e., measuring content similarity between different modalities of data; the relationship between human intelligence and machine intelligence; and the ambiguity of understanding among different persons. Recently, great advances in machine learning and artificial intelligence with deep neural networks have made human-centric cross-modal retrieval possible.
This ICMR’20 special session aims to gather high-quality contributions reporting the most recent progress on human-centric cross-modal retrieval for multimedia applications. It targets a mixed audience of researchers and technologists from several communities, e.g., multimedia, machine learning, computer vision, and artificial intelligence. The topics of interest include, but are not limited to:
Each full paper should be 6-8 pages (6-page limit for content, plus references).
Paper Submission: January 11, 2020
Notification of Acceptance: March 15, 2020
Camera-Ready Papers Due: March 31, 2020
See the ICMR 2020 paper submission section for details.
Zheng Wang, NII
Mang Ye, IIAI
Wenjun Zeng, Microsoft
Shin'ichi Satoh, NII/UTokyo