[ACM MM'25] HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video Retrieval

¹School of Software, Shandong University
²School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)

*Corresponding author.

Abstract


CVR Retrieval Paradigm, Modification Referring Ambiguity, and Detailed Semantic Focus


Illustrations of (a) an example of the CVR retrieval paradigm, (b) the issue of Modification Referring Ambiguity, and (c) the issue of Detailed Semantic Focus.


Ratios of modification texts containing ambiguous pronouns


Ratios of modification texts containing ambiguous pronouns on WebVid-CoVR and EgoCVR datasets.
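The page reports these ratios without stating how a modification text is counted as ambiguous. A minimal sketch, assuming a text counts if it contains any word from a hand-picked pronoun list; the list and the helper `pronoun_ratio` are hypothetical and are not the procedure used for the figure:

```python
import re

# Hypothetical pronoun list; the set actually used for the figure is not given on this page.
AMBIGUOUS_PRONOUNS = {
    "it", "he", "she", "they", "them", "him", "her",
    "this", "that", "these", "those",
}


def pronoun_ratio(texts):
    """Fraction of modification texts containing at least one listed pronoun."""
    if not texts:
        return 0.0
    hits = 0
    for text in texts:
        # Lowercase and split into word tokens so "It" and "it" match alike.
        tokens = re.findall(r"[a-z']+", text.lower())
        if any(tok in AMBIGUOUS_PRONOUNS for tok in tokens):
            hits += 1
    return hits / len(texts)


examples = [
    "Make it red",
    "Change the dog to a cat",
    "Have them walk on the beach",
    "Add snow",
]
print(pronoun_ratio(examples))  # 0.5
```

Plain token matching over-counts words such as "that" when used as a relativizer, so a careful count would likely need part-of-speech or coreference filtering.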


Framework: Hierarchical Uncertainty-Aware Disambiguation Network (HUD)


The overall framework of HUD, which consists of (a) Holistic Pronoun Disambiguation, (b) Atomistic Uncertainty Modeling, and (c) Holistic-to-Atomistic Alignment.


Experiment


Performance comparison on the test set of the CVR dataset, WebVid-CoVR. The overall best results are in bold, while the best results over baselines are underlined.



Performance comparison on the CIR datasets FashionIQ and CIRR in terms of R@K (%). The overall best results are in bold, while the best results among the baselines are underlined.
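R@K (Recall@K) is the percentage of queries whose ground-truth target appears among the top-K retrieved candidates. A minimal sketch, assuming one target per query and rankings already sorted by descending similarity; the function name and the toy IDs are illustrative only:

```python
def recall_at_k(ranked_ids, target_ids, k):
    """Percentage of queries whose target appears in the top-k of its ranking."""
    if len(ranked_ids) != len(target_ids):
        raise ValueError("need exactly one ranking per query")
    if not ranked_ids:
        return 0.0
    hits = sum(
        1 for ranking, target in zip(ranked_ids, target_ids) if target in ranking[:k]
    )
    return 100.0 * hits / len(ranked_ids)


# Toy rankings for three queries (hypothetical candidate IDs).
rankings = [
    ["v3", "v1", "v7"],  # target v1 at rank 2
    ["v2", "v5", "v9"],  # target v9 at rank 3
    ["v8", "v4", "v6"],  # target v0 not retrieved
]
targets = ["v1", "v9", "v0"]
for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(rankings, targets, k):.1f}")
# R@1 = 0.0
# R@2 = 33.3
# R@3 = 66.7
```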



Ablation study on the CVR dataset WebVid-CoVR and the CIR datasets FashionIQ and CIRR. Δ denotes the performance drop of each compared derivative and is highlighted with a green background.



Sensitivity to the probabilistic sample number 𝑈 on (a) the CVR task and (b) the CIR task.
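This page does not describe how the 𝑈 probabilistic samples are drawn or used. As a generic illustration only, assuming each embedding is modeled as a diagonal Gaussian (mean and log-variance) and samples are drawn with the reparameterization trick, a Monte Carlo similarity estimate over 𝑈 samples per side might look as follows. All function names and values are hypothetical; this is not HUD's implementation:

```python
import math
import random


def sample_embeddings(mu, log_var, num_samples, rng):
    """Draw num_samples vectors z = mu + sigma * eps, eps ~ N(0, I) (reparameterization)."""
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        for _ in range(num_samples)
    ]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def expected_similarity(query, target, num_samples, seed=0):
    """Average cosine similarity over all num_samples x num_samples sample pairs.

    query and target are hypothetical (mean, log_variance) pairs of lists.
    """
    rng = random.Random(seed)
    qs = sample_embeddings(query[0], query[1], num_samples, rng)
    ts = sample_embeddings(target[0], target[1], num_samples, rng)
    return sum(cosine(q, t) for q in qs for t in ts) / (num_samples ** 2)


# Hypothetical Gaussian embeddings: (mean, log-variance).
query = ([1.0, 0.0, 0.5], [-2.0, -2.0, -2.0])
target = ([0.9, 0.1, 0.4], [-2.0, -2.0, -2.0])
for u in (1, 5, 20):
    print(u, round(expected_similarity(query, target, u), 3))
```

In this sketch, a larger 𝑈 lowers the variance of the estimate at a cost of 𝑈² similarity computations, which is one reason a sensitivity study over 𝑈 is informative.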



Case study on (a) WebVid-CoVR and (b) CIRR.

BibTeX


@inproceedings{hud,
title={HUD: Hierarchical Uncertainty-Aware Disambiguation Network
for Composed Video Retrieval},
author={Chen, Zhiwei and Hu, Yupeng and Li, Zixu and Fu, Zhiheng and Wen, Haokun and Guan, Weili},
booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
year={2025}
}