[ACM MM'25] HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video Retrieval

Zhiwei Chen¹, Yupeng Hu¹*, Zixu Li¹, Zhiheng Fu¹, Haokun Wen², Weili Guan²

¹School of Software, Shandong University
²School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)
*Corresponding author.

CVR Paradigm, Modification Referring Ambiguity, and Detailed Semantic Focus

Illustration of (a) an example of the CVR paradigm, and the issues of (b) Modification Referring Ambiguity and (c) Detailed Semantic Focus.

Ratios of modification texts containing ambiguous pronouns

Ratios of modification texts containing ambiguous pronouns on WebVid-CoVR and EgoCVR datasets.

Framework: Hierarchical Uncertainty-aware Disambiguation network (HUD)

The overall framework of HUD, which consists of (a) Holistic Pronoun Disambiguation, (b) Atomistic Uncertainty Modeling, and (c) Holistic-to-Atomistic Alignment.

Experiment

Performance comparison on the test set of the CVR dataset WebVid-CoVR. The overall best results are in bold, while the best results among the baselines are underlined.

Performance comparison on the CIR datasets FashionIQ and CIRR in terms of R@K (%). The overall best results are in bold, while the best results among the baselines are underlined.
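
For readers unfamiliar with the R@K metric used in these tables, the following is a minimal sketch of how Recall@K is commonly computed. It assumes each query has exactly one ground-truth target and that candidates are already ranked by similarity. The function name `recall_at_k` and the toy data are hypothetical and are not taken from the HUD codebase, whose evaluation protocol may differ in detail.

```python
from typing import Sequence


def recall_at_k(
    ranked_ids: Sequence[Sequence[int]],
    target_ids: Sequence[int],
    k: int,
) -> float:
    """Return the percentage of queries whose target is in the top-k candidates.

    ranked_ids[i] holds candidate ids for query i, best match first.
    target_ids[i] is the single ground-truth candidate id for query i
    (a simplifying assumption made for this sketch).
    """
    if len(ranked_ids) != len(target_ids):
        raise ValueError("ranked_ids and target_ids must have the same length")
    if len(target_ids) == 0:
        raise ValueError("at least one query is required")
    if k < 1:
        raise ValueError("k must be a positive integer")

    # A query counts as a hit if its target appears among the first k candidates.
    hits = sum(
        1
        for ranking, target in zip(ranked_ids, target_ids)
        if target in ranking[:k]
    )
    return 100.0 * hits / len(target_ids)


# Toy example (hypothetical data): three queries over three candidates.
rankings = [[3, 1, 2], [0, 2, 1], [2, 0, 1]]
targets = [1, 1, 2]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(rankings, targets, k):.2f}%")
```

Larger values of K are more forgiving, so R@K is non-decreasing in K for a fixed set of rankings.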

Ablation study on the CVR dataset WebVid-CoVR and the CIR datasets FashionIQ and CIRR. Δ denotes the performance drop of the compared derivatives and is marked with a green background.

Sensitivity to the number of probabilistic samples 𝑈 on the (a) CVR and (b) CIR tasks.

Case study on (a) WebVid-CoVR and (b) CIRR.

BibTeX


@inproceedings{hud,
title={HUD: Hierarchical Uncertainty-Aware Disambiguation Network
for Composed Video Retrieval},
author={Chen, Zhiwei and Hu, Yupeng and Li, Zixu and Fu, Zhiheng and Wen, Haokun and Guan, Weili},
booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
year={2025}
}