[ACM MM'25] OFFSET: Segmentation-based Focus Shift Revision for Composed Image Retrieval

1School of Software, Shandong University,
2Department of Data Science, City University of Hong Kong,
3School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)

*Corresponding author.

Abstract

Inhomogeneity and Text-Priority in Composed Image Retrieval

(a) gives an example of the composed image retrieval (CIR) task. (b) demonstrates the phenomenon of inhomogeneity in visual samples, where images frequently comprise dominant and noisy regions. (c) illustrates the advantages of applying text-priority during multimodal feature composition. The image caption treats “trees” as background noise, which is inconsistent with the focus of the modification text and may lead to inaccurate composition results. However, when the modification text is given priority, “trees” can be re-identified as the dominant region, thereby facilitating the construction of more accurate composed features.


Framework: segmentatiOn-based Focus shiFt reviSion nETwork (OFFSET)

The proposed OFFSET consists of three key modules: (a) Dominant Portion Segmentation, (b) Dual Focus Mapping, and (c) Textually Guided Focus Revision, where (a) and (b) collectively form the feature extractor.


Experiment

Performance comparison on FashionIQ with respect to R@K(%). The overall best results are in bold, while the best results over baselines are underlined.
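For readers unfamiliar with the metric, R@K is commonly computed as the percentage of queries whose target image appears among the top K retrieved candidates. The following is a minimal sketch under that assumption, not the authors' evaluation code; the function name `recall_at_k` and the image ids are hypothetical.

```python
def recall_at_k(ranked_ids, target_ids, k):
    """Percentage of queries whose target is among the top-k ranked candidates.

    ranked_ids: one ranked candidate list per query (best match first).
    target_ids: the ground-truth target id for each query.
    """
    if len(ranked_ids) != len(target_ids):
        raise ValueError("one ranking per query is required")
    if not ranked_ids:
        return 0.0
    hits = sum(
        1
        for ranking, target in zip(ranked_ids, target_ids)
        if target in ranking[:k]
    )
    return 100.0 * hits / len(ranked_ids)


# Toy example with three queries and hypothetical image ids.
rankings = [
    ["img3", "img1", "img7"],
    ["img2", "img5", "img9"],
    ["img8", "img4", "img6"],
]
targets = ["img1", "img2", "img6"]

r_at_1 = recall_at_k(rankings, targets, 1)  # only the second query hits
r_at_3 = recall_at_k(rankings, targets, 3)  # all three queries hit
```

In this toy setting only one of three queries ranks its target first, while all three targets fall within the top three, so R@K grows with K as expected.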


Performance comparison on CIRR with respect to R@K(%) and Rsubset@K(%). The overall best results are in bold, while the best results over baselines are underlined.
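The subset variant of the metric is usually understood as recall computed after restricting each ranking to a small group of candidates associated with the query. The sketch below illustrates that idea under this assumption; the function name `recall_subset_at_k`, the subset contents, and the ids are hypothetical and do not come from the paper.

```python
def recall_subset_at_k(ranked_ids, target_ids, subsets, k):
    """Recall@k after restricting each ranking to that query's candidate subset.

    ranked_ids: one ranked candidate list per query (best match first).
    target_ids: the ground-truth target id for each query.
    subsets:    the allowed candidate ids for each query (assumed to be given).
    """
    if not (len(ranked_ids) == len(target_ids) == len(subsets)):
        raise ValueError("rankings, targets, and subsets must align per query")
    if not ranked_ids:
        return 0.0
    hits = 0
    for ranking, target, subset in zip(ranked_ids, target_ids, subsets):
        allowed = set(subset)
        # Keep the original order, but drop candidates outside the subset.
        restricted = [cand for cand in ranking if cand in allowed]
        if target in restricted[:k]:
            hits += 1
    return 100.0 * hits / len(ranked_ids)


# Toy example with two queries and hypothetical ids.
rankings = [["a", "b", "c", "d"], ["d", "c", "b", "a"]]
targets = ["c", "a"]
subsets = [["b", "c"], ["a", "b", "c"]]

rs_at_1 = recall_subset_at_k(rankings, targets, subsets, 1)
rs_at_2 = recall_subset_at_k(rankings, targets, subsets, 2)
rs_at_3 = recall_subset_at_k(rankings, targets, subsets, 3)
```

Because candidates outside the subset are discarded before the cutoff is applied, a target can count as a hit here even when it sits lower in the full ranking.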


Performance comparison on Shoes with respect to R@K(%). The overall best results are in bold, while the best results over baselines are underlined.


Ablation studies of OFFSET with various settings on FashionIQ, Shoes, and CIRR. Δ represents the performance degradation of the compared derivatives and is marked with a green background. The yellow background denotes the baseline performance used for each column.


Sensitivity to Focus Channel Number 𝑃 and the hyper-parameter 𝜇 on (a) FashionIQ, (b) Shoes, and (c) CIRR.


Case study on (a) FashionIQ and (b) CIRR.

BibTeX


        @inproceedings{offset,
        title={OFFSET: Segmentation-based Focus Shift Revision for Composed Image Retrieval},
        author={Chen, Zhiwei and Hu, Yupeng and Li, Zixu and Fu, Zhiheng and Song, Xuemeng and Nie, Liqiang},
        booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
        year={2025}
        }