SD4Match: Learning to Prompt Stable Diffusion Model for Semantic Matching

CVPR 2024

Xinghui Li, Jingyi Lu, Kai Han, Victor Adrian Prisacariu
Active Vision Lab, University of Oxford; Visual AI Lab, University of Hong Kong



In this work, we address the challenge of matching semantically similar keypoints across image pairs. Existing research indicates that the intermediate output of the UNet within the Stable Diffusion (SD) framework can serve as robust image feature maps for such a matching task. We demonstrate that by employing a basic prompt tuning technique, the inherent potential of Stable Diffusion can be harnessed, resulting in a significant enhancement in accuracy over previous approaches. We further introduce a novel conditional prompting module that conditions the prompt on the local details of the input image pairs, leading to a further improvement in performance. We designate our approach as SD4Match, short for Stable Diffusion for Semantic Matching. Comprehensive evaluations of SD4Match on the PF-Pascal, PF-Willow, and SPair-71k datasets show that it sets new benchmarks in accuracy across all these datasets. Particularly, SD4Match outperforms the previous state-of-the-art by a margin of 12 percentage points on the challenging SPair-71k dataset.
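To make the matching task concrete, the sketch below matches one keypoint by nearest-neighbour search over two feature maps. It is an illustration only: the toy feature maps stand in for UNet features, and the use of cosine similarity with a hard argmax is an assumption of this sketch, not something stated on this page. The function names `cosine` and `match_keypoint` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def match_keypoint(src_feats, tgt_feats, src_xy):
    """Find the target location whose feature best matches the source keypoint.

    src_feats and tgt_feats are nested lists indexed as feats[y][x] -> vector.
    Returns ((x, y), similarity) for the best target location.
    """
    x, y = src_xy
    query = src_feats[y][x]
    best_xy, best_sim = None, -2.0  # cosine similarity is always >= -1
    for ty, row in enumerate(tgt_feats):
        for tx, feat in enumerate(row):
            sim = cosine(query, feat)
            if sim > best_sim:
                best_xy, best_sim = (tx, ty), sim
    return best_xy, best_sim


# Toy 2x2 feature maps with 3-dimensional features (hypothetical values).
src = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
       [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]]
tgt = [[[0.0, 0.9, 0.1], [0.0, 0.0, 2.0]],
       [[0.7, 0.7, 0.0], [3.0, 0.1, 0.0]]]

matches = {(x, y): match_keypoint(src, tgt, (x, y))[0]
           for y in range(2) for x in range(2)}
```

Because cosine similarity ignores vector magnitude, the source feature `[1, 0, 0]` matches the target feature `[3.0, 0.1, 0.0]` despite the difference in scale.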

General Pipeline

We directly optimize the prompt embedding of the Stable Diffusion model with keypoint supervision. We offer three prompt learning options: a single universal prompt (Single), one prompt per object category (Class), and a conditional prompting module (CPM).
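The three options differ mainly in which prompt embedding is passed to the model for a given image pair. The sketch below shows only that selection logic. Everything in it is hypothetical: the `PromptBank` class, the toy sizes `SEQ_LEN` and `DIM`, and especially the `cpm` branch, which simply adds a pair-dependent offset to a shared prompt as a stand-in and does not reproduce the actual CPM architecture. Training the embeddings with keypoint supervision is omitted.

```python
import random

# Hypothetical toy sizes; a real text-encoder embedding is much larger.
SEQ_LEN, DIM = 4, 8


def new_prompt(rng):
    """Create a small randomly initialised prompt embedding (SEQ_LEN x DIM)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(SEQ_LEN)]


class PromptBank:
    """Holds learnable prompt embeddings and picks one per image pair."""

    def __init__(self, mode, categories=(), seed=0):
        if mode not in ("single", "class", "cpm"):
            raise ValueError(f"unknown mode: {mode}")
        rng = random.Random(seed)
        self.mode = mode
        self.single = new_prompt(rng)                       # shared by all pairs
        self.per_class = {c: new_prompt(rng) for c in categories}

    def select(self, category=None, pair_feature=None):
        if self.mode == "single":
            return self.single                              # same prompt for every pair
        if self.mode == "class":
            return self.per_class[category]                 # one prompt per object category
        # "cpm" stand-in (assumption): shift the shared prompt by a
        # DIM-length vector computed from the image pair.
        return [[w + f for w, f in zip(token, pair_feature)]
                for token in self.single]


single_bank = PromptBank("single")
class_bank = PromptBank("class", categories=("boat", "chair"))
cpm_bank = PromptBank("cpm")

p_single = single_bank.select()
p_boat = class_bank.select(category="boat")
p_pair = cpm_bank.select(pair_feature=[1.0] * DIM)
```

In the Single and Class modes the selected embedding is a fixed learned parameter, whereas in the conditional mode it changes with every image pair.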

[Figure: general pipeline]

(Scroll down for CPM architecture)

Conditional Prompting Module (CPM)

[Figure: architecture of the Conditional Prompting Module (CPM)]

SPair-71k Benchmark Results

[Figure: quantitative results on SPair-71k]

Sparse Matching

[Figure: sparse matching comparisons between ours, SD+DINO, and DIFT on boat, train, potted plant, and chair examples]

Dense Matching

[Figure: dense matching examples for aeroplane, bottle, cow, chair, bike, boat, dog, motorbike, bird, car, horse, and person]


	title={SD4Match: Learning to Prompt Stable Diffusion Model for Semantic Matching}, 
	author={Xinghui Li and Jingyi Lu and Kai Han and Victor Prisacariu},