Painterly Image Harmonization using Diffusion Model


Painterly image harmonization aims to insert photographic objects into paintings and obtain artistically coherent composite images. Previous methods for this task mainly rely on inference-time optimization or generative adversarial networks, but they are either very time-consuming or struggle with fine control of the foreground objects (e.g., texture and content details). To address these issues, we focus on feed-forward diffusion models and propose a novel Stable diffusion with Dual Encoder fusion network (SDENet), which includes a lightweight adaptive encoder and a Dual Encoder Fusion (DEF) module. Specifically, the adaptive encoder and the DEF module first stylize foreground features by aggregating relevant style information from the background within each encoder. Then, the stylized foreground features from both encoders are combined to provide sufficient guidance for the harmonization process. During training, in addition to the noise loss of the diffusion model, we employ two style losses, i.e., an AdaIN style loss and a contrastive style loss, to balance the trade-off between style migration and content preservation. Compared with state-of-the-art models from related fields, our SDENet stylizes the foreground more thoroughly while retaining finer content details on the benchmark datasets.
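
To make the AdaIN style loss mentioned above more concrete, the following is a minimal NumPy sketch of the commonly used AdaIN-style statistic matching: it compares channel-wise means and standard deviations of feature maps. It is an illustration under stated assumptions, not the SDENet implementation: the function names `masked_mean_std` and `adain_style_loss`, the use of a binary foreground mask, and the choice to match foreground statistics of the output against background statistics of the painting are all hypothetical, and in practice the features would typically come from a pretrained network.

```python
import numpy as np

def masked_mean_std(feat, mask, eps=1e-5):
    """Channel-wise mean and std of feat (C, H, W) over pixels where mask (H, W) is 1."""
    w = mask.astype(feat.dtype)
    n = max(w.sum(), 1.0)  # avoid division by zero for an empty mask
    mean = (feat * w).sum(axis=(1, 2)) / n
    var = (((feat - mean[:, None, None]) ** 2) * w).sum(axis=(1, 2)) / n
    return mean, np.sqrt(var + eps)

def adain_style_loss(out_feat, style_feat, fg_mask):
    """Assumed form: squared error between foreground statistics of the output
    features and background statistics of the painting features."""
    fg_mean, fg_std = masked_mean_std(out_feat, fg_mask)
    bg_mean, bg_std = masked_mean_std(style_feat, 1 - fg_mask)
    return float(np.mean((fg_mean - bg_mean) ** 2) + np.mean((fg_std - bg_std) ** 2))

# Toy usage: a 3-channel 4x4 feature map with a 2x2 foreground region.
painting = np.full((3, 4, 4), 2.0)
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1
composite = painting.copy()
composite[:, 1:3, 1:3] = 5.0  # foreground whose statistics differ from the background
loss = adain_style_loss(composite, painting, mask)
```

In this toy case the loss is driven entirely by the gap between the foreground and background channel means; it falls to zero once the foreground features share the background's statistics, which is the sense in which such a loss encourages style migration.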

Proceedings of the 31st ACM International Conference on Multimedia (ACM MM 2023)