AttributeDiffusion: Diffusion Driven Diverse Attribute Editing

WACV 2025

Rishubh Parihar*, Prasanna Balaji*, Raghav Magazine, Sarthak Vora, Varun Jampani, R. Venkatesh Babu

Indian Institute of Science, Bangalore

TLDR: Our method learns a distribution over plausible variations of a given attribute and provides a principled way to explore them. We train a diffusion model over attribute edit directions, enabling the generation of diverse variations for a given attribute. Additionally, we propose a coarse-to-fine sampling strategy that allows interactive exploration of the plausible attribute variations.

Abstract

Image attribute editing is a widely researched area fueled by the recent advancements in deep generative models. Existing methods treat semantic attributes as binary and do not allow the user to generate multiple variations of the attribute edits. This limits the applications of editing methods in the real world, e.g., exploring multiple eyeglass variations on an e-commerce platform. In this paper, we present a technique to generate a collection of diverse attribute edits and a principled way to explore them. Generation and controlled exploration of attribute variations is challenging as it requires fine control over the attribute styles while preserving other attributes and the identity of the subject. Capitalizing on the attribute disentanglement property of the latent spaces of pretrained GANs, we represent the attribute edits in this space. Next, we train a diffusion model to model these latent directions of edits. To explore these variations in a controlled manner, we propose a coarse-to-fine sampling strategy. Extensive experiments on various datasets establish the effectiveness and generalization of the proposed approach for the generation and controlled exploration of diverse attribute edits.

Diverse eyeglass variations generated by AttributeDiffusion. The generated eyeglass variations capture diverse eyeglass frames, lens shade and structure.

Approach

The methodology comprises three major stages: 1) Dataset Generation - We create a dataset of edit directions by embedding negative and positive image pairs into the latent space and computing the difference between their latent codes. 2) Training - We train a DDPM over the dataset of edit directions for the given attribute using a denoising objective. 3) Inference - To edit a new image, we first encode it into the latent space and then add an edit direction sampled via iterative denoising in the reverse diffusion process.
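The three stages above can be sketched as follows. This is a minimal illustration, not the authors' released code: the helpers `encode_to_latent`, `decode_from_latent`, and `sample_direction` are hypothetical placeholders for a GAN inversion encoder, the GAN generator, and the reverse-diffusion sampler.

```python
# Sketch of the three-stage pipeline; all helper names are hypothetical.
import torch
import torch.nn.functional as F

def build_edit_direction_dataset(pairs, encode_to_latent):
    """Stage 1: edit direction = latent(positive) - latent(negative)."""
    directions = []
    for neg_img, pos_img in pairs:
        w_neg = encode_to_latent(neg_img)  # GAN inversion into latent space
        w_pos = encode_to_latent(pos_img)
        directions.append(w_pos - w_neg)   # disentangled edit direction
    return torch.stack(directions)

def ddpm_training_step(model, directions, num_timesteps, alphas_cumprod):
    """Stage 2: standard DDPM denoising objective on edit directions."""
    x0 = directions
    t = torch.randint(0, num_timesteps, (x0.shape[0],))
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise  # forward diffusion q(x_t | x_0)
    pred_noise = model(x_t, t)                    # model predicts the added noise
    return F.mse_loss(pred_noise, noise)

def edit_image(image, encode_to_latent, decode_from_latent, sample_direction):
    """Stage 3: encode, add a sampled edit direction, decode."""
    w = encode_to_latent(image)
    delta = sample_direction()  # obtained by iterative reverse diffusion
    return decode_from_latent(w + delta)
```

Because the DDPM operates on low-dimensional latent directions rather than pixels, training and sampling are comparatively lightweight, and each sampled direction yields a distinct attribute style when added to the source latent code.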

Diverse Attribute Editing on 3D Aware GAN

Coarse-to-fine sampling for diverse eyeglass styles. First, two coarse edits are generated: one with oval sunglasses and the other with rectangular-framed glasses.

Edited image variations and surface maps of edited geometry generated by editing on 3D aware GAN. Observe the variations in shape in the eyeglass and hair regions of the edited geometry.

Attribute variations with StyleGANs

Attribute edits by AttributeDiffusion on Cars and Churches. Diverse styles of churches are generated following the same layout. For Cars, we generate diverse styles of both sports-car and classic-car variations.

Coarse-to-fine exploration of attribute edits

Hierarchical attribute exploration - For a source image, two sets of coarse edits are generated first; next, fine variations of those coarse edits are sampled. Such a coarse-to-fine sampling strategy helps explore multiple attribute variations in a principled way.
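One plausible realization of this coarse-to-fine exploration is sketched below; this is an assumption about the mechanics, not the paper's exact procedure. Coarse edits are drawn with a full reverse-diffusion chain, while fine variations of a chosen coarse edit are obtained by re-noising it to an intermediate timestep `t_mid` and denoising again with fresh randomness, in the spirit of SDEdit-style partial diffusion.

```python
# Hypothetical coarse-to-fine sampling over edit directions.
import torch

@torch.no_grad()
def reverse_step(model, x_t, t, alphas, alphas_cumprod):
    """One DDPM ancestral step x_t -> x_{t-1}."""
    eps = model(x_t, torch.tensor([t]))
    a_t, ac_t = alphas[t], alphas_cumprod[t]
    mean = (x_t - (1 - a_t) / (1 - ac_t).sqrt() * eps) / a_t.sqrt()
    if t > 0:  # add noise on all but the final step
        mean = mean + (1 - a_t).sqrt() * torch.randn_like(x_t)
    return mean

@torch.no_grad()
def sample_coarse(model, shape, num_timesteps, alphas, alphas_cumprod):
    """Full reverse chain from pure noise -> one coarse edit direction."""
    x = torch.randn(shape)
    for t in reversed(range(num_timesteps)):
        x = reverse_step(model, x, t, alphas, alphas_cumprod)
    return x

@torch.no_grad()
def sample_fine(model, coarse, t_mid, alphas, alphas_cumprod):
    """Re-noise a coarse direction to t_mid, then denoise: a fine variant
    that preserves the coarse structure but varies the details."""
    ac = alphas_cumprod[t_mid]
    x = ac.sqrt() * coarse + (1 - ac).sqrt() * torch.randn_like(coarse)
    for t in reversed(range(t_mid)):
        x = reverse_step(model, x, t, alphas, alphas_cumprod)
    return x
```

Smaller `t_mid` values keep the fine variants closer to the chosen coarse edit (e.g. the same eyeglass frame with different lens shades), while larger values allow broader deviation.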