Car Motion Control
Motion decomposition can enable multiple downstream tasks, such as manipulating the strength of a single motion component. Here we modulate the strength of the car motion by scaling the transition latent trajectory.
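To make this concrete, the sketch below shows one way such strength control could be implemented, assuming the video has already been embedded as a sequence of W+ latents. The names `latents`, `generator`, and `scale_motion` are illustrative placeholders, not this project's actual API; the same idea applies to any single motion component (car motion here, facial expressions below).

```python
import torch

def scale_motion(latents: torch.Tensor, alpha: float) -> torch.Tensor:
    """Scale the transition trajectory relative to the first frame.

    latents: per-frame W+ codes of shape (T, num_layers, 512).
    alpha:   motion strength; alpha < 1 dampens, alpha > 1 amplifies.
    """
    anchor = latents[0:1]            # reference-frame latent
    deltas = latents - anchor        # per-frame transition trajectory
    return anchor + alpha * deltas   # rescaled trajectory, same shape

# Example: amplify the motion component 1.5x and re-synthesize each frame.
# scaled = scale_motion(latents, alpha=1.5)
# frames = [generator(w.unsqueeze(0)) for w in scaled]
```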
Real-world objects perform complex motions that involve multiple decomposable motion components. For example, while talking, a person continuously changes their expressions, head pose, and body pose.
Recently, a range of methods have been proposed for unconditional video generation. However, these approaches provide limited control for manipulating an object's motion in the generated video. In this work, we propose object motion decomposition in videos using a pretrained image GAN model. We generate disentangled linear motion subspaces in the latent space of widely used StyleGAN2 models, each of which is semantically meaningful and controls a single explainable motion component. We evaluate the disentanglement properties of the motion subspaces on face and car datasets with extensive quantitative and qualitative experiments.
Furthermore, we apply the proposed motion decomposition to downstream video editing tasks such as selective motion transfer (e.g., transferring only facial expressions) and video stabilization, without explicitly training for these tasks. As we utilize pretrained StyleGAN2 models, our method integrates seamlessly with latent-based editing and stylization methods.
Here we modulate the strength of the subject's facial expressions. Observe the larger facial motions at larger strength values.
As we use StyleGAN2 as the backbone, our method seamlessly integrates with attribute editing based on latent space manipulation. We first embed the source image and the driving video into the W+ latent space and then transfer the motion trajectory to the source latent. Finally, we perform attribute editing on the transferred latent trajectory to obtain reenactment results with editing.
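A minimal sketch of this pipeline is given below. Here `encode` stands in for a GAN-inversion encoder (e.g., e4e), `edit_direction` for any latent attribute-editing direction (e.g., from InterFaceGAN), and `generator` for the pretrained StyleGAN2 synthesis network; these names are assumptions for illustration, not the project's exact interfaces.

```python
import torch

def reenact_and_edit(source_img, driving_latents, encode, generator,
                     edit_direction, edit_strength=2.0):
    # 1. Embed the source image into the W+ latent space.
    w_source = encode(source_img)                  # (num_layers, 512)

    # 2. Transfer the driving trajectory onto the source latent by adding
    #    per-frame offsets relative to the first driving frame.
    deltas = driving_latents - driving_latents[0:1]
    w_traj = w_source.unsqueeze(0) + deltas        # (T, num_layers, 512)

    # 3. Apply the attribute edit to every frame of the trajectory.
    w_edited = w_traj + edit_strength * edit_direction

    # 4. Synthesize the edited reenactment video frame by frame.
    return [generator(w.unsqueeze(0)) for w in w_edited]
```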
Using StyleGAN2 also enables video stylization by adapting the generator to a new domain with StyleGAN-NADA.
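Since the motion is carried entirely by the latent trajectory, stylization amounts to rendering the same trajectory with a domain-adapted generator. Below is a hedged sketch; `adapted_generator` is assumed to be the StyleGAN2 generator after StyleGAN-NADA text-guided fine-tuning, obtained outside this snippet.

```python
import torch

def stylize_video(latent_trajectory: torch.Tensor, adapted_generator):
    """Render an existing W+ trajectory with a domain-adapted generator.

    The trajectory (and hence the motion) is left untouched; only the
    synthesis network changes, so the video is restyled frame by frame.
    """
    with torch.no_grad():
        return [adapted_generator(w.unsqueeze(0)) for w in latent_trajectory]
```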
There is a lot of excellent work on which our contribution builds.
Stitch it in Time: GAN-Based Facial Editing of Real Videos
StyleGAN2: The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling.
StyleGAN-NADA converts a pre-trained generator to new domains using only a textual prompt and no training data.
@inproceedings{parihar2023style,
  author    = {Rishubh Parihar and Raghav Magazine and Piyush Tiwari and R. Venkatesh Babu},
  title     = {We never go out of Style: Motion Disentanglement by Subspace Decomposition of Latent Space},
  booktitle = {AI4CC-CVPRW},
  year      = {2023},
}