We introduce Omni-ID, a novel facial representation designed specifically for generative tasks. Omni-ID encodes holistic information about an individual's appearance across diverse expressions and poses within a fixed-size representation. It consolidates information from a variable number of unstructured input images into a structured representation, in which each entry captures certain global or local identity features. Our approach leverages a few-to-many identity reconstruction training paradigm, in which a few images of an individual serve as input to reconstruct multiple target images of the same individual in varied poses and expressions. To train the Omni-ID encoder, we use a multi-decoder framework that leverages the complementary strengths of different decoders during representation learning. Unlike conventional representations, such as CLIP and ArcFace, which are typically learned through discriminative or contrastive objectives, Omni-ID is optimized with a generative objective, resulting in a more comprehensive and nuanced identity capture for generative tasks. Trained on our MFHQ dataset, a multi-view facial image collection, Omni-ID demonstrates substantial improvements over conventional representations across a range of generative tasks.
Generating images that faithfully represent an individual's identity requires a face encoding capable of depicting nuanced details across diverse poses and facial expressions. Existing facial representations fall short in generative tasks due to (1) their reliance on single-image encodings, which fundamentally lack comprehensive information about an individual's appearance, and (2) their optimization for discriminative tasks, which fails to preserve the subtle nuances that define a person's unique identity, particularly across varying poses and expressions.
We introduce a new face representation named Omni-ID, featuring an Omni-ID Encoder and a novel few-to-many identity reconstruction training scheme with a multi-decoder objective. Designed for generative tasks, this representation enables high-fidelity face generation in diverse poses and expressions, supporting a wide array of generative applications.
Omni-ID uses a few-to-many identity reconstruction training paradigm that reconstructs not only the input images but also a diverse range of other images of the same identity in various contexts, poses, and expressions. This strategy encourages the representation to capture essential identity features observed across different conditions while mitigating overfitting to specific attributes of any single input image.
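A minimal sketch of one few-to-many sampling step, assuming a dataset grouped by identity; the names and the input/target counts below are illustrative assumptions, not the paper's actual values:

import random

K_INPUT, K_TARGET = 3, 8  # a few encoder inputs, many reconstruction targets (illustrative)

def sample_few_to_many(identity_to_images):
    # Pick one identity; encode a few of its images, then ask the decoders
    # to reconstruct a larger set drawn from *all* images of that identity,
    # so the representation must explain poses and expressions it never saw.
    identity = random.choice(list(identity_to_images))
    images = identity_to_images[identity]
    inputs = random.sample(images, K_INPUT)
    targets = random.sample(images, min(K_TARGET, len(images)))
    return inputs, targets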
Omni-ID employs a multi-decoder training objective that combines the unique strengths of various decoders, such as improved fidelity or reduced identity leakage, while mitigating the limitations of any single decoder. This allows the encoder to exploit the detailed facial information in the input images to the greatest feasible extent, and results in a more robust encoding that generalizes effectively across generative applications.
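As a rough sketch of how such an objective can be combined, assuming each decoder exposes its own reconstruction loss (the decoder interface and weights here are hypothetical):

def multi_decoder_loss(z_id, targets, decoders, weights):
    # z_id: the fixed-size identity representation produced by the encoder.
    # Each decoder defines its own generative loss (e.g., a diffusion
    # denoising loss or a pixel reconstruction loss) conditioned on z_id;
    # summing the weighted losses lets the strengths of one decoder offset
    # the weaknesses of another.
    return sum(w * dec.loss(z_id, targets) for dec, w in zip(decoders, weights))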
The Omni-ID Encoder receives a set of images of an individual and projects them into keys and values for its cross-attention layers. In these layers, semantic-aware learnable queries attend to the image features, allowing the encoder to capture identity features shared across the images. Self-attention layers further refine these interactions, producing a holistic representation.
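A minimal PyTorch sketch of the attention pattern described above; the dimensions, query count, and single-block structure are illustrative assumptions rather than the actual architecture:

import torch
import torch.nn as nn

class OmniIDEncoderBlock(nn.Module):
    def __init__(self, dim=768, num_queries=64, num_heads=8):
        super().__init__()
        # Learnable queries: each slot can specialize to a global or local
        # identity feature shared across the input set.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, image_tokens):
        # image_tokens: (B, S, dim) features of all input images of one
        # identity, concatenated along the token axis; they serve as the
        # keys and values of the cross-attention.
        q = self.queries.unsqueeze(0).expand(image_tokens.shape[0], -1, -1)
        z, _ = self.cross_attn(q, image_tokens, image_tokens)  # gather shared identity cues
        z = self.norm1(z + q)
        r, _ = self.self_attn(z, z, z)                         # refine query interactions
        return self.norm2(z + r)                               # (B, num_queries, dim)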
Our Omni-ID achieves better identity preservation than CLIP, for both single and multiple input images.
Our Omni-ID achieves better identity preservation than other representations such as ArcFace and CLIP. An IP-Adapter using Omni-ID as its representation, without any additional regularization, outperforms the state-of-the-art PuLID.
Omni-ID can be used with any diffusion model. Here we show that Omni-ID outperforms other representations and state-of-the-art personalization techniques when using a UNet (Stable Diffusion) as the base model.
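To illustrate one common integration path, here is a simplified IP-Adapter-style decoupled cross-attention layer that injects the Omni-ID tokens alongside the text tokens. The actual IP-Adapter shares query projections and adds only new key/value projections; this standalone module is a simplifying assumption:

import torch
import torch.nn as nn

class DecoupledCrossAttention(nn.Module):
    def __init__(self, dim=768, num_heads=8):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.id_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, hidden_states, text_tokens, id_tokens, id_scale=1.0):
        # hidden_states: UNet spatial features; id_tokens: the fixed-size
        # Omni-ID representation. Identity conditioning is added as a
        # separate attention stream, scaled by id_scale.
        out_text, _ = self.text_attn(hidden_states, text_tokens, text_tokens)
        out_id, _ = self.id_attn(hidden_states, id_tokens, id_tokens)
        return out_text + id_scale * out_id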
Our Omni-ID achieves superior identity preservation, captures nuanced details more faithfully, and adapts better to diverse poses and expressions.
@article{qian2024omniid,
  title   = {Omni-ID: Holistic Identity Representation Designed for Generative Tasks},
  author  = {Guocheng Qian and Kuan-Chieh Wang and Or Patashnik and Negin Heravi and Daniil Ostashev and Sergey Tulyakov and Daniel Cohen-Or and Kfir Aberman},
  journal = {arXiv preprint},
  year    = {2024},
}