Omni-ID is a novel facial representation tailored for generative tasks: it encodes identity features from a set of unstructured images into a fixed-size representation that captures an individual's appearance across diverse expressions and poses.
We introduce Omni-ID, a novel facial representation designed specifically for generative tasks. Omni-ID encodes holistic information about an individual's appearance across diverse expressions and poses within a fixed-size representation. It consolidates information from a variable number of unstructured input images into a structured representation, where each entry represents certain global or local identity features. Our approach leverages a few-to-many identity reconstruction training paradigm, in which a few images of an individual serve as input to reconstruct multiple target images of the same individual in varied poses and expressions. To train the Omni-ID encoder, we use a multi-decoder framework that leverages the complementary strengths of different decoders during the representation learning phase. Unlike conventional representations, such as CLIP and ArcFace, which are typically learned through discriminative or contrastive objectives, Omni-ID is optimized with a generative objective, resulting in a more comprehensive and nuanced identity capture for generative tasks. Trained on our MFHQ dataset, a multi-view facial image collection, Omni-ID demonstrates substantial improvements over conventional representations across various generative tasks.
We propose a multi-face generative encoder that consolidates information from multiple unstructured input images into a structured representation, capturing both global and local identity features.
We introduce a novel few-to-many identity reconstruction training paradigm with a multi-decoder framework that leverages complementary strengths of different decoders during representation learning.
Through comprehensive evaluations, we demonstrate that Omni-ID achieves state-of-the-art identity preservation and substantial improvements over conventional representations across various generative tasks.
Generating images that faithfully represent an individual's identity requires a face encoding capable of depicting nuanced details across diverse poses and facial expressions. Existing facial representations fall short in generative tasks due to (1) their reliance on single-image encodings, which fundamentally lack comprehensive information about an individual's appearance, and (2) their optimization for discriminative tasks, which fails to preserve the subtle nuances that define a person's unique identity, particularly across varying poses and expressions.
We introduce a new face representation named Omni-ID, featuring an Omni-ID Encoder and a novel few-to-many identity reconstruction training scheme with a multi-decoder objective. Designed for generative tasks, this representation aims to enable high-fidelity face generation in diverse poses and expressions, supporting a wide array of generative applications.
Omni-ID uses a few-to-many identity reconstruction training paradigm that reconstructs not only the input images but also a diverse range of other images of the same identity in various contexts, poses, and expressions. This strategy encourages the representation to capture essential identity features observed across different conditions while mitigating overfitting to specific attributes of any single input image.
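To make the objective concrete, below is a minimal PyTorch-style sketch of one few-to-many reconstruction step. The encoder/decoder call signatures, the per-target conditioning, and the plain MSE reconstruction loss are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def few_to_many_step(encoder, decoder, input_imgs, target_imgs, target_cond):
    """One few-to-many reconstruction step (illustrative signatures).

    input_imgs:  (B, K, C, H, W)  a few unstructured images per identity
    target_imgs: (B, M, C, H, W)  many other images of the same identity
    target_cond: (B, M, D)        per-target conditioning (e.g. pose/expression)
    """
    z_id = encoder(input_imgs)              # (B, N_tokens, D_id) fixed-size identity code
    B, M = target_imgs.shape[:2]

    # The same identity code must explain every target view, so it cannot
    # overfit to the pose or expression of any single input image.
    z_rep = z_id.unsqueeze(1).expand(B, M, *z_id.shape[1:]).flatten(0, 1)
    recon = decoder(z_rep, target_cond.flatten(0, 1))   # (B*M, C, H, W)

    return F.mse_loss(recon, target_imgs.flatten(0, 1))
```

The key point is that a single fixed-size code is reused to reconstruct many targets; in practice the decoder would also receive context such as background and lighting for each target.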
Omni-ID employs a multi-decoder training objective that combines the unique strengths of various decoders, such as improved fidelity or reduced identity leakage, while mitigating the limitations of any single decoder. This allows the encoder to exploit the detailed facial information in the input images as fully as possible and yields a more robust encoding that generalizes effectively across various generative applications.
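A minimal sketch of how such a combined objective could be formed is given below; the decoder interfaces, the fixed loss weights, and the use of a simple reconstruction loss per decoder are assumptions made for illustration.

```python
import torch.nn.functional as F

def multi_decoder_loss(z_id, decoders, weights, target_imgs, target_cond):
    """Combine reconstruction losses from several decoders (illustrative).

    Each decoder contributes its own strengths (e.g. higher fidelity or less
    identity leakage), so the encoder is not shaped by any single decoder's
    failure modes.
    """
    total = 0.0
    for dec, w in zip(decoders, weights):
        recon = dec(z_id, target_cond)                    # each decoder reconstructs the targets
        total = total + w * F.mse_loss(recon, target_imgs)
    return total
```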
The Omni-ID Encoder receives a set of images of an individual and projects them into keys and values for cross-attention layers. Learnable, semantics-aware queries attend to these keys and values, allowing the encoder to capture shared identity features across the images. Self-attention layers further refine these interactions, producing a holistic representation.
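This description maps naturally onto a query-based transformer encoder. The sketch below is an illustrative PyTorch module, not the released architecture: the dimensions, the number of queries and layers, and the use of plain multi-head attention without feed-forward blocks or normalization are all assumptions.

```python
import torch
import torch.nn as nn

class OmniIDEncoderSketch(nn.Module):
    """Illustrative encoder: learnable queries gather identity information
    from all input images via cross-attention, then self-attention refines
    the queries into a fixed-size identity representation."""

    def __init__(self, dim=768, num_queries=64, num_layers=4, num_heads=8):
        super().__init__()
        # One learnable query per entry of the structured identity representation.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, num_heads, batch_first=True) for _ in range(num_layers)])
        self.self_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, num_heads, batch_first=True) for _ in range(num_layers)])

    def forward(self, img_feats):
        # img_feats: (B, K*T, dim) -- patch features from K input images,
        # concatenated so the queries can attend to evidence from every image at once.
        q = self.queries.unsqueeze(0).expand(img_feats.shape[0], -1, -1)
        for ca, sa in zip(self.cross_attn, self.self_attn):
            q = q + ca(q, img_feats, img_feats)[0]   # cross-attend to image features
            q = q + sa(q, q, q)[0]                   # refine interactions among queries
        return q                                     # (B, num_queries, dim) identity code
```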
SIGGRAPH Asia 2025
A human-centric generative model that enables disentangled control over multiple visual attributes — such as identity, hair, and garment — across multiple subjects, while also supporting text-based control.
arXiv 2025
LayerComposer enables Photoshop-like control for multi-subject text-to-image generation, allowing users to compose scenes by placing, resizing, and locking elements in a layered canvas with high fidelity.
@inproceedings{qian2025omni,
title={Omni-{ID}: Holistic identity representation designed for generative tasks},
author={Qian, Guocheng and Wang, Kuan-Chieh and Patashnik, Or and Heravi, Negin and Ostashev, Daniil and Tulyakov, Sergey and Cohen-Or, Daniel and Aberman, Kfir},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={8786--8795},
year={2025}
}