
Omni-ID:
Holistic Identity Representation Designed for Generative Tasks

Snap Inc., USA
CVPR 2025

Omni-ID is a novel facial representation tailored for generative tasks, encoding identity features from unstructured images into a fixed-size representation that captures diverse expressions and poses.

Abstract

We introduce Omni-ID, a novel facial representation designed specifically for generative tasks. Omni-ID encodes holistic information about an individual's appearance across diverse expressions and poses within a fixed-size representation. It consolidates information from a variable number of unstructured input images into a structured representation, where each entry represents certain global or local identity features. Our approach leverages a few-to-many identity reconstruction training paradigm, in which a few images of an individual serve as input to reconstruct multiple target images of the same individual in varied poses and expressions. To train the Omni-ID encoder, we use a multi-decoder framework that leverages the complementary strengths of different decoders during the representation learning phase. Unlike conventional representations, such as CLIP and ArcFace, which are typically learned through discriminative or contrastive objectives, Omni-ID is optimized with a generative objective, resulting in a more comprehensive and nuanced identity capture for generative tasks. Trained on our MFHQ dataset, a multi-view facial image collection, Omni-ID demonstrates substantial improvements over conventional representations across various generative tasks.

Contributions:

1. Multi-Face Generative Encoder

We propose a multi-face generative encoder that consolidates information from multiple unstructured input images into a structured representation, capturing both global and local identity features.

2. Few-to-Many Multi-Decoder Training

We introduce a novel few-to-many identity reconstruction training paradigm with a multi-decoder framework that leverages complementary strengths of different decoders during representation learning.

3. State-of-the-Art Identity Preservation

Through comprehensive evaluations, we demonstrate that Omni-ID achieves state-of-the-art identity preservation and substantial improvements over conventional representations across various generative tasks.

Motivation

Generating images that faithfully represent an individual's identity requires a face encoding capable of depicting nuanced details across diverse poses and facial expressions. Existing facial representations fall short in generative tasks due to (1) their reliance on single-image encodings, which fundamentally lack comprehensive information about an individual's appearance, and (2) their optimization for discriminative tasks, which fails to preserve the subtle nuances that define a person's unique identity, particularly across varying poses and expressions.

Face generation comparison of different facial representations with a single input (top row) and two inputs (bottom row).

Method

We introduce a new face representation named Omni-ID, built around an Omni-ID Encoder and a novel few-to-many identity reconstruction training scheme with a multi-decoder objective. Designed for generative tasks, this representation aims to enable high-fidelity face generation in diverse poses and expressions, supporting a wide array of generative applications.

Omni-ID Training Strategy

Omni-ID uses a few-to-many identity reconstruction training paradigm that reconstructs not only the input images but also a diverse range of other images of the same identity in various contexts, poses, and expressions. This strategy encourages the representation to capture essential identity features observed across different conditions while mitigating overfitting to specific attributes of any single input image.
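As a concrete illustration, below is a minimal Python sketch of how a few-to-many training batch could be assembled. It assumes a dataset grouped by identity; the names identity_to_paths and load_image are hypothetical placeholders, not part of any released code.

import random

def sample_few_to_many_batch(identity_to_paths, load_image, k_inputs=3, m_targets=8):
    """Sample one identity, use a few of its images as encoder inputs and
    several different images of the same person as reconstruction targets."""
    identity = random.choice(list(identity_to_paths.keys()))
    paths = identity_to_paths[identity]
    chosen = random.sample(paths, k=min(len(paths), k_inputs + m_targets))
    input_imgs = [load_image(p) for p in chosen[:k_inputs]]    # few encoder inputs
    target_imgs = [load_image(p) for p in chosen[k_inputs:]]   # many targets in other poses/expressions
    return input_imgs, target_imgs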

Omni-ID also employs a multi-decoder training objective that combines the unique strengths of various decoders, such as improved fidelity or reduced identity leakage, while mitigating the limitations of any single decoder. This allows the encoder to exploit the detailed facial information present in the input images as fully as possible, yielding a more robust encoding that generalizes effectively across various generative applications.
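The combination of decoders can be sketched as a weighted sum of per-decoder reconstruction losses, as in the following PyTorch pseudocode. The specific decoders, the conditioning signal target_cond (e.g., target pose and expression cues), and the plain L2 loss are simplifying assumptions for illustration, not the paper's exact objective.

import torch
import torch.nn as nn

def multi_decoder_loss(encoder: nn.Module,
                       decoders: list,
                       weights: list,
                       input_imgs: torch.Tensor,    # (B, K, C, H, W): few input images
                       target_imgs: torch.Tensor,   # (B, M, C, H, W): many target images
                       target_cond: torch.Tensor):  # per-target conditioning (e.g., pose/expression)
    z = encoder(input_imgs)                          # fixed-size Omni-ID representation
    total = torch.zeros((), device=target_imgs.device)
    for decoder, w in zip(decoders, weights):
        pred = decoder(z, target_cond)               # each decoder reconstructs the targets from z
        total = total + w * torch.mean((pred - target_imgs) ** 2)
    return total                                     # single objective backpropagated into the encoder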

Omni-ID employs a multi-decoder, few-to-many identity reconstruction training strategy.

Omni-ID Encoder

The Omni-ID Encoder receives a set of images of an individual and projects them into keys and values for its cross-attention layers. In these layers, learnable, semantics-aware queries attend over the image features, allowing the encoder to capture identity features shared across the inputs. Self-attention layers further refine these interactions, producing a holistic representation.
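A minimal Perceiver-style sketch of this design in PyTorch is given below. The token dimension, number of queries, depth, and the assumption that input images have already been tokenized into per-image features are illustrative choices; the actual Omni-ID encoder may differ in detail.

import torch
import torch.nn as nn

class OmniIDEncoderSketch(nn.Module):
    def __init__(self, num_queries=64, dim=512, num_heads=8, depth=4):
        super().__init__()
        # Learnable, semantics-aware queries; their outputs form the fixed-size representation.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_layers = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True) for _ in range(depth))
        self.self_layers = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True) for _ in range(depth))
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(2 * depth))

    def forward(self, image_tokens):                 # (B, N, dim): tokens from all input images
        b = image_tokens.shape[0]
        x = self.queries.unsqueeze(0).expand(b, -1, -1)
        for i in range(len(self.cross_layers)):
            # Cross-attention: queries gather identity evidence from the input image tokens.
            out, _ = self.cross_layers[i](x, image_tokens, image_tokens)
            x = self.norms[2 * i](x + out)
            # Self-attention: refine interactions among the query entries.
            out, _ = self.self_layers[i](x, x, x)
            x = self.norms[2 * i + 1](x + out)
        return x                                      # (B, num_queries, dim): holistic identity representation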

Omni-ID encoder architecture

Experiments


Citation

@inproceedings{qian2025omni,
  title={Omni-id: Holistic identity representation designed for generative tasks},
  author={Qian, Guocheng and Wang, Kuan-Chieh and Patashnik, Or and Heravi, Negin and Ostashev, Daniil and Tulyakov, Sergey and Cohen-Or, Daniel and Aberman, Kfir},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={8786--8795},
  year={2025}
}