InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention

1Snap Inc., 2University of California, Los Angeles, 3Tel Aviv University
*Denotes equal contribution.
Denotes equal advising.

Given severely degraded face images, InstantRestore efficiently and effectively restores the original subject, achieving superior identity preservation compared to previous approaches, while delivering near-real-time performance.

TL;DR: Leveraging implicit correspondences from a pretrained diffusion model, we align degraded input patches with high-quality patches from ~4 reference images to restore identity-specific details in a single forward pass.

Abstract

Face image restoration aims to enhance degraded facial images while addressing challenges such as diverse degradation types, real-time processing demands, and, most crucially, the preservation of identity-specific features. Existing methods often struggle with slow processing times and suboptimal restoration, especially under severe degradation, failing to accurately reconstruct finer-level identity details. To address these issues, we introduce InstantRestore, a novel framework that leverages a single-step image diffusion model and an attention-sharing mechanism for fast and personalized face restoration. Additionally, InstantRestore incorporates a novel landmark attention loss, aligning key facial landmarks to refine the attention maps, enhancing identity preservation. At inference time, given a degraded input and a small (~4) set of reference images, InstantRestore performs a single forward pass through the network to achieve near real-time performance. Unlike prior approaches that rely on full diffusion processes or per-identity model tuning, InstantRestore offers a scalable solution suitable for large-scale applications. Extensive experiments demonstrate that InstantRestore outperforms existing methods in quality and speed, making it an appealing choice for identity-preserving face restoration.

How Does it Work?

Method
  • We fine-tune a pretrained single-step diffusion model to directly map a degraded input image to a high-quality restored output in a single forward pass.

  • Operating in a single step allows us to apply image-based losses such as LPIPS, MS-SSIM, identity, and adversarial losses directly on the output, providing more explicit and effective supervision for training compared to multi-step diffusion approaches.

  • To inject identity-specific information, we use a frozen diffusion model to extract the keys and values from a set of reference images and inject those into the restoration process.
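The single-step formulation means the training objective can be a plain weighted sum of image-space losses on the restored output. A minimal sketch of that aggregation is below; the loss functions are passed in as callables, and the weights shown are illustrative placeholders, not the paper's actual values:

```python
def total_loss(restored, target, lpips_fn, msssim_fn, id_fn, adv_fn,
               w_lpips=1.0, w_msssim=0.5, w_id=1.0, w_adv=0.1):
    """Weighted sum of image-space losses applied directly to the
    single-step restored output. Each *_fn is a callable returning a
    scalar loss; the weights here are hypothetical, for illustration.
    """
    return (w_lpips * lpips_fn(restored, target)          # perceptual loss
            + w_msssim * (1.0 - msssim_fn(restored, target))  # structural similarity
            + w_id * id_fn(restored, target)              # identity embedding distance
            + w_adv * adv_fn(restored))                   # adversarial (realism) loss
```

Because the model produces a final image in one pass, these losses supervise the actual output directly, which is not possible at intermediate noisy steps of a multi-step diffusion process.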


    Injecting Identity Information


    Method
  • Previous works have shown that diffusion models form implicit correspondences between images using the queries, keys, and values of the denoising network.

  • Using these correspondences, we find the most relevant high-quality reference patches for each low-quality input patch, giving each patch an attention weight via an extended self-attention mechanism.

  • After finding the most relevant reference patches, our task simplifies to "filling in" identity-related details using the corresponding values from the selected reference regions.

  • We find that this transfer can be done with a single pass through the denoising network, as we only need to match relevant patches rather than generate a new image entirely, resulting in an efficient approach.
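The extended self-attention described above can be sketched as follows: the degraded image's queries attend over its own keys/values concatenated with those extracted from the reference images, so each degraded patch can pull details from its best-matching high-quality reference patch. Names and shapes here are illustrative, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def extended_self_attention(q, k_self, v_self, k_refs, v_refs):
    """Shared-image attention sketch.

    q:       (n_q, d)    queries from the degraded image's patches
    k_self:  (n_s, d)    keys from the degraded image itself
    v_self:  (n_s, d)    values from the degraded image itself
    k_refs:  list of (n_r, d) keys from each reference image
    v_refs:  list of (n_r, d) values from each reference image
    """
    # Concatenate self and reference keys/values along the patch axis,
    # so attention weights are computed jointly over all sources.
    k = np.concatenate([k_self] + list(k_refs), axis=0)
    v = np.concatenate([v_self] + list(v_refs), axis=0)
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (n_q, n_s + sum n_r)
    return attn @ v                                # (n_q, d)
```

Since the reference keys and values come from a frozen network, they can be precomputed once per identity and reused across restorations, which is part of what makes the single-pass approach efficient.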

Results

Our Results

Qualitative Comparisons

We compare InstantRestore with existing state-of-the-art blind face restoration methods, including GFPGAN, CodeFormer, DiffBIR, and Dual-Pivot Tuning, and against reference-based methods that leverage multiple reference images to guide restoration, including ASFFNet and DMDNet.

Quantitative Comparisons

We compare our approach using both standard image-based metrics (LPIPS, SSIM, and PSNR) as well as identity similarity. We also evaluate the effect of adding additional reference images, as well as our performance on the x4, x8, and x16 super-resolution tasks.
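For reference, PSNR (one of the standard metrics listed above) is a simple function of the mean squared error between the restored and ground-truth images. A minimal sketch, assuming images normalized to [0, 1]:

```python
import numpy as np

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal
    shape, assumed to lie in [0, max_val]."""
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher PSNR means lower pixel-wise error; LPIPS and identity similarity complement it by measuring perceptual and identity-level fidelity that pixel metrics miss.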

BibTeX

If you find our work useful, please cite our paper:

@misc{zhang2024instantrestore,
  title={InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention}, 
  author={Howard Zhang and Yuval Alaluf and Sizhuo Ma and Achuta Kadambi and Jian Wang and Kfir Aberman},
  year={2024},
  eprint={2412.06753},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.06753}, 
}