BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
Yang Sui1,2    Yanyu Li1    Anil Kag1    Yerlan Idelbayev1    Junli Cao1    Ju Hu1   
Dhritiman Sagar1    Bo Yuan2    Sergey Tulyakov1    Jian Ren1
1Snap Inc.    2Rutgers University   

NeurIPS 2024


 
BitsFusion quantizes the weights of the Stable Diffusion v1.5 UNet (1.72 GB, FP16) to 1.99 bits (219 MB), achieving a 7.9X compression ratio and even better generation quality.
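The compression ratio follows directly from the two reported sizes. The short check below is only an arithmetic sketch; the variable names are illustrative, and treating 1 GB as 1000 MB is an assumption.

```python
# Sizes as reported on this page; 1 GB = 1000 MB is an assumption.
fp16_unet_mb = 1.72 * 1000   # FP16 UNet of Stable Diffusion v1.5
bitsfusion_unet_mb = 219     # 1.99-bit BitsFusion UNet

compression_ratio = fp16_unet_mb / bitsfusion_unet_mb
print(f"{compression_ratio:.1f}x")
```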

 

Left 1: a portrait of an anthropomorphic cyberpunk raccoon smoking a cigar, cyberpunk!, fantasy, elegant, digital painting, artstation, concept art, matte, sharp focus, illustration, art by josan Gonzalez

Left 2: Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, intricate detail.

Left 3: tropical island, 8 k, high resolution, detailed charcoal drawing, beautiful hd, art nouveau, concept art, colourful, in the style of vadym meller

Left 4: anthropomorphic art of a fox wearing a white suit, white cowboy hat, and sunglasses, smoking a cigar, texas inspired clothing by artgerm, victo ngai, ryohei hase, artstation. highly detailed digital painting, smooth, global illumination, fantasy art by greg rutkowsky, karl spitzweg

Left 5: a painting of a lantina elder woman by Leonardo da Vinci . details, smooth, sharp focus, illustration, realistic, cinematic, artstation, award winning, rgb , unreal engine, octane render, cinematic light, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art CG render made in Maya, Blender and Photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, arthouse.

Left 6: panda mad scientist mixing sparkling chemicals, high-contrast painting

Left 7: An astronaut riding a horse on the moon, oil painting by Van Gogh.

Left 8: A red dragon dressed in a tuxedo and playing chess. The chess pieces are fashioned after robots.
Top: Images generated by full-precision Stable Diffusion v1.5. Bottom: Images generated by BitsFusion, where the UNet weights are quantized to 1.99 bits, making the UNet 7.9X smaller than that of Stable Diffusion v1.5. All images are synthesized with the PNDM sampler, 50 sampling steps, and random seed 1024.
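As a rough sketch, the sampling settings from this caption could be applied to the full-precision baseline with the Hugging Face diffusers library. The model ID, FP16 dtype, device, and the helper name `generate_baseline` are assumptions, the guidance scale is left at the library default because the caption does not state it, and loading the quantized BitsFusion UNet is not shown.

```python
# Sketch: apply the caption's sampling settings to the full-precision baseline.
# MODEL_ID, dtype, and device are assumptions, not taken from this page.
SETTINGS = {"num_inference_steps": 50, "seed": 1024}
MODEL_ID = "stable-diffusion-v1-5/stable-diffusion-v1-5"  # assumed Hub ID


def generate_baseline(prompt, device="cuda"):
    """Generate one image with SD-v1.5 using PNDM, 50 steps, and seed 1024."""
    # Imported lazily so the settings above can be inspected without a GPU.
    import torch
    from diffusers import PNDMScheduler, StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    # Swap in the PNDM scheduler while keeping the checkpoint's noise schedule.
    pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)
    pipe = pipe.to(device)

    # A seeded generator makes the initial latent noise reproducible.
    generator = torch.Generator(device=device).manual_seed(SETTINGS["seed"])
    result = pipe(
        prompt,
        num_inference_steps=SETTINGS["num_inference_steps"],
        generator=generator,
    )
    return result.images[0]
```

Calling `generate_baseline("An astronaut riding a horse on the moon, oil painting by Van Gogh.")` would then return a PIL image; exact pixel-level agreement with the figure is not guaranteed, since it also depends on hardware and library versions.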

Abstract

Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a very large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices. In this work, we develop a novel weight quantization method that quantizes the UNet from Stable Diffusion v1.5 to 1.99 bits, achieving a model that is 7.9X smaller while exhibiting even better generation quality than the original. Our approach includes several novel techniques, such as assigning optimal bits to each layer, initializing the quantized model for better performance, and improving the training strategy to dramatically reduce quantization error. Furthermore, we extensively evaluate our quantized model across various benchmark datasets and through human evaluation to demonstrate its superior generation quality.

 

Presentation Video


 

Overview of Training and Inference Pipeline

Left: We analyze the quantization error of each layer in SD-v1.5 and derive a mixed-precision recipe that assigns different bit widths to different layers. We then initialize the quantized UNet by adding a balance integer, pre-computing and caching the time embedding, and alternately optimizing the scaling factor. Middle: During Stage-I training, we freeze the teacher model (i.e., SD-v1.5) and optimize the quantized UNet with CFG-aware quantization distillation and feature distillation losses, sampling time steps according to their quantization errors. During Stage-II training, we fine-tune the Stage-I model with the noise prediction objective. Right: At inference, using the pre-cached time features, our model processes text prompts and generates high-quality images.
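To make the "alternately optimizing the scaling factor" step more concrete, here is a minimal NumPy sketch of one common way to do it: alternate between rounding weights to integer codes for a fixed scale and solving a least-squares problem for the scale with the codes fixed. This is an illustration under assumptions, not the authors' implementation. The function names, the signed integer grid, and the max-based starting scale are hypothetical, and the balance integer and mixed-precision recipe are not modeled.

```python
import numpy as np


def quantize(weights, scale, bits):
    """Round weights to signed integer codes for a given scaling factor."""
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(weights / scale), q_min, q_max)


def fit_scale(weights, bits, num_iters=10):
    """Alternate between updating the integer codes and the scaling factor."""
    q_max = 2 ** (bits - 1) - 1
    # Assumed starting point: map the largest magnitude to the top level.
    scale = float(np.abs(weights).max()) / max(q_max, 1)
    errors = []
    for _ in range(num_iters):
        # Step 1: fix the scale and pick the nearest integer code per weight.
        codes = quantize(weights, scale, bits)
        # Step 2: fix the codes and solve min_s ||w - s * q||^2 in closed form.
        denom = float(np.sum(codes * codes))
        if denom > 0:
            scale = float(np.sum(weights * codes)) / denom
        errors.append(float(np.mean((weights - scale * codes) ** 2)))
    return scale, errors


# Usage on synthetic weights (a stand-in for one layer's weight tensor).
rng = np.random.default_rng(0)
weights = rng.normal(size=1024)
scale, errors = fit_scale(weights, bits=2)
print(f"scale={scale:.4f}, first MSE={errors[0]:.4f}, last MSE={errors[-1]:.4f}")
```

Neither step can increase the reconstruction error, so the recorded error is non-increasing across iterations, which is the main reason to prefer this alternating scheme over a single max-based scale.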

Benchmark Comparisons

Comparison of our 1.99-bit model and SD-v1.5 on various evaluation metrics, with CFG scales ranging from 2.5 to 9.5.
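For reference, the CFG scale is the guidance weight $w$ in the standard classifier-free guidance rule. The notation below follows common usage in the literature and is not taken from this page: $\epsilon_\theta$ is the UNet noise prediction, $x_t$ the noisy latent at step $t$, $c$ the text condition, and $\varnothing$ the empty prompt.

```latex
\hat{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing) + w \left( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \right)
```

With $w = 1$ this reduces to the plain conditional prediction; larger values such as those in the 2.5 to 9.5 range push samples more strongly toward the prompt.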

 

Human Evaluation

Human evaluation comparisons between SD-v1.5 and BitsFusion. BitsFusion is favored 54.41% of the time over SD-v1.5.

 

More Comparisons


Left 1: A person standing on the desert, desert waves, gossip illustration, half red, half blue, abstract image of sand, clear style, trendy illustration, outdoor, top view, clear style, precision art, ultra high definition image

Left 2: A detailed oil painting of an old sea captain, steering his ship through a storm. Saltwater is splashing against his weathered face, determination in his eyes. Twirling malevolent clouds are seen above and stern waves threaten to submerge the ship while seagulls dive and twirl through the chaotic landscape. Thunder and lights embark in the distance, illuminating the scene with an eerie green glow.

Left 3: A solitary figure shrouded in mists peers up from the cobble stone street at the imposing and dark gothic buildings surrounding it. an old-fashioned lamp shines nearby. oil painting.

Left 4: A deep forest clearing with a mirrored pond reflecting a galaxy-filled night sky

Left 5: a handsome 24 years old boy in the middle with sky color background wearing eye glasses, it's super detailed with anime style, it's a portrait with delicated eyes and nice looking face

Left 6: A dog that has been meditating all the time.
Top: full-precision Stable Diffusion v1.5. Bottom: 1.99 bits BitsFusion.

 
 

Left 1: A small cactus with a happy face in the Sahara desert.

Left 2: A middle-aged woman of Asian descent, her dark hair streaked with silver, appears fractured and splintered, intricately embedded within a sea of broken porcelain. The porcelain glistens with splatter paint patterns in a harmonious blend of glossy and matte blues, greens, oranges, and reds, capturing her dance in a surreal juxtaposition of movement and stillness. Her skin tone, a light hue like the porcelain, adds an almost mystical quality to her form.

Left 3: A high contrast portrait photo of a fluffy hamster wearing an orange beanie and sunglasses holding a sign that says "Let's PAINT!"

Left 4: An extreme close-up of an gray-haired man with a beard in his 60s, he is deep in thought pondering the history of the universe as he sits at a cafe in Paris, his eyes focus on people offscreen as they walk as he sits mostly motionless, he is dressed in a wool coat suit coat with a button-down shirt , he wears a brown beret and glasses and has a very professorial appearance, and the end he offers a subtle closed-mouth smile as if he found the answer to the mystery of life, the lighting is very cinematic with the golden light and the Parisian streets and city in the background, depth of field, cinematic 35mm film.

Left 5: poster of a mechanical cat, techical Schematics viewed from front and side view on light white blueprint paper, illustartion drafting style, illustation, typography, conceptual art, dark fantasy steampunk, cinematic, dark fantasy

Left 6: I want to supplement vitamin c, please help me paint related food.
Top: full-precision Stable Diffusion v1.5. Bottom: 1.99 bits BitsFusion.

 
 

Left 1: new cyborg with cybertronic gadgets and vr helmet, hard surface, beautiful colours, sharp textures, shiny shapes, acid screen, biotechnology, tim hildebrandt, bruce pennington, donato giancola, larry elmore, masterpiece, trending on artstation, featured on pixiv, cinematic composition, dramatic pose, beautiful lighting, sharp, details, hyper - detailed, hd, hdr, 4 k, 8 k

Left 2: portrait of teenage aphrodite, light freckles, curly copper colored hair, smiling kindly, wearing an embroidered white linen dress with lace neckline, intricate, elegant, mother of pearl jewelry, glowing lights, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by wlop, mucha, artgerm, and greg Rutkowski

Left 3: portrait of a dystopian cute dog wearing an outfit inspired by the handmaid ' s tale ( 2 0 1 7 ), intricate, headshot, highly detailed, digital painting, artstation, concept art, sharp focus, cinematic lighting, digital painting, art by artgerm and greg rutkowski, alphonse mucha, cgsociety

Left 4: Portrait of a man by Greg Rutkowski, symmetrical face, a marine with a helmet, using a VR Headset, Kubric Stare, crooked smile, he's wearing a tacitcal gear, highly detailed portrait, scifi, digital painting, artstation, book cover, cyberpunk, concept art, smooth, sharp foccus ilustration, Artstation HQ

Left 5: Film still of female Saul Goodman wearing a catmaid outfit, from Red Dead Redemption 2 (2018 video game), trending on artstation, artstationHD, artstationHQ

Left 6: oil paining of robotic humanoid, intricate mechanisms, highly detailed, professional digital painting, Unreal Engine 5, Photorealism, HD quality, 8k resolution, cinema 4d, 3D, cinematic, professional photography, art by artgerm and greg rutkowski and alphonse mucha and loish and WLOP
Top: full-precision Stable Diffusion v1.5. Bottom: 1.99 bits BitsFusion.

 
 

Left 1: anthropomorphic tetracontagon head in opal edgy darknimite mudskipper, intricate, elegant, highly detailed animal monster, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm, bob eggleton, michael whelan, stephen hickman, richard corben, wayne barlowe, trending on artstation and greg rutkowski and alphonse mucha, 8 k

Left 2: background shows moon, many light effects, particle, lights, gems, symmetrical!!! centered portrait dark witch, large cloak, fantasy forest landscape, dragon scales, fantasy magic, undercut hairstyle, short purple black fade hair, dark light night, intricate, elegant, sharp focus, digital painting, concept art, matte, art by wlop and artgerm and greg rutkowski and alphonse mucha, masterpiece

Left 3: cat seahorse fursona, autistic bisexual graphic designer and musician, long haired attractive androgynous fluffy humanoid character design, sharp focus, weirdcore voidpunk digital art by artgerm, akihiko yoshida, louis wain, simon stalenhag, wlop, noah bradley, furaffinity, artstation hd, trending on deviantart

Left 4: concept art of ruins of a victorian city burning down by j. c. leyendecker, wlop, ruins, dramatic, octane render, epic painting, extremely detailed, 8 k

Left 5: hyperrealistic Gerald Gallego as a killer clown from outer space, trending on artstation, portrait, sharp focus, illustration, art by artgerm and greg rutkowski and magali Villeneuve

Left 6: low angle photo of a squirrel dj wearing on - ear headphones and colored sunglasses, stadning at a dj table playing techno music at a dance club, hyperrealistic, highly detailed, intricate, smoke, colored lights, concept art, digital art, oil painting, character design by charlie bowater, ross tran, artgerm, makoto shinkai, wlop
Top: full-precision Stable Diffusion v1.5. Bottom: 1.99 bits BitsFusion.

 
 

Left 1: a photograph of an ostrich wearing a fedora and singing soulfully into a microphone

Left 2: a pirate ship landing on the moon

Left 3: a pumpkin with a candle in it

Left 4: a rabbit wearing a black tophat and monocle

Left 5: a red sports car on the road

Left 6: a robot cooking in the kitchen.
Top: full-precision Stable Diffusion v1.5. Bottom: 1.99 bits BitsFusion.

 
 

Left 1: a baby daikon radish in a tutu

Left 2: a baby penguin wearing a blue hat, red gloves, green shirt, and yellow pants

Left 3: a woman with long black hair and dark skin

Left 4: an emoji of a baby penguin wearing a blue hat, red gloves, green shirt, and yellow pants

Left 5: a blue sports car on the road

Left 6: a butterfly.
Top: full-precision Stable Diffusion v1.5. Bottom: 1.99 bits BitsFusion.

 
 

Left 1: Helmet of a forgotten Deity, clowing corals, extremly detailed digital painting, in the style of Fenghua Zhong and Ruan Jia and jeremy lipking and Peter Mohrbacher, mystical colors, rim light, beautiful lighting, 8k, stunning scene, raytracing, octane, trending on artstation

Left 2: Jeff Bezos as a female amazon warrior, closeup, D&D, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, hearthstone, art by Artgerm and Greg Rutkowski and Alphonse Mucha

Left 3: Portrait of a draconic humanoid, HD, illustration, epic, D&D, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, monster hunter illustrations art book

Left 4: [St.Georges slaying a car adorned with checkered flag. Soviet Propaganda!!! poster!!!, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, octane render, unreal engine, photography]

Left 5: a fire - breathing dragon at a medieval hobbit home, ornate, beautiful, atmosphere, vibe, mist, smoke, chimney, rain, wet, pristine, puddles, waterfall, clear stream, bridge, forest, flowers, concept art illustration, color page, 4 k, tone mapping, doll, akihiko yoshida, james jean, andrei riabovitchev, marc simonetti, yoshitaka amano, digital illustration, greg rutowski, volumetric lighting, sunbeams, particles

Left 6: portrait of a well-dressed raccoon, oil painting in the style of Rembrandt
Top: full-precision Stable Diffusion v1.5. Bottom: 1.99 bits BitsFusion.

 
 

 

Bibtex