The Power of Diffusion Models

CS5670 Project 5A - Introduction to Computer Vision
Cornell University, Spring 2025

Implement and deploy diffusion models for creative image generation tasks.

This project provides hands-on experience with pre-trained diffusion models: implementing the complete sampling loop and applying it to creative tasks such as inpainting, visual anagrams, and hybrid images. Using the pre-trained DeepFloyd IF model, we explore how diffusion models turn random noise into images.

[Figure: Diffusion model applications showcase. Creative applications: inpainting, visual anagrams, hybrid images, and image-to-image translation]

Project Details

Assigned: Thursday, April 17, 2025
Due: Friday, April 25, 2025 (Part A)
Individual Work: Must be completed individually
Platform: Google Colab with DeepFloyd IF
Key Concepts: Sampling loops, CFG, inpainting, visual illusions

Overview

Diffusion models represent the current state of the art in generative AI, powering tools like DALL-E 2, Midjourney, and Stable Diffusion. This project demystifies how these models work by implementing the core sampling algorithms and exploring their creative applications.

We use DeepFloyd IF, a cascaded text-to-image diffusion model that operates directly in pixel space; this project uses its first two stages. Through hands-on implementation, we see how noise is iteratively refined into coherent, photorealistic images.

DeepFloyd IF Architecture

Two-Stage Generation Pipeline

DeepFloyd IF uses a cascaded approach for high-resolution image generation:

1. Stage I model: generates a 64×64 pixel image with text conditioning.
2. Stage II model: super-resolution upsampling of the Stage I output to 256×256 pixels.

Text Conditioning: Both stages are conditioned on text embeddings (from a frozen T5 text encoder), so the generated content can be controlled through natural-language prompts.

Implementation Details

Part 1: Forward Process and Noise Addition

The forward process systematically adds Gaussian noise to a clean image, producing a sequence that runs from the clean image (t = 0) to nearly pure noise (t = T).

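Forward Process Mathematics:

$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$

where ᾱₜ decreases from about 1 at t = 0 toward 0 at t = T.

Key Insight: The forward process does not just add noise; it also scales the image by √ᾱₜ. Because the noise is scaled by √(1 − ᾱₜ), the variance of xₜ stays at 1 for unit-variance data at every timestep.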
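As a rough illustration, the forward equation can be written in a few lines of NumPy. This is a minimal sketch, not the project's code: it assumes images are arrays scaled to [-1, 1] and uses a hypothetical linear β schedule in place of the ᾱ values DeepFloyd IF's scheduler would supply; the function name `forward` is also illustrative.

```python
import numpy as np

def forward(x0, t, alphas_cumprod, rng):
    """Noise a clean image x0 to timestep t.

    Implements x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I).
    Returns both x_t and the noise eps that was added.
    """
    abar_t = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps
    return x_t, eps

# Hypothetical linear beta schedule with T = 1000 steps (a stand-in, not IF's schedule).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)  # abar_t, decreasing from ~1 toward 0

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 64, 64))  # stand-in for a 64x64 RGB image
for t in (0, 250, 500, 750):
    x_t, _ = forward(x0, t, alphas_cumprod, rng)
    print(f"t={t}: abar_t={alphas_cumprod[t]:.4f}")
```

Returning `eps` alongside `x_t` is a convenience: it gives a ground-truth target to compare a noise estimate against.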

[Figure: the Cornell Tower image at t = 0 (original), t = 250 (light noise), t = 500 (medium noise), and t = 750 (heavy noise)]

Part 2: Denoising Comparison

Comparing classical denoising methods with neural diffusion approaches reveals the power of learned priors.

Gaussian Blur Denoising

[Figure: Gaussian-blur denoising at t = 250]

Classical method: removes noise but loses detail.

One-Step Neural Denoising

[Figure: one-step neural denoising at t = 250]

Neural method: preserves structure and semantics.

Iterative Neural Denoising

[Figure: iterative denoising result]

Best quality: gradual refinement.
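One-step neural denoising can be read as solving the forward equation for x₀ using the UNet's noise estimate. The sketch below is a hedged illustration: `one_step_denoise` is a hypothetical helper, ᾱₜ = 0.5 is an arbitrary noise level, and the true noise (plus a perturbed copy) stands in for a real UNet prediction. The Gaussian-blur baseline is not shown.

```python
import numpy as np

def one_step_denoise(x_t, eps_hat, abar_t):
    """Estimate x0 by inverting x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 64, 64))
abar_t = 0.5  # hypothetical noise level for some timestep t
eps = rng.standard_normal(x0.shape)
x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps

# Perfect noise estimate -> exact recovery (up to floating-point error).
x0_exact = one_step_denoise(x_t, eps, abar_t)
# Imperfect estimate -> the error in eps is scaled by sqrt((1 - abar_t) / abar_t).
eps_noisy = eps + 0.1 * rng.standard_normal(x0.shape)
x0_approx = one_step_denoise(x_t, eps_noisy, abar_t)
print(np.abs(x0_exact - x0).max(), np.abs(x0_approx - x0).mean())
```

The perturbed case shows why a single step struggles at high noise: any error in the noise estimate is scaled by √((1 − ᾱₜ)/ᾱₜ) when mapped back to image space, and that factor grows as ᾱₜ shrinks.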

Part 3: Iterative Sampling Algorithm

The core of diffusion model generation: iteratively removing noise while maintaining image coherence.

DDPM Sampling Algorithm:

  1. Initialize: Start with pure Gaussian noise x_T.
  2. Predict: Use the UNet to estimate the noise ε_θ(x_t, t).
  3. Denoise: Compute the clean-image estimate x̂_0 from x_t and ε_θ(x_t, t).
  4. Step: Compute the slightly less noisy x_{t-1} from x̂_0 and x_t using the DDPM reverse process.
  5. Iterate: Repeat steps 2 to 4 until t = 0, yielding the clean image x_0.

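DDPM Reverse Process:

$$\hat{x}_0 = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}$$

$$x_{t-1} = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1 - \bar{\alpha}_t}\,\hat{x}_0 + \frac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\,x_t + \sigma_t z, \qquad z \sim \mathcal{N}(0, I)$$

where α_t = ᾱ_t / ᾱ_{t-1}, β_t = 1 − α_t, and σ_t z adds fresh noise (one common choice is σ_t² = β_t(1 − ᾱ_{t-1})/(1 − ᾱ_t)).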
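The five sampling steps and the DDPM reverse update can be sketched as a NumPy loop. This is a minimal sketch under stated assumptions: the β schedule is a hypothetical linear one, `timesteps` is a strided schedule (990, 960, ..., 0) so that t − 1 is replaced by the next timestep in the list, the added noise uses the posterior variance, and a toy "oracle" predictor that already knows the target image stands in for the UNet. The names `ddpm_step`, `sample`, and `oracle` are illustrative.

```python
import numpy as np

def ddpm_step(x_t, eps_hat, abar_t, abar_prev, rng, add_noise=True):
    """One reverse step from timestep t to the previous (less noisy) timestep."""
    x0_hat = (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)
    alpha_t = abar_t / abar_prev
    beta_t = 1.0 - alpha_t
    # Posterior mean: a weighted blend of the clean estimate and the current sample.
    mean = (np.sqrt(abar_prev) * beta_t / (1.0 - abar_t)) * x0_hat \
        + (np.sqrt(alpha_t) * (1.0 - abar_prev) / (1.0 - abar_t)) * x_t
    if not add_noise:
        return mean
    sigma = np.sqrt(beta_t * (1.0 - abar_prev) / (1.0 - abar_t))
    return mean + sigma * rng.standard_normal(x_t.shape)

def sample(predict_noise, shape, timesteps, alphas_cumprod, rng):
    """Iterate ddpm_step over a decreasing list of timesteps, starting from pure noise."""
    x = rng.standard_normal(shape)  # x_T
    for i, t in enumerate(timesteps):
        last = i + 1 == len(timesteps)
        abar_t = alphas_cumprod[t]
        abar_prev = 1.0 if last else alphas_cumprod[timesteps[i + 1]]
        eps_hat = predict_noise(x, t)
        x = ddpm_step(x, eps_hat, abar_t, abar_prev, rng, add_noise=not last)
    return x

betas = np.linspace(1e-4, 0.02, 1000)  # hypothetical schedule
alphas_cumprod = np.cumprod(1.0 - betas)
timesteps = list(range(990, -1, -30))  # strided: 990, 960, ..., 0

# Toy "oracle" predictor that knows the target image; a real UNet replaces this.
target = np.full((3, 64, 64), 0.25)
def oracle(x, t):
    abar = alphas_cumprod[t]
    return (x - np.sqrt(abar) * target) / np.sqrt(1.0 - abar)

rng = np.random.default_rng(0)
out = sample(oracle, target.shape, timesteps, alphas_cumprod, rng)
print(np.abs(out - target).max())
```

With the oracle, every clean-image estimate x̂_0 equals the target, so the loop should return it up to floating-point error; with a real UNet, the estimate instead improves gradually as t decreases.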
[Figure: Samples 1 to 5, generated with the iterative sampling loop]

Part 4: Classifier-Free Guidance (CFG)

Enhanced Generation Quality

CFG substantially improves image quality and adherence to the prompt by combining conditional and unconditional noise estimates.

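CFG Formula:

$$\epsilon = \epsilon_u + \gamma\,(\epsilon_c - \epsilon_u)$$

where ε_c is the noise estimate conditioned on the text prompt and ε_u is the estimate for an empty (null) prompt.

Guidance Scale: γ = 0 gives the unconditional estimate and γ = 1 the conditional one; γ > 1 extrapolates beyond the conditional estimate, leading to higher quality but potentially less diverse results.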
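In code, the guidance combination is a one-liner. The sketch below uses toy arrays in place of the two UNet outputs; in a real sampler, `eps_cond` would come from the prompt embedding and `eps_uncond` from an empty-prompt embedding, which costs two UNet evaluations per step. The name `cfg_noise` is hypothetical.

```python
import numpy as np

def cfg_noise(eps_cond, eps_uncond, gamma):
    """Classifier-free guidance: eps = eps_u + gamma * (eps_c - eps_u)."""
    return eps_uncond + gamma * (eps_cond - eps_uncond)

eps_c = np.array([1.0, 2.0])  # toy conditional estimate
eps_u = np.array([0.5, 0.5])  # toy unconditional estimate
print(cfg_noise(eps_c, eps_u, 0.0))  # equals eps_u (unconditional)
print(cfg_noise(eps_c, eps_u, 1.0))  # equals eps_c (conditional)
print(cfg_noise(eps_c, eps_u, 7.5))  # extrapolated past eps_c, away from eps_u
```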

[Figure: three CFG samples, captioned "γ = 7.5", "Higher quality", and "Sharper details"]

Creative Applications

Image-to-Image Translation (SDEdit)

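By adding a controlled amount of noise to an existing image and then denoising it, we can edit the image while preserving its core structure. Here i_start is the index into the decreasing timestep schedule at which denoising begins: a small i_start starts from a very noisy image and permits large changes, while a large i_start starts closer to the original.

[Figure: SDEdit results for i_start = 1 (major transformation), i_start = 5 (significant edits), i_start = 10 (moderate changes), and i_start = 20 (minimal editing)]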
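SDEdit reuses the sampling loop but starts partway through it. The following is a minimal sketch, not the project's implementation: the helper `sdedit` and its `predict_noise(x, t)` argument are hypothetical, the β schedule is a stand-in, and `timesteps` is assumed to be a decreasing strided schedule, so a smaller `i_start` begins from a noisier image.

```python
import numpy as np

def sdedit(x_orig, i_start, timesteps, alphas_cumprod, predict_noise, rng):
    """Noise x_orig to timesteps[i_start], then denoise over the remaining timesteps.

    timesteps is decreasing, so a smaller i_start starts from a noisier image.
    """
    t_start = timesteps[i_start]
    abar = alphas_cumprod[t_start]
    x = np.sqrt(abar) * x_orig + np.sqrt(1.0 - abar) * rng.standard_normal(x_orig.shape)
    remaining = timesteps[i_start:]
    for i, t in enumerate(remaining):
        last = i + 1 == len(remaining)
        abar_t = alphas_cumprod[t]
        abar_prev = 1.0 if last else alphas_cumprod[remaining[i + 1]]
        eps_hat = predict_noise(x, t)
        x0_hat = (x - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)
        alpha_t = abar_t / abar_prev
        beta_t = 1.0 - alpha_t
        # DDPM posterior mean: blend of the clean estimate and the current sample.
        x = (np.sqrt(abar_prev) * beta_t * x0_hat
             + np.sqrt(alpha_t) * (1.0 - abar_prev) * x) / (1.0 - abar_t)
        if not last:
            sigma = np.sqrt(beta_t * (1.0 - abar_prev) / (1.0 - abar_t))
            x = x + sigma * rng.standard_normal(x.shape)
    return x

# Hypothetical schedule and strided timesteps (990, 960, ..., 0).
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)
timesteps = list(range(990, -1, -30))
```

For text-conditional editing, the only change would be a `predict_noise` that conditions on a specific prompt rather than a generic one.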

Hand-Drawn to Photorealistic

Transform sketches and drawings into photorealistic images by projecting them onto the natural image manifold.

[Figure: a hand-drawn bear sketch and its photorealistic result on the natural image manifold; a simple line drawing of a house and its realistic result with architectural detail]

Advanced Creative Techniques

Cutting-Edge Applications

Inpainting (RePaint Algorithm)

[Figure: Cornell Tower original, inpainting mask, and inpainted result]

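Algorithm: After each denoising step, force the pixels outside the mask to match the original image, noised to the current timestep:

$$x_t \leftarrow m\,x_t + (1 - m)\left(\sqrt{\bar{\alpha}_t}\,x_{\text{orig}} + \sqrt{1 - \bar{\alpha}_t}\,\epsilon\right)$$

where m is 1 inside the region to be filled and 0 elsewhere.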
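The projection step can be sketched as below, assuming a binary `mask` that is 1 where new content should be generated and 0 where the original should be kept. The helper name `inpaint_project` is hypothetical; in a sampling loop it would be applied to each new x_t. The full RePaint algorithm also includes resampling (jumping back to noisier timesteps), which this sketch omits.

```python
import numpy as np

def inpaint_project(x_t, x_orig, mask, t, alphas_cumprod, rng):
    """Keep generated content where mask == 1; elsewhere reset to x_orig noised to level t."""
    abar_t = alphas_cumprod[t]
    noise = rng.standard_normal(x_orig.shape)
    x_orig_t = np.sqrt(abar_t) * x_orig + np.sqrt(1.0 - abar_t) * noise
    return mask * x_t + (1.0 - mask) * x_orig_t

# Toy usage: fill a 3x3 hole in an all-zero "image" with a sample of ones.
x_t = np.ones((1, 8, 8))
x_orig = np.zeros((1, 8, 8))
mask = np.zeros((1, 8, 8))
mask[:, 2:5, 2:5] = 1.0
out = inpaint_project(x_t, x_orig, mask, 0, np.array([1.0, 0.5]), np.random.default_rng(0))
```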

Text-Conditional Editing

[Figure: "Snowy Mountain" edit, guided by a text prompt]

Technique: Instead of the generic prompt "a high quality photo", condition the denoising on a specific prompt to steer the manifold projection toward the desired content.

Visual Anagrams - Optical Illusions

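Create images that reveal different content when flipped upside down: at each step, average the noise estimate for one prompt on the upright image with the noise estimate for a second prompt on the flipped image, flipped back to the upright orientation.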

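Visual Anagram Algorithm:

$$\epsilon_1 = \epsilon_\theta(x_t, t, p_1), \qquad \epsilon_2 = \text{flip}\big(\epsilon_\theta(\text{flip}(x_t), t, p_2)\big), \qquad \epsilon = \frac{\epsilon_1 + \epsilon_2}{2}$$

where p_1 and p_2 are the two text prompts.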
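The anagram noise estimate can be sketched as follows. This is an illustrative sketch: `predict_noise(x, t, prompt)` is a hypothetical stand-in for a prompt-conditioned UNet call, and `flip` reverses the row axis (an upside-down flip). If a true 180° rotation is intended instead, the column axis would be reversed as well; both variants are their own inverse, which is what lets the second estimate be flipped back.

```python
import numpy as np

def flip(x):
    """Flip upside down by reversing the row axis of a (..., H, W) array."""
    return x[..., ::-1, :]

def anagram_noise(x_t, t, predict_noise, prompt_up, prompt_flipped):
    """Average the upright estimate with the flipped-back estimate for the second prompt."""
    eps_up = predict_noise(x_t, t, prompt_up)
    # Estimate noise for the second prompt on the flipped image, then flip it back.
    eps_flipped = flip(predict_noise(flip(x_t), t, prompt_flipped))
    return (eps_up + eps_flipped) / 2.0

# Toy predictor: ignores the prompt and treats the input itself as the noise estimate.
identity = lambda x, t, prompt: x
x = np.random.default_rng(0).standard_normal((3, 8, 8))
eps = anagram_noise(x, 0, identity, "an old man", "people around a campfire")
```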
[Figure: visual anagram. Upright view: "An oil painting of an old man"; flipped 180°: "An oil painting of people around a campfire"]

Innovation: This technique demonstrates the compositional nature of diffusion model representations and their ability to encode multiple interpretations simultaneously.

Hybrid Images with Diffusion

Combine high and low frequency components from different noise estimates to create hybrid images that change appearance based on viewing distance.

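Factorized Diffusion Formula:

$$\epsilon_1 = \epsilon_\theta(x_t, t, p_1), \qquad \epsilon_2 = \epsilon_\theta(x_t, t, p_2), \qquad \epsilon = f_{\text{lowpass}}(\epsilon_1) + f_{\text{highpass}}(\epsilon_2)$$

where f_lowpass is a low-pass filter (for example, a Gaussian blur) and f_highpass(ε) = ε − f_lowpass(ε).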
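A sketch of the factorized noise estimate follows. It assumes a Gaussian low-pass filter with a hypothetical kernel size of 33 and σ = 2, applied separably with `np.convolve` (which zero-pads at the borders), and a hypothetical `predict_noise(x, t, prompt)` UNet stand-in; the prompt routed through the low-pass filter is the one seen from far away.

```python
import numpy as np

def gaussian_kernel(size=33, sigma=2.0):
    """Normalized 1-D Gaussian kernel (hypothetical size and sigma)."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-ax**2 / (2.0 * sigma**2))
    return k / k.sum()

def lowpass(x, size=33, sigma=2.0):
    """Separable Gaussian blur over the last two axes (H, W)."""
    k = gaussian_kernel(size, sigma)
    x = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), -1, x)
    x = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), -2, x)
    return x

def hybrid_noise(x_t, t, predict_noise, prompt_far, prompt_near):
    """Low frequencies from the far-view prompt, high frequencies from the near-view prompt."""
    eps_far = predict_noise(x_t, t, prompt_far)
    eps_near = predict_noise(x_t, t, prompt_near)
    return lowpass(eps_far) + (eps_near - lowpass(eps_near))
```

Defining the high-pass as ε − f_lowpass(ε) guarantees that if both prompts produce the same estimate, the combination returns that estimate unchanged.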
[Figure: skull/waterfall hybrid image (skull from far away, waterfall from close up), plus Custom Hybrid 1 and Custom Hybrid 2]

My Results

Implementation Achievements

Implemented the complete diffusion sampling pipeline and applied it to each of the creative applications above, working through both the underlying mathematics and the practical implementation challenges.

Technical Implementation Highlights

Inpainting Results

[Figure: two inpainting examples, each showing the original image and the inpainted result]

Visual Anagram Creations

[Figure: Custom Anagram 1 (dual-interpretation illusion) and Custom Anagram 2 (flip to reveal a hidden image)]

Extra Credit: Creative Explorations

[Figure: Creative Experiment 1 ("Novel technique combination") and Creative Experiment 2 ("Innovative application")]

Key Learnings

Diffusion Model Fundamentals

Creative Applications Insights

Technical Implementation Skills

Impact and Applications

Potential Applications

Creative Industries: AI-assisted art creation, concept visualization, and rapid prototyping

Content Creation: Automated graphic design, social media content, and marketing materials

Medical Imaging: Image restoration, super-resolution, and synthetic data generation

Scientific Visualization: Data visualization, simulation results, and educational materials

Entertainment: Video game assets, film effects, and interactive media

Architecture & Design: Concept sketches to photorealistic renderings