CS5670 Project 5A: The Power of Diffusion Models

Implement and deploy diffusion models for creative image generation tasks.

This project provides hands-on experience with pre-trained diffusion models: we implement the complete sampling loop and apply it to inpainting, visual anagrams, and hybrid images, using the DeepFloyd IF model throughout.

Creative applications: Inpainting, visual anagrams, hybrid images, and image-to-image translation

Project Details

Assigned:	Thursday, April 17, 2025
Due:	Friday, April 25, 2025 (Part A)
Individual Work:	Must be completed individually
Platform:	Google Colab with DeepFloyd IF
Key Concepts:	Sampling loops, CFG, inpainting, visual illusions

Overview

Diffusion models represent the current state-of-the-art in generative AI, powering tools like DALL-E, Midjourney, and Stable Diffusion. This project demystifies how these models work by implementing the core sampling algorithms and exploring their creative applications.

We use DeepFloyd IF, a powerful two-stage diffusion model that generates high-quality images from text prompts. Through hands-on implementation, we learn how noise is iteratively refined into coherent, photorealistic images.

DeepFloyd IF Architecture

Two-Stage Generation Pipeline

DeepFloyd IF uses a cascaded approach for high-resolution image generation:

Stage I Model: 64×64 pixel generation with text conditioning
Stage II Model: Super-resolution upsampling to 256×256 pixels

Text Conditioning: Both stages are conditioned on text embeddings, allowing precise control over generated content through natural language prompts.

Implementation Details

Part 1: Forward Process and Noise Addition

The forward process adds Gaussian noise to a clean image in increasing amounts, producing a sequence that runs from the clean image (t = 0) to nearly pure noise (large t).

Forward Process Mathematics:

x_t = √ᾱₜ · x_0 + √(1 − ᾱₜ) · ε, where ε ~ N(0, I)

Key Insight: The forward process does not just add noise; it also scales the image by √ᾱₜ, so the variance of x_t stays bounded at every timestep.
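A minimal sketch of the forward process, written with NumPy instead of PyTorch so it runs anywhere. The linear beta schedule in `make_alphas_cumprod` is a hypothetical stand-in; in the project the `alphas_cumprod` coefficients come from the pretrained model's scheduler, and the function names here are illustrative.

```python
import numpy as np

def make_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear beta schedule (assumption: the real model ships its own)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward(x0, t, alphas_cumprod, rng):
    """Return (x_t, eps) with x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    abar_t = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps
    return x_t, eps

rng = np.random.default_rng(0)
alphas_cumprod = make_alphas_cumprod()
x0 = rng.uniform(-1.0, 1.0, size=(3, 64, 64))  # toy "image" scaled to [-1, 1]

for t in (0, 250, 500, 750):
    x_t, _ = forward(x0, t, alphas_cumprod, rng)
    # The signal scale sqrt(abar_t) shrinks as t grows, while the noise scale grows.
    print(t, float(np.sqrt(alphas_cumprod[t])), float(np.sqrt(1 - alphas_cumprod[t])))
```

Returning the sampled noise alongside x_t is a convenience for checking a denoiser against the ground truth.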

Figure: the test image after the forward process at t = 0 (original), t = 250 (light noise), t = 500 (medium noise), and t = 750 (heavy noise).

Part 2: Denoising Comparison

Comparing classical denoising methods with neural diffusion approaches reveals the power of learned priors.

Gaussian Blur Denoising: Classical method; removes noise but loses detail
One-Step Neural Denoising: Neural method; preserves structure and semantics
Iterative Neural Denoising: Best quality; gradual refinement
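One-step denoising amounts to inverting the forward process once a noise estimate is available. The sketch below uses the true noise as a stand-in for the UNet's prediction (an assumption made only so the example is self-contained), which is why the clean image is recovered exactly. A real prediction is imperfect, and the estimate degrades as t grows.

```python
import numpy as np

def estimate_x0(x_t, eps_hat, abar_t):
    """Invert x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps for x0."""
    return (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))
abar_t = 0.3                                  # illustrative cumulative alpha
eps = rng.standard_normal(x0.shape)
x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps

# Oracle noise stands in for the network output eps_theta(x_t, t).
x0_hat = estimate_x0(x_t, eps, abar_t)
print(float(np.abs(x0_hat - x0).max()))
```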

Part 3: Iterative Sampling Algorithm

The core of diffusion model generation: iteratively removing noise while maintaining image coherence.

DDPM Sampling Algorithm:

Initialize: Start with pure Gaussian noise x_T
Predict: Use the UNet to estimate the noise ε_θ(x_t, t)
Denoise: Compute the current estimate of the clean image x_0 from x_t and the predicted noise
Step: Calculate x_{t-1} using the DDPM reverse process
Iterate: Repeat until t = 0, which yields the final clean image

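DDPM Reverse Process:

x_{t'} = (√ᾱ_{t'} · β_t / (1 − ᾱ_t)) · x_0 + (√α_t · (1 − ᾱ_{t'}) / (1 − ᾱ_t)) · x_t + σ_t · z

Here t' is the next, less noisy timestep (t' = t − 1 without striding), α_t = ᾱ_t / ᾱ_{t'}, β_t = 1 − α_t, x_0 is the current clean-image estimate, and σ_t · z with z ~ N(0, I) is the noise added back at each step.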
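A single reverse step can be sketched as follows. It uses the standard DDPM posterior mean, with `abar_prev` denoting the cumulative alpha at the next, less noisy timestep. The fixed posterior variance is an assumption for illustration: DeepFloyd IF predicts its own variance, which this sketch does not model, and `ddpm_step` is a hypothetical name.

```python
import numpy as np

def ddpm_step(x_t, eps_hat, abar_t, abar_prev, rng, add_noise=True):
    """One DDPM reverse step from cumulative alpha abar_t to abar_prev."""
    alpha_t = abar_t / abar_prev          # per-step alpha for a (possibly strided) step
    beta_t = 1.0 - alpha_t
    # Current estimate of the clean image from the predicted noise.
    x0_hat = (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)
    # Posterior mean: a weighted blend of the clean estimate and the noisy image.
    mean = (np.sqrt(abar_prev) * beta_t * x0_hat
            + np.sqrt(alpha_t) * (1.0 - abar_prev) * x_t) / (1.0 - abar_t)
    if not add_noise:
        return mean
    # Assumption: fixed posterior variance instead of a learned one.
    sigma = np.sqrt(beta_t * (1.0 - abar_prev) / (1.0 - abar_t))
    return mean + sigma * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))
abar_t = 0.5
eps = rng.standard_normal(x0.shape)
x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps

# Stepping all the way to abar_prev = 1 with oracle noise returns the clean image.
x_final = ddpm_step(x_t, eps, abar_t, 1.0, rng, add_noise=False)
print(float(np.abs(x_final - x0).max()))
```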

Figure: five images generated with the iterative sampling loop (Samples 1 to 5).

Part 4: Classifier-Free Guidance (CFG)

Enhanced Generation Quality

CFG dramatically improves image quality by combining conditional and unconditional predictions.

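CFG Formula:

ε = ε_u + γ · (ε_c − ε_u)

Guidance Scale: Here ε_c and ε_u are the conditional and unconditional noise estimates. γ = 0 gives ε_u, γ = 1 gives ε_c, and γ > 1 extrapolates beyond the conditional estimate, which leads to higher quality but potentially less diverse results.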
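The guidance step itself is a single line once both noise estimates are available. In this sketch random arrays stand in for the two UNet outputs (conditioned on the prompt and on an empty prompt), and `cfg_combine` is an illustrative name.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, gamma=7.5):
    """Classifier-free guidance: eps = eps_u + gamma * (eps_c - eps_u)."""
    return eps_uncond + gamma * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
eps_u = rng.standard_normal((3, 8, 8))   # stand-in for the unconditional estimate
eps_c = rng.standard_normal((3, 8, 8))   # stand-in for the conditional estimate

eps = cfg_combine(eps_u, eps_c, gamma=7.5)
print(eps.shape)
```

Each guided step needs two UNet forward passes, so guided sampling costs roughly twice as much per step as unguided sampling.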

Figure: CFG Samples 1 to 3, generated with γ = 7.5 (higher quality, sharper details).

Creative Applications

Image-to-Image Translation (SDEdit)

By adding controlled amounts of noise and then denoising, we can edit existing images while preserving their core structure.
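The noise-then-denoise procedure can be sketched as below. Everything here is a toy stand-in: the linear schedule, the stride of 30, and especially `toy_predict_noise`, which always pulls toward a fixed `target` array instead of running a UNet. It therefore only exercises the control flow and does not show real editing behaviour. The update uses the DDPM posterior mean without the added-noise term.

```python
import numpy as np

def sdedit(x_orig, t_start, timesteps, alphas_cumprod, predict_noise, rng):
    """Noise x_orig up to t_start, then denoise back to t = 0 (mean-only updates)."""
    abar = alphas_cumprod
    steps = [t for t in timesteps if t <= t_start]   # descending, ends at 0
    noise = rng.standard_normal(x_orig.shape)
    x = np.sqrt(abar[steps[0]]) * x_orig + np.sqrt(1.0 - abar[steps[0]]) * noise
    for t, t_prev in zip(steps[:-1], steps[1:]):
        eps_hat = predict_noise(x, t)
        x0_hat = (x - np.sqrt(1.0 - abar[t]) * eps_hat) / np.sqrt(abar[t])
        alpha_t = abar[t] / abar[t_prev]
        beta_t = 1.0 - alpha_t
        x = (np.sqrt(abar[t_prev]) * beta_t * x0_hat
             + np.sqrt(alpha_t) * (1.0 - abar[t_prev]) * x) / (1.0 - abar[t])
    return x

rng = np.random.default_rng(0)
alphas_cumprod = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # assumed schedule
timesteps = list(range(990, -1, -30))                             # assumed stride
x_orig = rng.uniform(-1.0, 1.0, size=(3, 16, 16))
target = np.clip(x_orig, -0.5, 0.5)   # toy stand-in for a "natural" image

def toy_predict_noise(x, t):
    # Hypothetical denoiser: reports whatever noise would explain x given `target`.
    return (x - np.sqrt(alphas_cumprod[t]) * target) / np.sqrt(1.0 - alphas_cumprod[t])

edited = sdedit(x_orig, 450, timesteps, alphas_cumprod, toy_predict_noise, rng)
print(float(np.abs(edited - target).max()))
```

With a real denoiser, a larger `t_start` discards more of the input and so permits larger edits.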

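Figure: SDEdit results at four starting indices. Assuming i_start indexes a timestep list ordered from noisiest to cleanest, a smaller index means more added noise and a larger edit:
i_start = 1: Major transformation
i_start = 5: Significant edits
i_start = 10: Moderate changes
i_start = 20: Minimal editing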

Hand-Drawn to Photorealistic

Transform sketches and drawings into photorealistic images by projecting them onto the natural image manifold.

Figure: Original Sketch (hand-drawn input) and its Photorealistic Result (natural image manifold); House Sketch (simple line drawing) and the resulting Realistic House (architectural detail).

Advanced Creative Techniques

Cutting-Edge Applications

Inpainting (RePaint Algorithm)

Figure: original image, mask, and inpainted result.

Algorithm: After each denoising step, overwrite the pixels outside the mask with the original image noised to the current timestep, so that only the masked region is newly generated.
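The per-step constraint can be sketched as a masked blend. The convention assumed here is that the mask is 1 where new content should be generated and 0 where the original must be kept. The function name and the toy arrays are illustrative.

```python
import numpy as np

def inpaint_constraint(x_t, x_orig, mask, abar_t, rng):
    """Keep x_t inside the mask; elsewhere use the original noised to level abar_t."""
    noise = rng.standard_normal(x_orig.shape)
    noised_orig = np.sqrt(abar_t) * x_orig + np.sqrt(1.0 - abar_t) * noise
    return mask * x_t + (1.0 - mask) * noised_orig

rng = np.random.default_rng(0)
x_orig = rng.uniform(-1.0, 1.0, size=(3, 8, 8))
x_t = rng.standard_normal((3, 8, 8))      # stand-in for the current sample
mask = np.zeros((1, 8, 8))
mask[:, 2:6, 2:6] = 1.0                   # region to regenerate (assumed convention)

# In a sampling loop this would be applied after every denoising step.
x_t = inpaint_constraint(x_t, x_orig, mask, abar_t=0.5, rng=rng)
print(x_t.shape)
```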

Text-Conditional Editing

Figure: Snowy Mountain (guided by text prompt).

Technique: Replace "a high quality photo" with specific prompts to guide the manifold projection toward desired content.

Visual Anagrams - Optical Illusions

Create images that reveal different content when flipped upside down by averaging noise estimates from both orientations.

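Visual Anagram Algorithm:

ε_1 = UNet(x_t, t, p_1)
ε_2 = flip(UNet(flip(x_t), t, p_2))
ε = (ε_1 + ε_2) / 2

Here p_1 and p_2 are the two text prompts and flip(·) turns the image upside down.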
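Averaging the two orientation-specific noise estimates can be sketched as below. `predict_noise` is a hypothetical callable standing in for the text-conditioned UNet, and the flip is taken to be a vertical flip of a (C, H, W) array. A true 180° rotation would use `np.rot90(img, 2, axes=(-2, -1))` instead.

```python
import numpy as np

def flip_ud(img):
    """Flip a (C, H, W) array upside down (vertical flip along the H axis)."""
    return img[..., ::-1, :]

def anagram_noise(x_t, t, predict_noise, prompt_up, prompt_down):
    """Average the upright estimate with the un-flipped estimate of the flipped image."""
    eps_1 = predict_noise(x_t, t, prompt_up)
    eps_2 = flip_ud(predict_noise(flip_ud(x_t), t, prompt_down))
    return 0.5 * (eps_1 + eps_2)

# Toy predictor (assumption): scales rows by a ramp so that orientation matters.
ramp = np.linspace(0.0, 1.0, 8).reshape(1, 8, 1)

def toy_predict_noise(x, t, prompt):
    return x * ramp

rng = np.random.default_rng(0)
x_t = rng.standard_normal((3, 8, 8))
eps = anagram_noise(x_t, 500, toy_predict_noise, "an old man", "a campfire")
print(eps.shape)
```

The second estimate is flipped back before averaging so that both estimates refer to the same pixel layout as x_t.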

Figure: visual anagram. Upright view: "An oil painting of an old man". Flipped 180°: "An oil painting of people around a campfire".

Innovation: This technique shows that noise estimates from different prompts and orientations can be composed, so a single image can encode two interpretations at once.

Hybrid Images with Diffusion

Combine high and low frequency components from different noise estimates to create hybrid images that change appearance based on viewing distance.

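Factorized Diffusion Formula:

ε_1 = UNet(x_t, t, p_1)
ε_2 = UNet(x_t, t, p_2)
ε = f_lowpass(ε_1) + f_highpass(ε_2)

Here p_1 is the prompt seen from far away (low frequencies) and p_2 is the prompt seen up close (high frequencies).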
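The frequency split can be sketched with a Gaussian blur as the low-pass filter and its residual as the high-pass filter. The separable NumPy blur, the choice of sigma, and the function names are assumptions made for a self-contained example; a PyTorch implementation would more likely use a library blur.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()                      # normalised so constants are preserved

def lowpass(img, sigma=2.0):
    """Separable Gaussian blur over the last two axes with reflect padding."""
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    out = img
    for axis in (-2, -1):
        pad = [(0, 0)] * img.ndim
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="reflect")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="valid"), axis, padded)
    return out

def hybrid_noise(eps_far, eps_near, sigma=2.0):
    """Low frequencies from eps_far plus high frequencies from eps_near."""
    return lowpass(eps_far, sigma) + (eps_near - lowpass(eps_near, sigma))

rng = np.random.default_rng(0)
eps_far = rng.standard_normal((3, 16, 16))   # stand-in: estimate for the far prompt
eps_near = rng.standard_normal((3, 16, 16))  # stand-in: estimate for the near prompt
eps = hybrid_noise(eps_far, eps_near)
print(eps.shape)
```

Because the high-pass filter is defined as the residual of the low-pass filter, feeding the same estimate to both branches returns it unchanged.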

Figure: hybrid image that reads as a skull from far away and a waterfall from close up, followed by Custom Hybrid 1 and Custom Hybrid 2.

My Results

Implementation Achievements

Implemented the complete diffusion sampling pipeline together with all of the creative applications, covering both the underlying mathematics and the practical implementation challenges.

Technical Implementation Highlights

Forward Process: Implemented noise addition with proper variance scaling using alphas_cumprod coefficients
UNet Integration: Successfully interfaced with DeepFloyd's pretrained models including tensor device management and data type handling
Iterative Sampling: Built complete DDPM sampling loop with strided timesteps for efficient generation
CFG Implementation: Added classifier-free guidance, which gave noticeably higher-quality, sharper samples than unguided sampling
Creative Applications: Implemented inpainting, visual anagrams, and hybrid images
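The strided timesteps mentioned in the list above can be built as a simple descending range. The starting value of 990 and the stride of 30 are illustrative assumptions, not values stated on this page.

```python
def make_strided_timesteps(t_max=990, stride=30):
    """Descending timesteps from t_max to 0; t_max should be a multiple of stride."""
    return list(range(t_max, -1, -stride))

strided_timesteps = make_strided_timesteps()
# Each sampling step moves from one entry to the next, skipping `stride` levels.
pairs = list(zip(strided_timesteps[:-1], strided_timesteps[1:]))
print(len(strided_timesteps), pairs[0], pairs[-1])
```

Skipping timesteps this way trades some quality for a much shorter loop than stepping through every noise level.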

Inpainting Results

Figure: Original Image 1 and Inpainted Result 1; Original Image 2 and Inpainted Result 2.

Visual Anagram Creations

Figure: Custom Anagram 1 (dual interpretation illusion) and Custom Anagram 2 (flip to reveal hidden image).

Extra Credit: Creative Explorations

Figure: Creative Experiment 1 (novel technique combination) and Creative Experiment 2 (innovative application).

Key Learnings

Diffusion Model Fundamentals

Noise Scheduling: Understanding how different noise levels affect generation quality and editability
Sampling Strategies: Trade-offs between speed (fewer steps) and quality (more steps)
Classifier-Free Guidance: How CFG dramatically improves quality by extrapolating beyond conditional estimates
Manifold Projection: Using diffusion as a learned prior to project images onto natural image distributions
Text Conditioning: How language embeddings guide the generation process

Creative Applications Insights

Inpainting Mechanics: Understanding how to constrain diffusion while maintaining naturalness
Optical Illusions: Leveraging the compositional nature of neural representations
Frequency Decomposition: Applying classical signal processing concepts to neural generation
Image-to-Image Translation: Controlling the degree of transformation through noise levels
Sketch-to-Photo: Bridging different image domains through learned priors

Technical Implementation Skills

PyTorch Mastery: Advanced tensor operations, device management, and memory optimization
Model Integration: Working with large pretrained models and handling their requirements
Algorithm Implementation: Translating mathematical formulations into working code
Creative Problem Solving: Adapting techniques for novel applications beyond training objectives

Impact and Applications

Revolutionary Applications

Creative Industries: AI-assisted art creation, concept visualization, and rapid prototyping

Content Creation: Automated graphic design, social media content, and marketing materials

Medical Imaging: Image restoration, super-resolution, and synthetic data generation

Scientific Visualization: Data visualization, simulation results, and educational materials

Entertainment: Video game assets, film effects, and interactive media

Architecture & Design: Concept sketches to photorealistic renderings
