CS5670 Project 4: Neural Radiance Fields (NeRF)

Build a Neural Radiance Field (NeRF) from a set of images for novel view synthesis.

This project implements the NeRF architecture introduced at ECCV 2020, which represents a 3D scene as a continuous neural radiance field. Using only 2D images as supervision, we train a deep network to synthesize photorealistic novel views from arbitrary camera positions.

360° novel view synthesis: NeRF learns to render photorealistic views from any camera angle

Project Details

Assigned:	Thursday, March 20, 2025
Due:	Friday, March 28, 2025
Group Size:	2 students
Platform:	Google Colab with PyTorch
Key Concepts:	Neural networks, ray tracing, volume rendering, positional encoding

Overview

Neural Radiance Fields (NeRF) represent a paradigm shift in 3D scene representation and novel view synthesis. Unlike traditional 3D reconstruction methods that build explicit geometric models, NeRF learns an implicit continuous volumetric representation using neural networks.

The key insight is to represent a scene as a function that maps 3D coordinates and viewing directions to color and volume density, then use differentiable volume rendering to train this function from 2D images alone.

NeRF Architecture and Pipeline

Complete NeRF Pipeline

Camera Ray Generation

Compute ray origins and directions through image pixels

3D Point Sampling

Sample points along rays in 3D space

Positional Encoding

Encode coordinates with high-frequency functions

Neural Network

Predict color and density for each point

Volume Rendering

Composite colors along rays to form images

Implementation Details

Part 1: Positional Encoding

Positional encoding enables neural networks to learn high-frequency functions by mapping coordinates to higher-dimensional spaces using trigonometric functions.

Positional Encoding Mathematics:
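
In the formulation of the original NeRF paper, each scalar coordinate p is encoded with L frequency bands. Whether this project's code keeps the factor of π or also appends the raw input is not stated here, so read this as the reference form rather than the exact variant implemented:

```latex
\gamma(p) = \bigl( \sin(2^{0}\pi p),\ \cos(2^{0}\pi p),\ \ldots,\ \sin(2^{L-1}\pi p),\ \cos(2^{L-1}\pi p) \bigr)
```

The encoding is applied to each coordinate independently, so a 3D point maps to 6L values.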

Key Insight: Without positional encoding, MLPs exhibit a spectral bias toward low-frequency functions, leading to oversmoothed outputs that cannot capture fine details.

No Positional Encoding

Oversmoothed, lacks detail

PE Frequency = 3

Improved detail capture

PE Frequency = 6

High-frequency details preserved
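
A minimal sketch of positional encoding in PyTorch, assuming octave-spaced frequencies 2^k·π and that the raw input is kept alongside the sin/cos features. The function name `positional_encoding` and the `include_input` flag are illustrative choices, not taken from the project code.

```python
import math
import torch

def positional_encoding(x: torch.Tensor, num_freqs: int, include_input: bool = True) -> torch.Tensor:
    """Map coordinates of shape (..., D) to sin/cos features at octave-spaced frequencies."""
    features = [x] if include_input else []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi  # assumed frequency schedule: 2^k * pi
        features.append(torch.sin(freq * x))
        features.append(torch.cos(freq * x))
    # Output width is D * (2 * num_freqs + 1) when the input is included.
    return torch.cat(features, dim=-1)

points = torch.rand(1024, 3)
encoded = positional_encoding(points, num_freqs=6)
print(encoded.shape)  # torch.Size([1024, 39])
```

Raising `num_freqs` from 3 to 6 widens the feature vector from 21 to 39 values per point, which is what lets the MLP fit finer detail.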

Part 2: Ray Tracing and 3D Sampling

For each pixel in the target image, we generate a camera ray and sample points along it to query the neural radiance field.

Camera Ray Generation:

Pixel to Camera: Convert pixel coordinates to camera coordinate system
Ray Origin: Camera center in world coordinates
Ray Direction: Unit vector from camera center through pixel
World Transform: Apply camera-to-world transformation matrix
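
The four steps above can be sketched as a single function. This assumes a pinhole camera with focal length in pixels, a principal point at the image center, and a camera that looks down its negative z axis with y pointing up. That convention is common in NeRF code but is an assumption here, as is the name `get_rays`.

```python
import torch

def get_rays(height: int, width: int, focal: float, c2w: torch.Tensor):
    """Return ray origins and unit ray directions, each of shape (H, W, 3)."""
    # Pixel grid: i indexes columns (x), j indexes rows (y).
    j, i = torch.meshgrid(
        torch.arange(height, dtype=torch.float32),
        torch.arange(width, dtype=torch.float32),
        indexing="ij",
    )
    # Pixel -> camera coordinates (assumed: camera looks down -z, y is up).
    dirs = torch.stack(
        [(i - 0.5 * width) / focal, -(j - 0.5 * height) / focal, -torch.ones_like(i)],
        dim=-1,
    )
    # Rotate into world space with the 3x3 block of the camera-to-world matrix.
    rays_d = dirs @ c2w[:3, :3].T
    rays_d = rays_d / rays_d.norm(dim=-1, keepdim=True)
    # Every ray starts at the camera center, the translation column of c2w.
    rays_o = c2w[:3, 3].expand(rays_d.shape)
    return rays_o, rays_d

rays_o, rays_d = get_rays(4, 4, focal=2.0, c2w=torch.eye(4))
```

Because the directions are normalized, the ray parameter t measures Euclidean distance from the camera center rather than depth along the optical axis.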

Ray Equation:
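
A ray through a pixel is parameterized by the distance t travelled from its origin. The symbols o (ray origin), d (ray direction), and the near and far bounds t_n and t_f are the conventional names and are assumed here:

```latex
\mathbf{r}(t) = \mathbf{o} + t\,\mathbf{d}, \qquad t_n \le t \le t_f
```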

3D Point Sampling:

Sample points along each ray using stratified sampling to ensure good coverage of the 3D space while maintaining differentiability.
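
A minimal sketch of stratified sampling: the interval between a near and a far bound is split into equal bins and one depth is drawn uniformly inside each bin. The names `near`, `far`, `stratified_samples`, and `sample_points` are hypothetical, and the bounds used in the example are placeholders.

```python
import torch

def stratified_samples(near: float, far: float, num_samples: int,
                       num_rays: int, perturb: bool = True) -> torch.Tensor:
    """Return sample depths of shape (num_rays, num_samples), increasing along each ray."""
    edges = torch.linspace(near, far, num_samples + 1)
    lower, upper = edges[:-1], edges[1:]
    if perturb:
        u = torch.rand(num_rays, num_samples)             # random offset inside each bin
    else:
        u = torch.full((num_rays, num_samples), 0.5)      # bin midpoints (deterministic)
    return lower + (upper - lower) * u

def sample_points(rays_o: torch.Tensor, rays_d: torch.Tensor, t_vals: torch.Tensor) -> torch.Tensor:
    """Evaluate o + t * d for every ray and depth: (N, 3), (N, 3), (N, S) -> (N, S, 3)."""
    return rays_o[:, None, :] + t_vals[..., None] * rays_d[:, None, :]

t_vals = stratified_samples(near=2.0, far=6.0, num_samples=64, num_rays=1024)
```

Since the bins do not overlap, the depths come out sorted without an explicit sort, and the random jitter means the network sees different positions along each ray at every iteration.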

Part 3: Neural Network Architecture

NeRF Network Design:

The NeRF MLP takes a 5D input (3D position + 2D viewing direction) and produces a 4D output (RGB color + volume density).

Network Structure:

Position Encoding: 3D coordinates → high-dimensional encoding
Density Branch: Position → volume density σ
Color Branch: Position + viewing direction → RGB color
Skip Connections: Improve gradient flow for deep networks
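
The structure above might look like the following in PyTorch. Layer widths, depth, the skip position, and the encoded input sizes (39 for position, 27 for direction) are assumptions borrowed from common NeRF configurations, not values stated in this write-up, and `NeRFMLP` is an illustrative name.

```python
import torch
import torch.nn as nn

class NeRFMLP(nn.Module):
    """Sketch of a NeRF-style MLP with a density branch and a view-dependent color branch."""

    def __init__(self, pos_dim: int = 39, dir_dim: int = 27, width: int = 256,
                 depth: int = 8, skip_at: int = 4):
        super().__init__()
        self.skip_at = skip_at
        self.layers = nn.ModuleList()
        in_dim = pos_dim
        for i in range(depth):
            self.layers.append(nn.Linear(in_dim, width))
            # After the skip layer, the encoded position is concatenated back in.
            in_dim = width + pos_dim if i == skip_at else width
        self.sigma_head = nn.Linear(in_dim, 1)   # density depends on position only
        self.feature = nn.Linear(in_dim, width)
        self.color_hidden = nn.Linear(width + dir_dim, width // 2)
        self.rgb_head = nn.Linear(width // 2, 3)

    def forward(self, pos_enc: torch.Tensor, dir_enc: torch.Tensor):
        h = pos_enc
        for i, layer in enumerate(self.layers):
            h = torch.relu(layer(h))
            if i == self.skip_at:
                h = torch.cat([h, pos_enc], dim=-1)
        sigma = torch.relu(self.sigma_head(h)).squeeze(-1)   # non-negative density
        # Color sees both the position features and the encoded viewing direction.
        h = torch.cat([self.feature(h), dir_enc], dim=-1)
        rgb = torch.sigmoid(self.rgb_head(torch.relu(self.color_hidden(h))))
        return rgb, sigma

model = NeRFMLP()
rgb, sigma = model(torch.rand(1024, 39), torch.rand(1024, 27))
```

Keeping the viewing direction out of the density branch forces geometry to be the same from every viewpoint, while color is still free to vary with view for effects such as specular highlights.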

Part 4: Volume Rendering

Volume rendering composites the colors and densities along each ray to produce the final pixel color using the classic volume rendering equation.

Volume Rendering Equation:
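
In the standard NeRF formulation, the expected color of a ray r(t) = o + t d between the near bound t_n and the far bound t_f is written as follows. The symbol names are the conventional ones and are assumed here:

```latex
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt,
\qquad
T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)
```

T(t) is the accumulated transmittance: the probability that the ray travels from t_n to t without being stopped.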

Compositing Weights:
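
With N samples at depths t_i along a ray and spacings δ_i = t_{i+1} − t_i, the volume rendering equation is usually approximated by a weighted sum of the sampled colors c_i. This is the standard discretization and is assumed to match the one used in the project:

```latex
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} w_i\,\mathbf{c}_i,
\qquad
w_i = T_i\,\bigl(1 - e^{-\sigma_i \delta_i}\bigr),
\qquad
T_i = \exp\!\Bigl(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Bigr)
```

The factor 1 − exp(−σ_i δ_i) acts as the opacity of sample i, and T_i is the fraction of light that survives all earlier samples.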

Physical Interpretation: The volume density σ measures how strongly a point blocks light, that is, how likely a ray is to terminate there. The compositing weights determine how much each sample contributes to the final pixel color.
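
A minimal sketch of the compositing step in PyTorch, assuming per-sample colors, densities, and sorted depths are already available for a batch of rays. The function name `volume_render` and the large final interval (treating the last sample as extending to infinity) are illustrative choices.

```python
import torch

def volume_render(rgb: torch.Tensor, sigma: torch.Tensor, t_vals: torch.Tensor):
    """rgb (N, S, 3), sigma (N, S), t_vals (N, S) -> pixel colors (N, 3) and weights (N, S)."""
    # Distance between adjacent samples; the last interval is treated as unbounded.
    deltas = t_vals[:, 1:] - t_vals[:, :-1]
    deltas = torch.cat([deltas, torch.full_like(deltas[:, :1], 1e10)], dim=-1)
    # Opacity of each sample.
    alpha = 1.0 - torch.exp(-sigma * deltas)
    # Transmittance: product of (1 - alpha) over all earlier samples (exclusive cumprod).
    ones = torch.ones_like(alpha[:, :1])
    trans = torch.cumprod(torch.cat([ones, 1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    weights = trans * alpha
    color = (weights[..., None] * rgb).sum(dim=1)
    return color, weights

t_vals = torch.tensor([[1.0, 2.0, 3.0]])
rgb = torch.eye(3)[None]                      # red, green, blue samples along one ray
sigma = torch.tensor([[1e3, 1.0, 1.0]])       # first sample is effectively opaque
color, weights = volume_render(rgb, sigma, t_vals)
```

In this example the first sample absorbs nearly all of the light, so the rendered color is almost pure red and the later samples receive weights close to zero.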

PyTorch Implementation

Deep Learning Framework

This project provided hands-on experience with PyTorch for computer vision and 3D deep learning:

Tensor Operations: Efficient batch processing of rays and 3D points
Automatic Differentiation: Gradient computation through the volume rendering process
GPU Acceleration: Training on Google Colab's GPU infrastructure
Custom Loss Functions: Photometric reconstruction loss between rendered and target images
Model Optimization: Adam optimizer with learning rate scheduling
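
A single optimization step combining these pieces might look like the sketch below. The learning rate, decay factor, and the stand-in network are placeholders chosen for illustration, and `render_fn` is a hypothetical callable that maps a batch of inputs to predicted pixel colors (in the real pipeline it would wrap sampling, the MLP, and volume rendering).

```python
import torch

def train_step(render_fn, optimizer, scheduler, inputs: torch.Tensor, target_rgb: torch.Tensor) -> float:
    """Run one gradient step on the L2 photometric loss and return the loss value."""
    pred_rgb = render_fn(inputs)
    loss = torch.mean((pred_rgb - target_rgb) ** 2)
    optimizer.zero_grad()
    loss.backward()        # autograd differentiates through everything inside render_fn
    optimizer.step()
    scheduler.step()       # decay the learning rate after each step
    return loss.item()

# Stand-in model so the sketch runs on its own; not the NeRF network.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 32), torch.nn.ReLU(), torch.nn.Linear(32, 3), torch.nn.Sigmoid()
)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)

loss = train_step(model, optimizer, scheduler, torch.rand(64, 3), torch.rand(64, 3))
```

To train on a GPU, the model and each batch would be moved with `.to("cuda")` before the step; the function itself does not change.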

My Results

Training Results and Metrics

Successfully trained NeRF on the provided scene with convergence to high-quality novel view synthesis.

Training Performance:

Iterations: 1000-3000 (10-30 minutes on GPU)
Final PSNR: >20 dB (target quality threshold)
Loss Function: L2 photometric reconstruction
Optimizer: Adam with learning rate decay
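
PSNR follows directly from the L2 loss when pixel values lie in [0, 1]. The helper below is a small sketch of that relationship; the name `psnr_from_mse` is illustrative.

```python
import math

def psnr_from_mse(mse: float, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_val ** 2 / mse)

# An MSE of 0.01 corresponds to roughly 20 dB, the quality threshold listed above.
print(psnr_from_mse(0.01))
```

Each tenfold reduction in MSE adds 10 dB, so small absolute improvements in loss late in training still show up clearly in PSNR.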

Training Progression

100 Iterations

Early training - blurry reconstruction

1000 Iterations

Converged model - sharp details

Novel View Synthesis Results

Novel View 1

Photorealistic rendering from unseen angle

Novel View 2

Consistent geometry and lighting

Depth Maps and 3D Understanding

Learned Depth Map

NeRF implicitly learns scene geometry

Volume Density Field

3D structure representation
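
One common way to obtain a depth map from a trained NeRF is to take the expected sample depth under the compositing weights. The sketch below assumes the weights and depths come from the volume rendering step; `expected_depth` is an illustrative name, and this is not necessarily how the maps shown here were produced.

```python
import torch

def expected_depth(weights: torch.Tensor, t_vals: torch.Tensor):
    """weights (N, S), t_vals (N, S) -> expected depth (N,) and accumulated opacity (N,)."""
    depth = (weights * t_vals).sum(dim=-1)
    opacity = weights.sum(dim=-1)   # near 1 where the ray hits a surface, near 0 in empty space
    return depth, opacity

weights = torch.tensor([[0.25, 0.75]])
t_vals = torch.tensor([[2.0, 4.0]])
depth, opacity = expected_depth(weights, t_vals)
```

Reshaping the per-ray depths back to (H, W) gives an image that can be displayed directly as a depth map.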

360° Video Synthesis

Complete 360° Novel View Synthesis

Frame sequence from 360° rotation around the scene

Video Generation Process: Generate camera poses in a circular path around the object, render images from each viewpoint using the trained NeRF, and composite into a smooth 360° video.

Video Specifications:

Resolution: 800×800 pixels
Frame Count: 40 frames (360° rotation)
Render Time: ~2-3 seconds per frame
Camera Path: Circular orbit around scene center
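
A circular camera path can be generated by placing the camera on a circle and building a look-at matrix toward the scene center. The sketch assumes a y-up world with the scene at the origin and a camera that looks down its negative z axis. The radius, the function name `orbit_pose`, and the axis conventions are all assumptions, and datasets with a z-up world would need the axes swapped.

```python
import math
import torch

def orbit_pose(angle_rad: float, radius: float, height: float = 0.0) -> torch.Tensor:
    """4x4 camera-to-world matrix for a camera on a circle, looking at the origin."""
    eye = torch.tensor([radius * math.cos(angle_rad), height, radius * math.sin(angle_rad)])
    # The camera looks down -z, so its +z axis points from the origin toward the camera.
    z_axis = eye / eye.norm()
    up = torch.tensor([0.0, 1.0, 0.0])
    x_axis = torch.linalg.cross(up, z_axis)
    x_axis = x_axis / x_axis.norm()
    y_axis = torch.linalg.cross(z_axis, x_axis)
    c2w = torch.eye(4)
    c2w[:3, 0] = x_axis
    c2w[:3, 1] = y_axis
    c2w[:3, 2] = z_axis
    c2w[:3, 3] = eye
    return c2w

# 40 evenly spaced poses covering a full rotation.
poses = [orbit_pose(2.0 * math.pi * k / 40, radius=4.0) for k in range(40)]
```

Each pose would then be passed to ray generation and rendering, and the resulting frames written out in order to form the video.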

Extra Credit: Custom Dataset

Training NeRF on Personal Photography

Custom Input Images

LLFF-style forward-facing capture

Novel View Results

Synthesized views of personal scene

Data Capture Process: Following LLFF methodology for forward-facing scenes, using COLMAP for camera pose estimation, and training NeRF with custom photographs to demonstrate real-world applicability.

Key Learnings

Computer Vision and Deep Learning Concepts

Neural Implicit Representations: Understanding how neural networks can represent continuous 3D functions
Volume Rendering: Classical computer graphics techniques applied to neural representations
Differentiable Rendering: Making graphics pipelines trainable with gradient descent
High-Frequency Modeling: Positional encoding's role in capturing fine details
3D Scene Understanding: Learning geometry and appearance from 2D supervision alone

Technical Skills Developed

PyTorch Proficiency: Deep learning framework for computer vision applications
3D Mathematics: Camera models, ray tracing, and coordinate transformations
GPU Computing: Efficient tensor operations and memory management
Optimization Techniques: Training strategies for neural implicit functions
Performance Analysis: PSNR metrics and convergence evaluation

Impact and Applications

Revolutionary Applications

Virtual Reality: Photorealistic VR environments from simple photo captures

Film and Media: Novel view synthesis for cinematography and special effects

3D Content Creation: Democratizing 3D modeling through neural representations

Robotics: 3D scene understanding for navigation and manipulation

Cultural Preservation: Digital documentation of historical sites and artifacts

Medical Imaging: 3D reconstruction from sparse medical scans
