Stereo Vision and 3D Reconstruction

Classical Computer Vision Methods
Self-Directed Study, August 2025

Implement photometric stereo and plane sweep stereo algorithms for 3D scene reconstruction.

This project explores classical computer vision techniques for recovering 3D surface properties from 2D images. Photometric stereo uses varying illumination to estimate surface normals and albedo, while plane sweep stereo uses multiple viewpoints to compute depth maps through normalized cross-correlation matching.

Figure: Stereo vision pipeline overview. Complete 3D reconstruction pipeline, from input images to surface normals, depth maps, and final 3D meshes.

Project Details

Completed: August 2025
Implementation: Python with NumPy, OpenCV, Open3D
Datasets: Harvard Photometric Stereo, Middlebury Stereo
Key Concepts: Multi-view geometry, surface reconstruction, mesh generation

Overview

Stereo vision encompasses multiple approaches to 3D scene reconstruction, each exploiting different image formation principles. This project implements two complementary techniques: photometric stereo for detailed surface analysis and geometric stereo for spatial structure recovery.

The complete pipeline processes input images through photometric and geometric analysis to produce comprehensive 3D reconstructions including surface normals, albedo maps, depth estimates, and final mesh models.

Photometric Stereo

Photometric stereo recovers surface properties by analyzing how appearance changes under different lighting conditions. The method assumes Lambertian reflectance and uses least squares optimization to solve for surface normals and albedo.

Lambertian Reflectance Model

For Lambertian surfaces, observed intensity equals surface albedo times the dot product of surface normal and light direction:

  I = ρ (n · l)

where I is the observed intensity, ρ the surface albedo, n the unit surface normal, and l the unit light direction.
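
As a quick numerical check of the Lambertian model, the sketch below evaluates the predicted intensity for a single surface point. The function name `lambertian_intensity` is illustrative, and the clamp at zero (attached shadow when the surface faces away from the light) is an added assumption that the model statement above does not mention.

```python
import numpy as np

def lambertian_intensity(albedo, normal, light):
    """Intensity predicted by the Lambertian model for one surface point."""
    n = np.asarray(normal, dtype=float)
    l = np.asarray(light, dtype=float)
    n = n / np.linalg.norm(n)   # unit surface normal
    l = l / np.linalg.norm(l)   # unit light direction
    # Assumption: clamp at zero so points facing away from the light are dark.
    return albedo * max(0.0, float(n @ l))

# Surface facing +Z, lit from 60 degrees off the normal.
light = [np.sin(np.pi / 3), 0.0, np.cos(np.pi / 3)]
print(lambertian_intensity(0.8, [0, 0, 1], light))
```

The intensity falls off with the cosine of the angle between normal and light, which is what makes the normal recoverable once several light directions are observed.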

Algorithm Implementation

The implementation solves a linear system for each pixel using multiple illumination observations:

  1. Data Setup: Load N images under known illumination directions
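  2. Linear System: For each pixel, stack the N observations into I = L g, where L is the N×3 matrix of light directions and g = ρn
  3. Least Squares: Solve the overdetermined system for the albedo-normal product g
  4. Decomposition: Extract the albedo as the magnitude of g (ρ = |g|) and the unit normal as its direction (n = g / |g|)
  5. Coordinate Frame: Ensure normals follow RGB color encoding (R=+X, G=+Y, B=+Z), with each component mapped from [-1, 1] to [0, 1] for display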
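
The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the project's actual code: the function names `photometric_stereo` and `normals_to_rgb`, the array layouts, and the absence of shadow or highlight masking are all assumptions.

```python
import numpy as np

def photometric_stereo(images, lights):
    """Least-squares photometric stereo.

    images: (N, H, W) grayscale intensities, one image per light.
    lights: (N, 3) unit light directions (assumed known).
    Returns albedo (H, W) and unit normals (H, W, 3).
    """
    images = np.asarray(images, dtype=float)
    lights = np.asarray(lights, dtype=float)
    n_imgs, h, w = images.shape
    I = images.reshape(n_imgs, -1)                    # (N, H*W)
    # Solve lights @ G = I for G = albedo * normal, all pixels at once.
    G, *_ = np.linalg.lstsq(lights, I, rcond=None)    # (3, H*W)
    albedo = np.linalg.norm(G, axis=0)                # magnitude of g
    normals = G / np.maximum(albedo, 1e-12)           # direction of g
    return albedo.reshape(h, w), normals.T.reshape(h, w, 3)

def normals_to_rgb(normals):
    """Map each component from [-1, 1] to [0, 1] so that R=+X, G=+Y, B=+Z."""
    return np.clip(0.5 * (normals + 1.0), 0.0, 1.0)
```

Solving for all pixels in one `lstsq` call works because every pixel shares the same light matrix; only the right-hand side differs per pixel.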

Photometric Stereo Results

Figure: Input under multiple illuminations. Same viewpoint, 9 different lighting directions.
Figure: Recovered surface normals. RGB encoding: R=+X, G=+Y, B=+Z directions.
Figure: Surface albedo. Intrinsic surface reflectance properties.
Figure: Cat dataset results. Real photometric stereo reconstruction.

Plane Sweep Stereo

Plane sweep stereo estimates scene depth by systematically testing different depth hypotheses and finding the best stereo correspondences using normalized cross-correlation matching.

Camera Projection and Epipolar Geometry

3D points are projected to 2D image coordinates using calibrated camera parameters:

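  x ≃ K [R | t] X

where, in the standard pinhole model, X is a 3D point in homogeneous world coordinates, [R | t] is the 3×4 extrinsic matrix (rotation and translation), K is the 3×3 intrinsic matrix, and x is the homogeneous pixel coordinate, defined up to scale.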
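
A minimal sketch of pinhole projection, assuming the usual convention that R and t map world coordinates into the camera frame and that K is a 3×3 intrinsic matrix. The function name `project_points` is illustrative.

```python
import numpy as np

def project_points(K, R, t, points_world):
    """Project (M, 3) world points to (M, 2) pixel coordinates.

    Also returns the (M,) depths along the camera's optical axis.
    """
    X = np.asarray(points_world, dtype=float)
    cam = X @ R.T + t            # world frame -> camera frame
    uvw = cam @ K.T              # camera frame -> homogeneous pixel coordinates
    pixels = uvw[:, :2] / uvw[:, 2:3]   # perspective divide
    return pixels, cam[:, 2]
```

Points with non-positive depth lie behind the camera and should be discarded by the caller; the sketch does not filter them.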

Normalized Cross-Correlation Matching

Correspondence quality is measured using normalized cross-correlation between image patches:

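  NCC(a, b) = Σ_i (a_i − μ_a)(b_i − μ_b) / sqrt( Σ_i (a_i − μ_a)² · Σ_i (b_i − μ_b)² )

where a and b are the two patches, μ_a and μ_b are their mean intensities, and the sums run over the pixels in the patch. Scores lie in [-1, 1], with 1 indicating a perfect match up to gain and offset.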
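
For a single pair of patches, normalized cross-correlation can be sketched as below. The function name `ncc` and the small `eps` guard against division by zero on flat patches are assumptions for illustration.

```python
import numpy as np

def ncc(patch_a, patch_b, eps=1e-8):
    """Normalized cross-correlation between two equally sized patches."""
    a = np.asarray(patch_a, dtype=float).ravel()
    b = np.asarray(patch_b, dtype=float).ravel()
    a = a - a.mean()    # remove brightness offset
    b = b - b.mean()
    # Dividing by the norms removes contrast (gain) differences.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
```

Because the mean is subtracted and the result is divided by both norms, the score is unchanged when one patch is a brightened or contrast-scaled copy of the other, which makes the measure tolerant of exposure differences between views.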

Plane Sweep Algorithm

The algorithm systematically tests depth hypotheses by warping one image into the other's viewpoint through a plane placed at each candidate depth:

  1. Calibration: Load camera intrinsics and extrinsics for stereo pair
  2. Depth Sampling: Define range of depth hypotheses to test
  3. Image Projection: For each depth, project left image to right viewpoint
  4. NCC Computation: Calculate normalized cross-correlation between projected and actual right image
  5. Cost Volume: Stack the per-depth NCC scores into a 3D array (H×W×D)
  6. Depth Selection: Choose depth with highest correlation at each pixel
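
The steps above can be sketched with NumPy alone. This is a simplified illustration under several assumptions: grayscale float images, fronto-parallel sweep planes in the reference (right) camera frame, nearest-neighbour sampling, and R, t that map reference-camera coordinates to source-camera (left) coordinates. All function names are hypothetical.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) window, computed with an integral image."""
    k = 2 * r + 1
    p = np.pad(img, r, mode="edge")
    s = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    return (s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]) / (k * k)

def ncc_map(a, b, r=2, eps=1e-8):
    """Per-pixel NCC between two images over local windows."""
    mu_a, mu_b = box_mean(a, r), box_mean(b, r)
    cov = box_mean(a * b, r) - mu_a * mu_b
    var_a = box_mean(a * a, r) - mu_a ** 2
    var_b = box_mean(b * b, r) - mu_b ** 2
    return cov / np.sqrt(np.maximum(var_a * var_b, 0.0) + eps)

def warp_to_reference(src, H, shape):
    """Sample src at H @ x for every reference pixel x (nearest neighbour)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])   # (3, H*W)
    q = H @ pix
    z = np.where(np.abs(q[2]) < 1e-12, 1e-12, q[2])
    # Clip before casting so far-off coordinates cannot overflow the int type.
    u = np.rint(np.clip(q[0] / z, -1, src.shape[1])).astype(int)
    v = np.rint(np.clip(q[1] / z, -1, src.shape[0])).astype(int)
    valid = (q[2] > 0) & (u >= 0) & (u < src.shape[1]) & (v >= 0) & (v < src.shape[0])
    out = np.zeros(h * w)
    out[valid] = src[v[valid], u[valid]]
    return out.reshape(h, w), valid.reshape(h, w)

def plane_sweep(ref, src, K_ref, K_src, R, t, depths, r=2):
    """Return the per-pixel best depth and the (H, W, D) NCC score volume."""
    n = np.array([0.0, 0.0, 1.0])            # plane normal: planes Z = d
    K_ref_inv = np.linalg.inv(K_ref)
    volume = np.full(ref.shape + (len(depths),), -1.0)
    for i, d in enumerate(depths):
        # Homography induced by the plane Z = d, reference -> source pixels.
        H = K_src @ (R + np.outer(t, n) / d) @ K_ref_inv
        warped, valid = warp_to_reference(src, H, ref.shape)
        volume[..., i] = np.where(valid, ncc_map(ref, warped, r), -1.0)
    best = volume.argmax(axis=2)             # highest correlation wins
    return np.asarray(depths)[best], volume
```

Pixels that project outside the source image receive the lowest possible score (-1), so they never win the depth selection. A production version would typically use bilinear sampling and sample depths uniformly in inverse depth.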

Plane Sweep Stereo Results

Figure: Stereo image pair. Calibrated cameras with known baseline.
Figure: Plane sweep process. Animation showing depth hypothesis testing.
Figure: NCC cost volume. Correlation scores across depth layers.
Figure: Computed depth map. White = near, black = far.

3D Mesh Reconstruction

The final stage converts 2D analysis results into complete 3D mesh models through surface integration and point cloud reconstruction techniques.

Surface Integration

Surface normals are integrated to produce depth maps using the Frankot-Chellappa algorithm, which enforces integrability constraints in the frequency domain.
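
A compact sketch of Frankot-Chellappa integration is shown below. It assumes periodic boundary conditions (implicit in the FFT), unit pixel spacing, and the sign convention that a normal is proportional to (-dz/dx, -dz/dy, 1); the project's own convention may differ. The function names are illustrative.

```python
import numpy as np

def gradients_from_normals(normals, eps=1e-8):
    """Convert (H, W, 3) unit normals to gradients p = dz/dx, q = dz/dy."""
    nz = normals[..., 2]
    nz = np.where(np.abs(nz) < eps, eps, nz)   # guard against grazing normals
    return -normals[..., 0] / nz, -normals[..., 1] / nz

def frankot_chellappa(p, q):
    """Integrate gradient fields p, q of shape (H, W) into a height map."""
    h, w = p.shape
    wx = 2 * np.pi * np.fft.fftfreq(w)     # angular frequencies along x (columns)
    wy = 2 * np.pi * np.fft.fftfreq(h)     # angular frequencies along y (rows)
    WX, WY = np.meshgrid(wx, wy)
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    denom = WX ** 2 + WY ** 2
    denom[0, 0] = 1.0                      # avoid 0/0 at the DC term
    # Least-squares projection onto the set of integrable gradient fields.
    Z = (-1j * WX * P - 1j * WY * Q) / denom
    Z[0, 0] = 0.0                          # absolute height is unrecoverable
    return np.real(np.fft.ifft2(Z))
```

The result is defined only up to an additive constant, so the sketch returns a zero-mean height map.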

Mesh Generation Pipeline

Point clouds are converted to smooth meshes using Poisson surface reconstruction:

  1. Point Cloud Generation: Unproject depth values to 3D coordinates using camera intrinsics
  2. Normal Estimation: Compute point normals from local surface orientation
  3. Poisson Reconstruction: Generate smooth mesh surface from oriented point cloud
  4. Texture Mapping: Project original images for photorealistic rendering
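
The first three steps can be sketched as follows. Unprojection uses NumPy only and assumes a zero-skew intrinsic matrix; meshing uses Open3D, imported lazily so the unprojection helper works without it. The function names, the normal-estimation radius, and the octree depth are illustrative values that depend on scene scale, and texture mapping is omitted.

```python
import numpy as np

def unproject_depth(depth, K):
    """Back-project an (H, W) depth map to (M, 3) camera-frame points.

    Pixels with depth <= 0 are treated as invalid and skipped.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

def poisson_mesh(points, octree_depth=8, normal_radius=0.1):
    """Estimate normals and run Poisson reconstruction with Open3D."""
    import open3d as o3d   # lazy import: only needed for meshing
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(
            radius=normal_radius, max_nn=30))
    # Poisson needs consistently oriented normals; face them toward the camera.
    pcd.orient_normals_towards_camera_location(np.zeros(3))
    mesh, _densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=octree_depth)
    return mesh
```

Poisson reconstruction produces a watertight surface, so low-density regions far from the input points are often trimmed afterwards using the returned per-vertex densities.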

3D Reconstruction Results

Figure: Mesh from photometric normals. High surface detail from lighting analysis.
Figure: Mesh from stereo depth. Accurate global geometry from multi-view.
Figure: Combined 3D reconstruction. Final 3D reconstruction combining photometric detail with geometric accuracy.

Implementation Results

Technical Achievements

All three components of the stereo vision pipeline were implemented: surface normal and albedo estimation by photometric stereo, depth map computation by plane sweep stereo, and 3D mesh reconstruction. Each was run on multiple datasets.

Performance Metrics

Implementation Details