Assignment 3

Introduction

For this assignment, you will train your own NeRF or Gaussian Splatting model.

We have made available a visualization tool built on the Three.js library, implemented in "./js/assignment3.js", and an example NerfStudio Colab can be found here. You may launch the demo either through Colab or on your local machine. Note that while Colab offers free GPU access, completing the training within the free tier may not be feasible. Your task is to collect your own data, perform camera calibration, and train your NeRF/3DGS model. You are welcome to use any programming language you are comfortable with. Export the final results in OBJ/PLY format for visualization, then evaluate them by comparing against reconstructions from Meshroom or COLMAP.
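As a quick sanity check before loading results into the viewer, an exported mesh can be inspected in Python. Below is a minimal sketch using trimesh; the file names are placeholders, not part of the assignment starter code.

```python
# Hedged sketch: sanity-check an exported PLY mesh before visualization.
# "mesh.ply" is a placeholder file name.
import trimesh

mesh = trimesh.load("mesh.ply")
print(f"{len(mesh.vertices)} vertices, {len(mesh.faces)} faces")
print("watertight:", mesh.is_watertight)
mesh.export("mesh.obj")  # convert to OBJ if the viewer prefers that format
```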

How to Submit: Please submit this template file along with your implementation as a zip file. The zip file should contain your source code, the generated results in PLY mesh format, and a report based on this HTML template. The report should comprise your results and a concise explanation of your implementation. Alternatively, you may create a GitHub repository containing all of these elements and submit a link to it.

Requirements / Rubric: Grading is based on the correctness of your implementation. You are encouraged to use the visualization tool to debug your implementation and to test it on other 3D models.

Extra Credit: You are free to complete any of the extra credit options.

NeRF

The trained model can be found at this link.

NOTE: Allow a few seconds for this page to load, as it contains large point clouds.

For this assignment, I train a NeRF model using nerfstudio. Training takes about 10 minutes, compared to 2 hours and 12 minutes for COLMAP dense reconstruction.
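For reference, here is a sketch of a typical nerfstudio training invocation, driven from Python. The nerfacto method and the data path are assumptions about my setup rather than an exact record of the command used.

```python
# Hedged sketch: launch nerfstudio training on the processed dataset.
# "nerfacto" and "processed/scene" are assumed, not exact values used here.
import subprocess

subprocess.run(
    ["ns-train", "nerfacto", "--data", "processed/scene"],
    check=True,
)
```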

Training Dataset (Video)

I prepared a dataset using a GoPro Hero 9 camera (1080p standard video at 30 fps).
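Below is a sketch of how the GoPro footage can be converted into a posed image dataset with nerfstudio's ns-process-data, which runs COLMAP for calibration under the hood. The paths and frame budget are placeholders.

```python
# Hedged sketch: extract frames from the GoPro video and estimate camera
# poses with COLMAP via nerfstudio. Paths and frame count are placeholders.
import subprocess

subprocess.run([
    "ns-process-data", "video",
    "--data", "gopro_scene.mp4",
    "--output-dir", "processed/scene",
    "--num-frames-target", "300",  # assumed frame budget
], check=True)
```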


Training Output Metrics

Below are the training metrics for the model evaluated on a set of 32 images:
Metric                                               Mean                   Std
PSNR (Peak Signal-to-Noise Ratio)                    25.66154670715332      1.937759518623352
SSIM (Structural Similarity Index Measure)           0.8653780221939087     0.03032178431749344
LPIPS (Learned Perceptual Image Patch Similarity)    0.09439089894294739    0.013973996974527836

Render Performance Metrics

Metric                 Mean                  Std
Num Rays Per Sec       727773.875            37743.16015625
Frames Per Sec         1.403884768486023     0.07280702888965607
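For context, the per-image PSNR and SSIM values behind these means and standard deviations can be reproduced directly. Here is a minimal sketch using synthetic stand-ins for the 32 ground-truth/render pairs; LPIPS would additionally require the lpips package.

```python
# Hedged sketch: compute mean/std PSNR and SSIM over image pairs.
# Synthetic images stand in for the real ground-truth/render pairs.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(0)
gt = rng.random((8, 64, 64, 3))  # ground-truth images in [0, 1]
pred = np.clip(gt + 0.05 * rng.standard_normal(gt.shape), 0.0, 1.0)  # renders

psnr = [peak_signal_noise_ratio(g, p, data_range=1.0) for g, p in zip(gt, pred)]
ssim = [structural_similarity(g, p, channel_axis=-1, data_range=1.0)
        for g, p in zip(gt, pred)]
print(f"PSNR {np.mean(psnr):.2f} ± {np.std(psnr):.2f}")
print(f"SSIM {np.mean(ssim):.3f} ± {np.std(ssim):.3f}")
```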

Training Output Video

I add some keyframes with jerky, abrupt angle changes and render a video at 60 fps (a sketch of the render command follows). The result is shown below.
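A sketch of the render invocation using nerfstudio's ns-render; the config path and camera-path file are placeholders, and the exact flags vary across nerfstudio versions.

```python
# Hedged sketch: render the keyframed camera path with ns-render.
# Paths are placeholders; flags depend on the nerfstudio version.
import subprocess

subprocess.run([
    "ns-render", "camera-path",
    "--load-config", "outputs/scene/nerfacto/run/config.yml",
    "--camera-path-filename", "camera_path.json",  # keyframes from the viewer
    "--output-path", "renders/scene_60fps.mp4",
], check=True)
```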



In this video, the floor and the ceiling are not rendered correctly, since the training dataset contains no views that cover them.

Training Output Mesh

The exported mesh (cropped) is shown below. It preserves much of the original detail in the scene compared with the COLMAP output.
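A sketch of the mesh export step using nerfstudio's ns-export; the Poisson exporter and paths are assumptions, and cropping can be done with the exporter's bounding-box options or in a mesh editor afterwards.

```python
# Hedged sketch: export a mesh from the trained model with ns-export.
# The Poisson exporter and the paths are assumptions.
import subprocess

subprocess.run([
    "ns-export", "poisson",
    "--load-config", "outputs/scene/nerfacto/run/config.yml",
    "--output-dir", "exports/mesh",
], check=True)
```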

COLMAP Output (takes time to load)

The mesh generated by COLMAP using dense stereo reconstruction is quite noisy, and the scene details are also less accurate than those in the NeRF model's output.
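For comparison, the COLMAP baseline follows the standard dense-stereo CLI pipeline. Here is a sketch with placeholder paths; note that patch-match stereo requires the CUDA build of COLMAP.

```python
# Hedged sketch: standard COLMAP dense-stereo pipeline. Paths are
# placeholders; patch_match_stereo needs the CUDA build of COLMAP.
import os
import subprocess

os.makedirs("sparse", exist_ok=True)
steps = [
    ["colmap", "feature_extractor", "--database_path", "db.db",
     "--image_path", "images"],
    ["colmap", "exhaustive_matcher", "--database_path", "db.db"],
    ["colmap", "mapper", "--database_path", "db.db", "--image_path", "images",
     "--output_path", "sparse"],
    ["colmap", "image_undistorter", "--image_path", "images",
     "--input_path", "sparse/0", "--output_path", "dense"],
    ["colmap", "patch_match_stereo", "--workspace_path", "dense"],
    ["colmap", "stereo_fusion", "--workspace_path", "dense",
     "--output_path", "dense/fused.ply"],
    ["colmap", "poisson_mesher", "--input_path", "dense/fused.ply",
     "--output_path", "dense/meshed-poisson.ply"],
]
for cmd in steps:
    subprocess.run(cmd, check=True)
```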

Extra Credit (LERF)

For the extra credit part, I train a LERF model. I use the lerf-lite variant since my desktop doesn't meet the required specs for training the larger models. I use the same scene as for my NeRF model but populate it with common everyday items such as a mug, a bottle, monitors, and a banana. The model embeds language features for these items into the trained NeRF, which can then be queried to produce relevancy maps. The trained model can be found at this link.
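A sketch of the LERF training invocation, assuming the lerf-lite method registered with ns-train by the LERF plugin; the data path is a placeholder.

```python
# Hedged sketch: train the lerf-lite variant on the same processed scene.
# Assumes the LERF plugin has registered "lerf-lite" with ns-train.
import subprocess

subprocess.run(
    ["ns-train", "lerf-lite", "--data", "processed/scene"],
    check=True,
)
```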

Training Output (Relevancy Map)

The output is shown below.

Scene in RGB
Relevancy map of Banana
Relevancy map of Bottle
Relevancy map of Monitor
Relevancy map of Mug

Training Video (LERF)





Note: I couldn't generate the training metrics for this part, since the model itself occupies most of the GPU memory (11488 MiB / 12282 MiB) and evaluation fails with the following error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 768.00 MiB. GPU 0 has a total capacity of 11.71 GiB of which 194.31 MiB is free. Including non-PyTorch memory, this process has 11.06 GiB memory in use. Of the allocated memory 8.27 GiB is allocated by PyTorch, and 1.75 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
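One workaround the error message itself suggests is tuning the allocator's split size to reduce fragmentation. A sketch is below; the 128 MiB value is an assumption to experiment with, and the variable must be set before the first CUDA allocation.

```python
# Hedged sketch: the fragmentation workaround suggested by the error.
# The 128 MiB split size is an assumed starting point; set the variable
# before PyTorch makes its first CUDA allocation.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after configuring the allocator
```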