Assignment 2

Introduction

For this assignment, you will be implementing Structure from Motion.

We have made available a visualization tool using the Three.js library implemented in "./js/assignment2.js" and an example implementation located in "./assignments/assignment2.py". Your objective is to fill in the TODOs in the Python files and make modifications based on them. You are encouraged to use a programming language with which you are comfortable. The output results should be in the PLY format, and you must visualize your outcomes accordingly.

How to Submit: Please submit this template file along with your implementation as a zip file. The zip file should contain your source code, the generated results in PLY mesh format, and a report created by modifying this HTML file. The report should comprise your results and a concise explanation of your implementation. Alternatively, you may choose to create a GitHub repository containing all these elements and provide a link for submission.

Requirements / Rubric: The grading is based on the correctness of your implementation. You are encouraged to use the visualization tool to debug your implementation. You can also use the visualization tool to test your implementation on other 3D models. Extra Credit: You are free to complete any extra credit.

Structure From Motion

Outline of the Implementation

This assignment on structure from motion requires the completion of the three TODOs left in the code. The basic outline of the structure from motion pipeline is in the SFM class. The pipeline goes as follows:

Step 0 (Feature Matching)
Feature extraction (SIFT) followed by matching (BFMatcher/FLANN-based matcher) is done beforehand, and the results are serialized to files that are used further down the pipeline.
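As a rough sketch of what this precomputed step looks like (the file names, serialization format, and ratio-test threshold here are my own illustrative choices, not necessarily what the provided code does):

import cv2
import pickle

# Illustrative sketch of the precomputed matching step for one image pair.
img1 = cv2.imread("images/0000.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("images/0001.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, desc1 = sift.detectAndCompute(img1, None)
kp2, desc2 = sift.detectAndCompute(img2, None)

# Brute-force matching with Lowe's ratio test to discard ambiguous matches.
bf = cv2.BFMatcher(cv2.NORM_L2)
knn_matches = bf.knnMatch(desc1, desc2, k=2)
good = [m for m, n in knn_matches if m.distance < 0.75 * n.distance]

# Serialize the matched pixel coordinates so later stages can reload them.
pts1 = [kp1[m.queryIdx].pt for m in good]
pts2 = [kp2[m.trainIdx].pt for m in good]
with open("matches_0000_0001.pkl", "wb") as f:
    pickle.dump({"pts1": pts1, "pts2": pts2}, f)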
Step 1 (Baseline Pose Estimation)
The pipeline begins by estimating the relative pose (rotation and translation) between the first two images in the dataset. This is done using the fundamental matrix and essential matrix methods.
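A minimal sketch of such a baseline pose recovery with OpenCV, assuming matched points pts1/pts2 (Nx2 float arrays) and a known intrinsic matrix K; the function and variable names are illustrative, not the ones used in sfm.py:

import cv2
import numpy as np

def baseline_pose(pts1, pts2, K):
    """Estimate the relative pose (R, t) of view 2 w.r.t. view 1 from 2D matches."""
    # Essential matrix with RANSAC to reject outlier correspondences.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    # Decompose E and keep the (R, t) that places points in front of both cameras.
    _, R, t, mask_pose = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t, mask_pose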
Step 2 (Baseline Triangulation)
Once the poses of the first two images are estimated, the pipeline performs triangulation to reconstruct 3D points from matched feature points between these images. This concludes the initialization of the reconstruction.
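For reference, a sketch of the corresponding two-view triangulation, assuming the first camera sits at the origin; again, the names are illustrative:

import cv2
import numpy as np

def triangulate_baseline(pts1, pts2, K, R, t):
    """Triangulate matched points (Nx2 float arrays) from the first two views into 3D."""
    # Projection matrices: view 1 at the origin, view 2 at (R, t).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t.reshape(3, 1)])
    # OpenCV expects 2xN arrays of image points.
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    # Convert from homogeneous to Euclidean coordinates.
    pts3d = (pts4d[:3] / pts4d[3]).T
    return pts3d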
Step 3 (First Point Cloud Generation)
A point cloud is generated after triangulating matched feature points. This forms the initial structure of the scene.
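Since the results must be exported in PLY format, here is a minimal ASCII PLY writer sketch for the triangulated points (a plain stand-in, not the exporter used in the provided code):

def write_ply(path, points, colors=None):
    """Write an Nx3 point array (optionally with Nx3 uint8 colors) to an ASCII PLY file."""
    n = len(points)
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {n}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        if colors is not None:
            f.write("property uchar red\nproperty uchar green\nproperty uchar blue\n")
        f.write("end_header\n")
        for i in range(n):
            line = f"{points[i][0]} {points[i][1]} {points[i][2]}"
            if colors is not None:
                line += f" {colors[i][0]} {colors[i][1]} {colors[i][2]}"
            f.write(line + "\n")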
Step 4 (Pose Estimation and Triangulation for the remaining Images)
The pipeline iterates over the remaining images, estimating poses, triangulating new points, updating the 3D point cloud, and evaluating reprojection errors for each image. The steps are:
  1. Get pose (implementation provided)
  2. Get features (implementation provided)
  3. Find matches (implementation provided)
  4. Remove outliers (implemented using the fundamental matrix method)
  5. Triangulate and merge the point clouds (similar to triangulating a new view, but only using the inliers)
After every round of triangulation, the reprojection error is also calculated to quantify the quality of the estimation, and a few sample images are plotted to visualize the differences. The three TODOs are all associated with calculating the reprojection error, visualizing it, and implementing the triangulation in step 5 above. Please refer to the sfm.py file for the exact implementation.
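A hedged sketch of the reprojection error computation described above, assuming a known pose (R, t), intrinsics K, the triangulated 3D points, and their corresponding 2D observations; the names here are my own rather than those in sfm.py:

import cv2
import numpy as np

def reprojection_error(pts3d, pts2d, R, t, K):
    """Mean distance (in pixels) between observed and reprojected points."""
    rvec, _ = cv2.Rodrigues(R)                       # rotation matrix -> rotation vector
    proj, _ = cv2.projectPoints(pts3d, rvec, t, K, None)
    proj = proj.reshape(-1, 2)
    return np.mean(np.linalg.norm(pts2d - proj, axis=1))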

Results

In this section, I present the results from the algorithm for various datasets. Although testing has been done across many datasets, here I will only be using the following two: fountain-P11 and Herz-Jesus-P8. The full dataset is available here.

Reprojection Error

I use different outlier-detection parameters for different subjects: outlier_thres, which is internally mapped to ransacReprojThreshold when computing the fundamental matrix, varies per dataset. The values are 0.9 for fountain-P11 and 0.4 for Herz-Jesus-P8.
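Concretely, the threshold is what ends up as ransacReprojThreshold in OpenCV's RANSAC fundamental matrix estimator, roughly along these lines (a sketch; the exact call in my code may differ):

import cv2

def remove_outliers(pts1, pts2, outlier_thres):
    """Keep only matches consistent with a RANSAC-estimated fundamental matrix.

    outlier_thres is the maximum epipolar distance (in pixels) for a match to
    count as an inlier, e.g. 0.9 for fountain-P11 and 0.4 for Herz-Jesus-P8.
    """
    F, mask = cv2.findFundamentalMat(
        pts1, pts2, method=cv2.FM_RANSAC,
        ransacReprojThreshold=outlier_thres, confidence=0.99)
    inliers = mask.ravel() == 1
    return pts1[inliers], pts2[inliers], F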

fountain-P11                          Herz-Jesus-P8
Image Name    Reprojection Error      Image Name    Reprojection Error
0000          0.478                   0000          0.182
0001          0.491                   0001          0.224
0002          0.596                   0002          0.832
0003          0.362                   0003          0.905
0004          2.720                   0004          5.986
0005          1.930                   0005          12.256
0006          4.127                   0006          9.549
0007          8.172                   0007          16.798
0008          11.433                  --            --
0009          9.627                   --            --
0010          11.928                  --            --
Mean Error:   4.715                   Mean Error:   5.841
Time Taken:   2.568s                  Time Taken:   1.4s

Reprojection Point Error Estimates

Here are some camera pose estimates for selected images from each dataset. The full output is saved in the assets directory.

fountain-P11: images 0000, 0005, 0010 (figures in the assets directory)
Herz-Jesus-P8: images 0000, 0004, 0007 (figures in the assets directory)

Point Cloud Visualization (and Comparison with Colmap)

Results from my implementation:
fountain-P11: point clouds after 2, 7, and 11 views (cloud_2_view, cloud_7_view, cloud_11_view)
Herz-Jesus-P8: point clouds after 2, 5, and 8 views (cloud_2_view, cloud_5_view, cloud_8_view)
Results from Colmap:
Running the two datasets through Colmap generates the following point clouds. These point clouds are richer and more faithful to the provided image datasets.

fountain-P11 (Colmap): final point cloud
Herz-Jesus-P8 (Colmap): final point cloud

Extra Credit

I collected my own dataset for this experiment. It can be found here. The images first need to be exported to PNG or JPG format. I also had to update the code in several places to make this work, such as switching to a FLANN-based matcher and adjusting the parameters for the matcher and for the fundamental matrix calculation (sketched below).
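The FLANN switch mentioned above looked roughly like the following; the index and search parameters shown are the typical OpenCV choices for SIFT (float) descriptors rather than necessarily the exact values I ended up with:

import cv2

def flann_match(desc1, desc2, ratio=0.7):
    """Match SIFT descriptors with a FLANN kd-tree index and Lowe's ratio test."""
    FLANN_INDEX_KDTREE = 1
    index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
    search_params = dict(checks=50)
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    knn_matches = flann.knnMatch(desc1, desc2, k=2)
    return [m for m, n in knn_matches if m.distance < ratio * n.distance]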

Here is a visualization of the camera reprojection error:
error image 1, error image 2


As you can see, the error is quite high and the point clouds are not coherent. The problem in this case lies first with feature detection, as none of the detected features lie on the Baxter robot. Secondly, I think bundle adjustment could prove very useful here in making the process more accurate.