Assignment 2

Introduction

For this assignment, you will be implementing Structure from Motion.

We have made available a visualization tool using the Three.js library implemented in "./js/assignment2.js" and an example implementation located in "./assignments/assignment2.py". Your objective is to fill in the TODOs in the Python files and make modifications based on them. You are encouraged to use a programming language with which you are comfortable. The output results should be in the PLY format, and you must visualize your outcomes accordingly.

How to Submit: Please submit this template file along with your implementation as a zip file. The zip file should contain your source code, the generated results in PLY mesh format, and a report created by modifying this HTML file. The report should comprise your results and a concise explanation of your implementation. Alternatively, you may choose to create a GitHub repository containing all these elements and provide a link for submission.

Requirements / Rubric: The grading is based on the correctness of your implementation. You are encouraged to use the visualization tool to debug your implementation. You can also use the visualization tool to test your implementation on other 3D models. Extra Credit: You are free to complete any extra credit.

Structure From Motion

Outline of the Implementation

This assignment on structure from motion requires the completion of the three TODOs left in the code. The basic outline of the structure from motion pipeline is in the SFM class. The pipeline goes as follows:

Step 0 (Feature Matching)
Feature extraction (SIFT) followed by matching (BFMatcher/FLANN-based matcher) is done beforehand, and the results are serialized to files that are used further down the pipeline.
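As a rough sketch of what this precomputed step looks like (the file names, serialization format, and ratio-test threshold here are my own illustrative choices, not necessarily what the provided code does):

import cv2
import pickle

# Illustrative sketch of the precomputed matching step for one image pair.
img1 = cv2.imread("images/0000.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("images/0001.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, desc1 = sift.detectAndCompute(img1, None)
kp2, desc2 = sift.detectAndCompute(img2, None)

# Brute-force matching with Lowe's ratio test to discard ambiguous matches.
bf = cv2.BFMatcher(cv2.NORM_L2)
knn_matches = bf.knnMatch(desc1, desc2, k=2)
good = [m for m, n in knn_matches if m.distance < 0.75 * n.distance]

# Serialize the matched pixel coordinates so later stages can reload them.
pts1 = [kp1[m.queryIdx].pt for m in good]
pts2 = [kp2[m.trainIdx].pt for m in good]
with open("matches_0000_0001.pkl", "wb") as f:
    pickle.dump({"pts1": pts1, "pts2": pts2}, f)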
Step 1 (Baseline Pose Estimation)
The pipeline begins by estimating the relative pose (rotation and translation) between the first two images in the dataset. This is done using the fundamental matrix and essential matrix methods.
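A minimal sketch of such a baseline pose recovery with OpenCV, assuming matched points pts1/pts2 (Nx2 float arrays) and a known intrinsic matrix K; the function and variable names are illustrative, not the ones used in sfm.py:

import cv2
import numpy as np

def baseline_pose(pts1, pts2, K):
    """Estimate the relative pose (R, t) of view 2 w.r.t. view 1 from 2D matches."""
    # Essential matrix with RANSAC to reject outlier correspondences.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    # Decompose E and keep the (R, t) that places points in front of both cameras.
    _, R, t, mask_pose = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t, mask_pose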
Step 2 (Baseline Triangulation)
Once the poses of the first two images are estimated, the pipeline performs triangulation to reconstruct 3D points from matched feature points between these images. This concludes the initialization of the reconstruction.
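For reference, a sketch of the corresponding two-view triangulation, assuming the first camera sits at the origin; again, the names are illustrative:

import cv2
import numpy as np

def triangulate_baseline(pts1, pts2, K, R, t):
    """Triangulate matched points (Nx2 float arrays) from the first two views into 3D."""
    # Projection matrices: view 1 at the origin, view 2 at (R, t).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t.reshape(3, 1)])
    # OpenCV expects 2xN arrays of image points.
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    # Convert from homogeneous to Euclidean coordinates.
    pts3d = (pts4d[:3] / pts4d[3]).T
    return pts3d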
Step 3 (First Point Cloud Generation)
A point cloud is generated after triangulating matched feature points. This forms the initial structure of the scene.
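Since the results must be exported in PLY format, here is a minimal ASCII PLY writer sketch for the triangulated points (a plain stand-in, not the exporter used in the provided code):

def write_ply(path, points, colors=None):
    """Write an Nx3 point array (optionally with Nx3 uint8 colors) to an ASCII PLY file."""
    n = len(points)
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {n}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        if colors is not None:
            f.write("property uchar red\nproperty uchar green\nproperty uchar blue\n")
        f.write("end_header\n")
        for i in range(n):
            line = f"{points[i][0]} {points[i][1]} {points[i][2]}"
            if colors is not None:
                line += f" {colors[i][0]} {colors[i][1]} {colors[i][2]}"
            f.write(line + "\n")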
Step 4 (Pose Estimation and Triangulation for the remaining Images)
The pipeline iterates over the remaining images, estimating poses, triangulating new points, updating the 3D point cloud, and evaluating reprojection errors for each image. The steps are:
  1. Get pose (implementation provided)
  2. Get features (implementation provided)
  3. Find matches (implementation provided)
  4. Remove outliers (implemented using the fundamental matrix method)
  5. Triangulate and merge the point clouds (similar to triangulating a new view, but only using the inliers)
After every round of triangulation, the reprojection error is also calculated to quantify the quality of the estimation, and a few sample images are plotted to visualize the differences. The three TODOs are all associated with calculating the reprojection error, visualizing it, and implementing the triangulation in step 5 above. Please refer to the sfm.py file for the exact implementation.
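A hedged sketch of the reprojection error computation described above, assuming a known pose (R, t), intrinsics K, the triangulated 3D points, and their corresponding 2D observations; the names here are my own rather than those in sfm.py:

import cv2
import numpy as np

def reprojection_error(pts3d, pts2d, R, t, K):
    """Mean distance (in pixels) between observed and reprojected points."""
    rvec, _ = cv2.Rodrigues(R)                       # rotation matrix -> rotation vector
    proj, _ = cv2.projectPoints(pts3d, rvec, t, K, None)
    proj = proj.reshape(-1, 2)
    return np.mean(np.linalg.norm(pts2d - proj, axis=1))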

Results

In this section, I present the results from the algorithm for various datasets. Although testing has been done across many datasets, here I will only be using the following two: fountain-P11 and Herz-Jesus-P8. The full dataset is available here.

Reprojection Error

I use different outlier-detection parameters for different subjects: outlier_thres, which is internally mapped to ransacReprojThreshold when computing the fundamental matrix, varies per dataset. The values are 0.9 for fountain-P11 and 0.4 for Herz-Jesus-P8.
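Concretely, the threshold is what ends up as ransacReprojThreshold in OpenCV's RANSAC fundamental matrix estimator, roughly along these lines (a sketch; the exact call in my code may differ):

import cv2

def remove_outliers(pts1, pts2, outlier_thres):
    """Keep only matches consistent with a RANSAC-estimated fundamental matrix.

    outlier_thres is the maximum epipolar distance (in pixels) for a match to
    count as an inlier, e.g. 0.9 for fountain-P11 and 0.4 for Herz-Jesus-P8.
    """
    F, mask = cv2.findFundamentalMat(
        pts1, pts2, method=cv2.FM_RANSAC,
        ransacReprojThreshold=outlier_thres, confidence=0.99)
    inliers = mask.ravel() == 1
    return pts1[inliers], pts2[inliers], F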

fountain-P11                          Herz-Jesus-P8
Image Name    Reprojection Error      Image Name    Reprojection Error
0000          0.478                   0000          0.182
0001          0.491                   0001          0.224
0002          0.596                   0002          0.832
0003          0.362                   0003          0.905
0004          2.720                   0004          5.986
0005          1.930                   0005          12.256
0006          4.127                   0006          9.549
0007          8.172                   0007          16.798
0008          11.433                  --            --
0009          9.627                   --            --
0010          11.928                  --            --
Mean Error:   4.715                   Mean Error:   5.841
Time Taken:   2.568s                  Time Taken:   1.4s

Reprojection Point Error Estimates

Here are some camera pose estimates for selected images from each dataset. The full output is saved in the assets directory.

fountain-P11: images 0000, 0005, 0010 (figures in the assets directory)
Herz-Jesus-P8: images 0000, 0004, 0007 (figures in the assets directory)

Point Cloud Visualization (and Comparison with Colmap)

Results from my implementation:
fountain-P11: point clouds after 2, 7, and 11 views (cloud_2_view, cloud_7_view, cloud_11_view)
Herz-Jesus-P8: point clouds after 2, 5, and 8 views (cloud_2_view, cloud_5_view, cloud_8_view)
Results from Colmap:
Running the two datasets through Colmap generates the following point clouds. These point clouds are richer and more faithful to the provided image datasets.

fountain-P11 (Colmap): final point cloud
Herz-Jesus-P8 (Colmap): final point cloud

Extra Credit

I collected my own dataset for this experiment. It can be found here. The images first need to be exported to PNG or JPG format. I also had to update the code in several places to make this work, such as switching to a FLANN-based matcher and adjusting the parameters for the matcher and for the fundamental matrix calculation (sketched below).
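The FLANN switch mentioned above looked roughly like the following; the index and search parameters shown are the typical OpenCV choices for SIFT (float) descriptors rather than necessarily the exact values I ended up with:

import cv2

def flann_match(desc1, desc2, ratio=0.7):
    """Match SIFT descriptors with a FLANN kd-tree index and Lowe's ratio test."""
    FLANN_INDEX_KDTREE = 1
    index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
    search_params = dict(checks=50)
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    knn_matches = flann.knnMatch(desc1, desc2, k=2)
    return [m for m, n in knn_matches if m.distance < ratio * n.distance]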

Here is a visualization of the camera reprojection error:
error image 1, error image 2


As you can see, the error is quite high and the point clouds are not coherent. The problem in this case lies first with feature detection, as none of the detected features lie on the Baxter robot. Secondly, I think bundle adjustment could prove very useful here in making the process more accurate.