MAEG5720 Computer Vision in Practice
Mini-Project2
Aim: The field of view of the image is limited by your lens. However, we could solve this issue by
combining multiple images together to form a panorama. The aim of this assignment is to
automatically stitch the images acquired by a panning camera.
[Figure: Image 1 and Image 2 are combined to form the stitched image.]
The algorithm outline:
- Choose one image as the reference frame.
- Estimate the homography between the reference image and the other image:
  - Detect the local features and extract a feature descriptor for each feature point in both images.
  - Match the feature descriptors between the two images.
  - Robustly estimate the homography using RANSAC.
- Warp the image into the reference frame using the homography.
Tips and detailed description of the algorithm:
(1) Choose the reference image: Since we are working with two images, you are free to choose the reference frame from either image.
Possible extension: In case you would like to extend this algorithm to multiple images in your spare time, you can choose the middle frame as the reference frame to minimize the distortion of the images.
Also, to ensure a larger area of overlap, you can chain homographies. For example, if you wish to calculate the homography from image 2 to image 4, you can separate it into two steps, i.e. H24 = H34 * H23.
(2) Estimation of homography
(a) Detecting local features in each image
To extract local features from the image, one common method is the SIFT descriptor. Since the SIFT descriptor is a patented technology and is not available in MATLAB, please download the library from VLFeat.org (https://www.vlfeat.org/download.html) and follow the instructions on the installation page (https://www.vlfeat.org/install-matlab.html).
Once the setup is done, you can use the following code for SIFT descriptor extraction.
I = single(rgb2gray(I));
[f, d] = vl_sift(I);

where f is a 4 × num_descriptors matrix whose columns contain the x, y, σ, θ of each SIFT point, and d is the corresponding 128 × num_descriptors matrix of descriptors. Please refer to the Shape Descriptor lecture notes for details.
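As a sketch, the feature-extraction step for two images might look like the code below. The image filenames and the vl_setup path are placeholders and depend on where you saved your images and installed VLFeat:

```matlab
% Sketch: extract SIFT features from both images with VLFeat.
% 'left.jpg', 'right.jpg' and the vl_setup path are placeholders.
run('vlfeat/toolbox/vl_setup');   % add VLFeat to the MATLAB path

im1 = imread('left.jpg');
im2 = imread('right.jpg');

% vl_sift expects single-precision grayscale input
I1 = single(rgb2gray(im1));
I2 = single(rgb2gray(im2));

[f1, d1] = vl_sift(I1);   % f1: 4 x n (x, y, sigma, theta); d1: 128 x n
[f2, d2] = vl_sift(I2);
```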
(b) Matching the descriptors between images
Once the SIFT descriptors are extracted, you have to find the pairs of descriptors that look similar between the two images; such pairs are likely to correspond to each other. One method is to compare the sum of squared distances (SSD) between two descriptors and apply a nearest-neighbour distance ratio (NNDR) threshold of 1.5:

NNDR = (distance of second-best match) / (distance of best match)
The pseudo code below gives you some hints on the algorithm outline:
function [matches, scores] = match_descriptor(d1, d2)
    % define the threshold (for example 1.5)
    num_of_matches = 0
    for each descriptor i in d1
        compute the SSD to every descriptor j in d2
        find the best match and the second-best match
        if NNDR > threshold
            add i and j into matches[num_of_matches]
            add the SSD(di, dj) score into scores[num_of_matches]
            num_of_matches = num_of_matches + 1
        end
    end
    return matches and scores
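A minimal MATLAB sketch of this matching step is given below. It assumes d1 and d2 are the 128 × n descriptor matrices returned by vl_sift, and the threshold of 1.5 follows the handout:

```matlab
function [matches, scores] = match_descriptor(d1, d2)
% Sketch of NNDR matching. d1, d2: 128 x n descriptor matrices.
threshold = 1.5;                 % NNDR threshold from the handout
d1 = double(d1); d2 = double(d2);
matches = []; scores = [];
for i = 1:size(d1, 2)
    % squared distance from descriptor i to every descriptor in d2
    diff = d2 - repmat(d1(:, i), 1, size(d2, 2));
    ssd = sum(diff.^2, 1);
    [sortedSSD, idx] = sort(ssd, 'ascend');
    % NNDR = distance of second-best match / distance of best match
    if sortedSSD(2) / sortedSSD(1) > threshold
        matches(:, end+1) = [i; idx(1)];
        scores(end+1) = sortedSSD(1);
    end
end
end
```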
(c) Robust Estimation of the homography by RANSAC
Randomly sample 4 points to compute each homography hypothesis. You will need to write the following function:

H = FindHomography(x1, y1, x2, y2, x3, y3, x4, y4, xp1, yp1, xp2, yp2, xp3, yp3, xp4, yp4)

which takes 4 SIFT points from image 1 and the 4 corresponding SIFT points from image 2. In order to compute the elements of H, you will need to set up a linear system of equations (i.e. Ah = 0). The 3x3 homography matrix has 9 elements but only 8 degrees of freedom, since it is defined up to scale. Each pair of correspondences gives you 2 equations, and therefore you need a total of 4 pairs of correspondences. The A matrix has the form below:
A = [ -x1 -y1  -1    0    0    0   x1*xp1  y1*xp1  xp1;
        0    0    0  -x1  -y1  -1   x1*yp1  y1*yp1  yp1;
      ...
      -x4  -y4  -1    0    0    0   x4*xp4  y4*xp4  xp4;
        0    0    0  -x4  -y4  -1   x4*yp4  y4*yp4  yp4];
The solution can be obtained using the SVD of A: it is the right singular vector corresponding to the smallest singular value.

[U,S,V] = svd(A);
H = V(:,end);
H = reshape(H,3,3)';
Please refer to Lecture note 9 for more detail.
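Putting the two pieces above together, a sketch of the FindHomography function might look like this (the argument order follows the signature given earlier):

```matlab
function H = FindHomography(x1,y1,x2,y2,x3,y3,x4,y4, ...
                            xp1,yp1,xp2,yp2,xp3,yp3,xp4,yp4)
% Sketch: direct linear transform from 4 point correspondences.
x  = [x1 x2 x3 x4];      y  = [y1 y2 y3 y4];
xp = [xp1 xp2 xp3 xp4];  yp = [yp1 yp2 yp3 yp4];
A = zeros(8, 9);
for k = 1:4
    A(2*k-1, :) = [-x(k) -y(k) -1  0 0 0  x(k)*xp(k) y(k)*xp(k) xp(k)];
    A(2*k,   :) = [ 0 0 0  -x(k) -y(k) -1  x(k)*yp(k) y(k)*yp(k) yp(k)];
end
[~, ~, V] = svd(A);
h = V(:, end);           % right singular vector of smallest singular value
H = reshape(h, 3, 3)';   % fill the 3x3 matrix row by row
end
```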
For the RANSAC, a very simple implementation with a fixed number of sampling iterations is sufficient. You should output the single homography matrix H with the largest number of inliers. The pseudo code for the RANSAC is given below:
function [bestH, num_of_inliers] = RANSAC(f1, f2, matches)
    define the parameters (PixelError, num_Iteration, num_pts_cal_H = 4)
    max_inlier = 0
    for each i in num_Iteration
        randomly select 4 matched pairs and compute H_hypothesis with the FindHomography function
        reset inlier to 0
        for each pair in the matches list
            look up the coordinates P1 and P2 of the matched pair
            if the reprojection error of P1 under H_hypothesis is less than PixelError
                inlier = inlier + 1
            end
        end
        if inlier > max_inlier
            max_inlier = inlier
            bestH = H_hypothesis
            num_of_inliers = inlier
        end
    end
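A fixed-iteration RANSAC loop could be sketched in MATLAB as below. The PixelError and iteration count are assumed values you should tune; f1 and f2 are the 4 × n frame matrices from vl_sift, matches is the 2 × m index matrix from match_descriptor, and the sketch relies on the FindHomography function above:

```matlab
function [bestH, num_of_inliers] = RANSAC(f1, f2, matches)
% Sketch of fixed-iteration RANSAC. pixelError and numIteration
% are assumed values; adjust them to your images.
pixelError   = 3;        % inlier threshold in pixels
numIteration = 1000;     % fixed number of sampling iterations
m = size(matches, 2);
% matched point coordinates, one column per match
p1 = f1(1:2, matches(1, :));
p2 = f2(1:2, matches(2, :));
max_inliers = 0; bestH = eye(3); num_of_inliers = 0;
for it = 1:numIteration
    s = randperm(m, 4);  % random sample of 4 correspondences
    H = FindHomography(p1(1,s(1)),p1(2,s(1)), p1(1,s(2)),p1(2,s(2)), ...
                       p1(1,s(3)),p1(2,s(3)), p1(1,s(4)),p1(2,s(4)), ...
                       p2(1,s(1)),p2(2,s(1)), p2(1,s(2)),p2(2,s(2)), ...
                       p2(1,s(3)),p2(2,s(3)), p2(1,s(4)),p2(2,s(4)));
    % project all points from image 1 and measure the reprojection error
    proj = H * [p1; ones(1, m)];
    proj = proj(1:2, :) ./ repmat(proj(3, :), 2, 1);
    err  = sqrt(sum((proj - p2).^2, 1));
    inliers = sum(err < pixelError);
    if inliers > max_inliers
        max_inliers    = inliers;
        bestH          = H;
        num_of_inliers = inliers;
    end
end
end
```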
(d) Warp the image using the best homography into the reference frame
Once you have obtained the best homography, the last step is to warp the image and stitch the images together. You will need to extend the image in the reference frame to accommodate the other image. One method is to use the padarray function in MATLAB to pad zeros around the reference frame.
Then we use backward warping, which avoids holes in the warped image. For each pixel in the reference image, calculate the corresponding pixel location in the second image using the best homography obtained from the RANSAC function. You can compare the value of the warped pixel with the corresponding value in the other image and simply pick the maximum of the two. This tends to produce fewer artifacts than averaging the warped images.
function stitchedImage = stitch(im1, im2, homography)
    copy the reference image (im1) into stitchedImage
    pad stitchedImage with zeros
    for each column i in stitchedImage
        for each row j in stitchedImage
            calculate the corresponding position in im2 using the homography:
                p2 = homography * p1
            if p2 is within the bounds of im2
                take the higher intensity of the two pixels as the pixel value
            end
        end
    end
    crop the image to remove the extra black boundary (pixel value = 0)
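A backward-warping sketch in MATLAB is given below. The padding size is an assumed value, padarray requires the Image Processing Toolbox, and the homography is taken to map reference-frame coordinates to im2 coordinates (as computed by the RANSAC step above):

```matlab
function stitchedImage = stitch(im1, im2, H)
% Sketch of backward warping into the padded reference frame.
pad = 300;  % assumed padding on each side; adjust to your images
stitchedImage = padarray(im1, [pad pad], 0, 'both');
[h2, w2, ~] = size(im2);
[hs, ws, ~] = size(stitchedImage);
for i = 1:ws          % columns (x)
    for j = 1:hs      % rows (y)
        % position in the un-padded reference frame, mapped into im2
        p2 = H * [i - pad; j - pad; 1];
        x2 = round(p2(1) / p2(3));
        y2 = round(p2(2) / p2(3));
        if x2 >= 1 && x2 <= w2 && y2 >= 1 && y2 <= h2
            % pick the brighter of the two candidate pixel values
            stitchedImage(j, i, :) = max(stitchedImage(j, i, :), ...
                                         im2(y2, x2, :));
        end
    end
end
% crop the all-black border left over from the padding
mask = any(stitchedImage > 0, 3);
rows = find(any(mask, 2)); cols = find(any(mask, 1));
stitchedImage = stitchedImage(rows(1):rows(end), cols(1):cols(end), :);
end
```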
The final result should look like this.
The stitched Image
Bonus Question: (20%)
You can use the above algorithm to stitch more than two images. Please select three or more of your own images for stitching and choose the best reference image. For clarity, please put this bonus question in a separate main_multiple.m file.
The stitched Image (3 images)
What to submit:
Please submit a compressed file with the following:
- main.m: the main function that reads in the images for processing. (20%)
- main_multiple.m: the main function that reads in multiple images for processing. (20%, bonus)
- match_descriptor.m: matches the descriptors. (20%)
- FindHomography.m: solves the homography using 4 correspondences. (20%)
- RANSAC.m: robust estimation of the best homography between two images. (20%)
- Stitch.m: stitches the images together. (20%)
- A PDF file with the results and any discussion you would like to share.
- The images you are using.
Please ensure your program is executable! Compress your files into a zip file and use your student ID as its name.
You will have two weeks for this assignment, and a tutorial will be arranged to help you with it. Good luck!