In this project, I experiment with Image Homography: projecting an image onto a plane surface other than the image plane. An alternative name for this is the Perspective Projection Transformation. Finally, I take four groups of three photographs (all at the same Center of Projection) and create an image mosaic (similar to a panorama taken on an iPhone) by registering, projective warping, resampling, and compositing them.
In the later parts of this project, we detect corner features in an image, extract a Feature Descriptor for each feature point, match these feature descriptors between two images, use a robust method (RANSAC) to compute a homography, and then proceed as in the first part to produce a mosaic.
In this step, I used my trusted iPhone to take the pictures. I decided to take four groups of three pictures each, which I will warp and mosaic together later in this report. The most important technique that I employed during my photo-taking endeavour was to make sure that the lens of my phone camera (serving as the center of projection) always remained fixed: I simply rotated the phone about an axis passing through the center of projection. The images of the Church were taken by my brother, who studies at Purdue. Here are the original images.
The next step involved calculating the homography matrices between pairs of images to recover the transformations required to warp one image into another. I implemented a function called computeH, which uses point correspondences between two images to compute a homography matrix.
The homography matrix was used to warp images into the same perspective, which allowed for the subsequent blending step to create a seamless mosaic.
To compute the homography matrix, we use a set of corresponding points between the source and destination images. Let's denote the source points as \( (x, y) \) and the destination points as \( (x', y') \). The goal is to find a transformation matrix \( H \) such that:
\[ \begin{bmatrix} w'x' \\ w'y' \\ w' \end{bmatrix} = H \cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \]
The homography matrix \( H \) is a 3x3 matrix that relates the coordinates in one image to the coordinates in the other image:
\[ H = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \]
To find \( H \), we need to set up a system of linear equations using multiple point correspondences. For each pair of corresponding points, we get two equations:
\[ \begin{cases} ax + by + c = wx' \\ dx + ey + f = wy' \\ gx + hy + 1 = w \end{cases} \implies \begin{cases} ax + by + c = (gx + hy + 1) x' \\ dx + ey + f = (gx + hy + 1) y' \end{cases} \implies \begin{cases} ax + by + c - gx x' - hy x' = x' \\ dx + ey + f - gx y' - hy y' = y' \end{cases} \]
Rearranging these equations gives us a system of equations that can be represented in matrix form:
\[ \begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -xx' & -yx' \\ 0 & 0 & 0 & x & y & 1 & -xy' & -yy' \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix} \]
\[ A \mathbf{h} = \mathbf{b} \]
Where \( A \) is a matrix that depends on the source and destination coordinates, \( \mathbf{h} \) is a vector containing the unknown elements of the homography matrix, and \( \mathbf{b} \) is a vector of destination coordinates. Given enough correspondences, we can solve this system of equations for \( \mathbf{h} \) using a least squares approach.
To solve for the homography matrix elements, we need at least four pairs of corresponding points, which provides eight equations. More correspondences provide an overdetermined system, which we can solve using the least squares method. The least squares solution minimizes the error between the predicted destination coordinates and the actual coordinates.
The least squares solution can be found by solving the following equation:
\[ \mathbf{h} = (A^T A)^{-1} A^T \mathbf{b} \]
This equation provides the best fit for the homography matrix by minimizing the squared differences between the actual and projected points.
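As a concrete reference, here is a minimal sketch of how computeH might look in NumPy (the exact signature and array layout are my assumptions, not the code verbatim):

```python
import numpy as np

def computeH(src_pts, dst_pts):
    """Compute the homography mapping src_pts to dst_pts.

    Both inputs are (N, 2) arrays of (x, y) coordinates with N >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src_pts, dst_pts):
        # Two rows of the linear system per correspondence (see derivation above).
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    # Least-squares solution: equivalent to (A^T A)^{-1} A^T b, but more stable.
    h, *_ = np.linalg.lstsq(np.array(A, dtype=float), np.array(b, dtype=float), rcond=None)
    return np.append(h, 1).reshape(3, 3)  # fix the last entry to 1
```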
Using the computed homography matrix, I applied inverse warping to transform the images into a common perspective. The inverseWarpImage function was used to project an image onto the final canvas and align it properly. The results of the individual warped images were visualized as intermediate steps.
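Here is a simplified sketch of this step, with a hypothetical helper rather than my exact inverseWarpImage, using nearest-neighbor sampling for brevity (a real implementation would interpolate):

```python
import numpy as np

def inverse_warp(image, H, out_shape):
    """Warp `image` into an (out_shape[0], out_shape[1]) canvas by inverse mapping."""
    H_inv = np.linalg.inv(H)
    out = np.zeros(tuple(out_shape) + image.shape[2:], dtype=image.dtype)
    ys, xs = np.mgrid[0:out_shape[0], 0:out_shape[1]]
    # Send every output pixel back through H^{-1} to find its source location.
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = H_inv @ coords
    src /= src[2]                                   # homogeneous -> Cartesian
    sx = np.round(src[0]).astype(int)
    sy = np.round(src[1]).astype(int)
    valid = (sx >= 0) & (sx < image.shape[1]) & (sy >= 0) & (sy < image.shape[0])
    out[ys.ravel()[valid], xs.ravel()[valid]] = image[sy[valid], sx[valid]]
    return out
```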
After aligning the images, the next step was to blend them together seamlessly. I used a distance transform-based blending technique to ensure smooth transitions between overlapping images. The distance transform values were normalized and used as blending weights, providing a feathered transition across the overlapping areas. Here is an example of the final mosaic.
The distance transform sets each pixel inside a region (here, inside a projected image's bounding box) to its distance to the nearest edge. I use Euclidean distance as the metric, and the final result is normalized to the range 0 to 1.
These are the normalized distance transforms for the images of the view from the patio.
One note on implementation detail: my algorithm for generating the canvas of the final stitched image was as follows. I first compute the eventual extent of the mosaic, which I call min_x_mosaic, min_y_mosaic, max_x_mosaic, and max_y_mosaic. I then place each warped image on a canvas of those dimensions, so all three images share the same final canvas. Taking a weighted average of three images with identical dimensions is then straightforward.
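Assuming each warped image already sits on the shared canvas with a binary mask of its valid pixels, the blending step might look like this sketch (scipy's distance_transform_edt computes the Euclidean distance transform):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def blend(warped_images, masks):
    """Feather-blend colour images that share the same canvas dimensions."""
    weights = []
    for mask in masks:
        d = distance_transform_edt(mask)            # distance to the nearest edge
        weights.append(d / d.max() if d.max() > 0 else d)  # normalize to [0, 1]
    weights = np.stack(weights)                     # (num_images, H, W)
    total = weights.sum(axis=0)
    total[total == 0] = 1                           # avoid division by zero
    stacked = np.stack(warped_images).astype(float) # (num_images, H, W, 3)
    # Weighted average across images, pixel by pixel.
    return (stacked * weights[..., None]).sum(axis=0) / total[..., None]
```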
One of the applications of homographies is image rectification. This involves transforming an image so that a particular plane becomes fronto-parallel, effectively correcting the perspective distortion. Below is an example where I rectified an image of a room.
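Rectification reuses the same machinery: pick the four corners of the planar surface in the photo and map them to a rectangle. A minimal sketch, using the computeH and inverse_warp sketches above and made-up corner coordinates:

```python
import numpy as np

# Four corners of the plane in the photo (hypothetical pixel values),
# listed in the same order as the target corners below.
src = np.array([[120, 80], [410, 95], [405, 330], [115, 310]], dtype=float)
# Target: a fronto-parallel rectangle with the desired aspect ratio.
dst = np.array([[0, 0], [300, 0], [300, 225], [0, 225]], dtype=float)

H = computeH(src, dst)
rectified = inverse_warp(image, H, (225, 300))  # `image` is the loaded photo
```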
In the first step to achieve automatic feature matching, we need to find interest points. The Harris Corner detector focuses on finding the "corners" of an image. Below is an example of the Harris interest points for two of the images of the view from my patio. Key points to note: I updated the provided Harris Corner Detection code by increasing the minimum distance between corners from 1 to 2 (so that the corners are not cluttered) and adding a threshold parameter to filter the corners based on the h value. This threshold acts like a hyperparameter and needs to be tuned for each image; for images of my patio, I chose a threshold of 100,000. In addition, all the image processing in this part and the ones ahead is done on grey-scaled versions of the images, as we do not need multiple channels to compute feature descriptors and correspondences.
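As an approximation of that modified detector, here is a sketch built on skimage (note that the scale of the Harris response, and hence the right threshold, depends on the implementation; 100,000 is what worked with my code on the patio images):

```python
from skimage.color import rgb2gray
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(image, min_distance=2, threshold=100_000):
    """Harris corners, thinned by min_distance and thresholded on the h value."""
    gray = rgb2gray(image)
    h = corner_harris(gray)                      # Harris response map
    coords = peak_local_max(h, min_distance=min_distance)
    keep = h[coords[:, 0], coords[:, 1]] > threshold
    return h, coords[keep]
```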
Harris interest points are very clustered together, with a lot of redundant information. Adaptive non-maximal suppression (ANMS) is an algorithm to more evenly distribute the interest points around the image, so that feature matching is a lot more effective.
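A sketch of ANMS as described in the MOPS paper: each point's suppression radius is the distance to the nearest point that is "robustly" stronger (c_robust = 0.9 follows the paper; n_keep is a free parameter):

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Keep the n_keep interest points with the largest suppression radii."""
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        # Points whose strength robustly exceeds this one's.
        stronger = strengths * c_robust > strengths[i]
        if stronger.any():
            dists = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = dists.min()
    keep = np.argsort(radii)[::-1][:n_keep]       # largest radii first
    return coords[keep]
```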
Once we have interest points selected for each of our images, we need a representation for them in order to accurately match them with their correspondences in the warped image space. To do this, I first build a Gaussian pyramid (not a stack) for each image, and then sample 40x40 patches centred at the interest points from Part 7, but at pyramid level 2 (with coordinates appropriately recalculated). For reference, the original image is at Level 0.
Once we have the 40x40 image patches, I subsample them (picking one in every 5 pixels) to get 8x8 patch representations. Finally, I normalize these patches by subtracting the mean pixel intensity and dividing by the standard deviation of the pixel intensities in each 8x8 patch. Here is an example of a feature descriptor for an image of the view from my patio.
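A sketch of the descriptor extraction, assuming skimage's pyramid_gaussian builds the pyramid (interest points whose 40x40 window would fall off the image are simply dropped here):

```python
import numpy as np
from skimage.transform import pyramid_gaussian

def extract_descriptors(gray, coords, level=2):
    """8x8 normalized descriptors from 40x40 patches at the given pyramid level."""
    img = list(pyramid_gaussian(gray, max_layer=level))[level]
    descriptors = []
    for r, c in coords // (2 ** level):          # rescale level-0 coordinates
        if 20 <= r < img.shape[0] - 20 and 20 <= c < img.shape[1] - 20:
            patch = img[r - 20:r + 20, c - 20:c + 20][::5, ::5]    # 40x40 -> 8x8
            patch = (patch - patch.mean()) / (patch.std() + 1e-8)  # bias/gain normalize
            descriptors.append(patch.ravel())
    return np.array(descriptors)
```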
With features extracted, the next step is to match features between two images to form correspondences. To do this, I use a nearest-neighbor approach. To find feature correspondences between, say, image 1 and image 2, I iterate through each feature descriptor in image 1 and compute the Euclidean distance between it and every feature descriptor in image 2. I only keep those matches whose 1st-NN to 2nd-NN distance ratio is less than 0.1, roughly following the suggestion from the MOPS paper.
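A sketch of the matching loop with the ratio test (brute-force distances; a k-d tree would be faster, but the logic is the same):

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.1):
    """Return index pairs (i, j) whose 1-NN/2-NN distance ratio passes the test."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)   # distance to every descriptor
        nn1, nn2 = np.argsort(dists)[:2]            # two nearest neighbors
        if dists[nn1] < ratio * dists[nn2]:
            matches.append((i, nn1))
    return matches
```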
Although Lowe's trick does reduce the number of outliers, there will still inevitably be several incorrect pairs remaining. The issue with these incorrect pairs in the correspondence points is that least-squares is not very robust against outliers; a single pair of incorrect points can skew the least-squares result wildly, which also skews the computed homography.
To increase the robustness of least-squares when finding the desired homography, we can use RANSAC; we randomly sample 4 pairs of points, and compute the exact homography through these points, keeping track of the inliers (points that are correctly transformed by the homography) as we go. At the very end, we keep the homography that produces the most inliers. This ignores the correspondence points that do not follow the majority transformation, increasing the robustness against outliers. Below is an illustration of the result of performing RANSAC on the matched features. In my experience through this project, RANSAC almost always gave me accurate feature maps.
This could be because the RANSAC threshold I use is very small (0.8), so I am only left with accurate feature matches.
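A sketch of the RANSAC loop, reusing the computeH sketch from above (the iteration count is a free parameter; 0.8 is the inlier threshold mentioned above):

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, eps=0.8):
    """4-point RANSAC: keep the homography that produces the most inliers."""
    best_inliers = np.array([], dtype=int)
    pts1_h = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous coordinates
    for _ in range(n_iters):
        idx = np.random.choice(len(pts1), 4, replace=False)
        H = computeH(pts1[idx], pts2[idx])       # exact fit through 4 pairs
        proj = pts1_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]        # back to Cartesian coordinates
        errors = np.linalg.norm(proj - pts2, axis=1)
        inliers = np.where(errors < eps)[0]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit with least squares on all inliers for the final homography.
    return computeH(pts1[best_inliers], pts2[best_inliers]), best_inliers
```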
Comparing manual results with automatic results. Please note: the only surprising result was the image of the church building. The RANSAC result is worse than the manual result, but both have a prominent edge seam. The rest of the results look good.
To make feature descriptors rotation-invariant, we align each 8x8 patch according to the dominant gradient direction at the interest point. Below, we outline the mathematical steps.
Given an interest point at coordinates \( (x, y) \), we calculate the gradient of the image intensity around this point using the Sobel operator. The Sobel operator provides partial derivatives of intensity in the x and y directions:
\[ \nabla_x = \frac{\partial I}{\partial x}, \quad \nabla_y = \frac{\partial I}{\partial y} \]
where \( \nabla_x \) and \( \nabla_y \) represent the rate of change of intensity along the x and y axes, respectively.
Using the partial derivatives \( \nabla_x \) and \( \nabla_y \), we compute the orientation \( \theta \) of the gradient at \( (x, y) \). The orientation is given by:
\[ \theta = \arctan\left(\frac{\nabla_y}{\nabla_x}\right) \]
where \( \theta \) represents the angle of the gradient with respect to the x-axis. In practice, the two-argument form \( \operatorname{atan2}(\nabla_y, \nabla_x) \) is used so that \( \theta \) covers the full range \( [-\pi, \pi] \).
To align the patch according to the orientation \( \theta \), we construct a 2D rotation matrix \( R(\theta) \). For a rotation by an angle \( \theta \), the rotation matrix is:
\[ R(\theta) = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} \]
This matrix rotates a point \( (x, y) \) around the origin by angle \( \theta \), producing \( (x', y') \). In our case, we want to rotate the 40x40 patch around its center.
To rotate the patch, we first center it on the interest point. Then, we use an affine transformation with the rotation matrix \( R(\theta) \) to adjust the patch's orientation. The rotated patch is sampled as follows:
\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = R(\theta) \begin{bmatrix} x - x_c \\ y - y_c \end{bmatrix} + \begin{bmatrix} x_c \\ y_c \end{bmatrix} \]
where \( (x_c, y_c) \) is the center of the patch, and \( (x', y') \) are the new coordinates after rotation.
After rotating the patch, we downsample it to 8x8 by taking every 5th pixel. This reduces the size and maintains the key structural information of the patch in its new orientation, ensuring it remains rotation-invariant.
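Putting these steps together, here is a sketch of extracting one rotation-aligned descriptor patch (scipy's sobel and rotate do the heavy lifting; the interest point is assumed to lie far enough from the image border, and the rotation sign depends on the row/column convention):

```python
import numpy as np
from scipy import ndimage

def rotation_invariant_patch(gray, r, c, half=20):
    """Extract an 8x8 descriptor from a 40x40 patch aligned to the local gradient."""
    gx = ndimage.sobel(gray, axis=1)                 # dI/dx
    gy = ndimage.sobel(gray, axis=0)                 # dI/dy
    theta = np.degrees(np.arctan2(gy[r, c], gx[r, c]))
    # Grab an oversized window so rotation does not expose empty corners.
    window = gray[r - 2 * half:r + 2 * half, c - 2 * half:c + 2 * half]
    rotated = ndimage.rotate(window, -theta, reshape=False)
    mid = rotated.shape[0] // 2
    patch = rotated[mid - half:mid + half, mid - half:mid + half]  # 40x40
    patch = patch[::5, ::5]                          # downsample to 8x8
    return (patch - patch.mean()) / (patch.std() + 1e-8)
```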
When compared to the results of the automatic mosaic without rotation-invariant features, these results are marginally better. However, the manual point selection still seems to do better. The presence of the seam in all three cases could potentially mean that there was a slight shift in the centre of projection of the camera between image 2 and image 3 of the church. Still, rotation-invariant feature descriptors do bring a marginal improvement in image quality.
Using the power of homography to create a clone of my roommate!
Clearly, the coolest thing I learned was the patience to read a convoluted section of a research paper. It took me about 15 readings of the Adaptive Non-Maximal Suppression Section of the paper until I finally understood what was going on. This was a useful experience.