Eigenfaces: Principal Component Analysis (PCA) on faces

Statistics, Linear Algebra, Eigenvectors, Dimensionality.

By Cristian Gutiérrez

28th of July, 2023

PCA, short for Principal Component Analysis, is a statistical technique whose main advantage is reducing the dimensionality of a representation. This is particularly valuable in Machine Learning, where high dimensionality can quickly become overwhelming.

This technique achieves the dimensionality reduction through Eigendecomposition, which breaks down a matrix \(A\) into a set of Eigenvectors and Eigenvalues. By doing so, we can gain new insights into the matrix's properties and structure.
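As a refresher (standard linear algebra, not specific to this post), the Eigendecomposition factorizes a diagonalizable matrix as

$$ A = V \Lambda V^{-1}\;, $$

where the columns of \(V\) are the Eigenvectors and \(\Lambda\) is the diagonal matrix of Eigenvalues. For a real symmetric matrix, \(V\) can be chosen orthogonal, so \(V^{-1} = V^{T}\).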

A fascinating use of Principal Component Analysis is Eigenfaces: employing PCA to reduce the dimensionality of a collection of facial images. It can be used to search for a match in a vast facial dataset, such as a casino blacklist.

We will make use of the Olivetti faces dataset, created at AT&T Laboratories Cambridge between 1992 and 1994. It contains \(400\) grayscale pictures of size \(64 \times 64\) of the faces of 40 distinct subjects. Thus, we can now define our matrix of input data.

$$ A_{400\times4096} $$
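As a minimal sketch of how this matrix can be assembled (assuming the scikit-learn copy of the Olivetti dataset rather than whatever loader the original code uses):

```python
import torch
from sklearn.datasets import fetch_olivetti_faces

# Fetch the Olivetti faces: 400 grayscale images of 40 subjects, 64x64 pixels each.
faces = fetch_olivetti_faces()

# scikit-learn already flattens each 64x64 image into a row of 4096 pixel
# intensities, giving the 400x4096 data matrix A.
A = torch.from_numpy(faces.data).double()   # shape: (400, 4096)
```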

In order to apply the Eigendecomposition, we have the strong requirement that the matrix be real and symmetric, and right now it is not. Because of this, we will work with the Covariance matrix, which we will call \(S\). The Covariance indicates how the features correlate with each other.

$$ B = A - \mu_A, $$ $$ S = B^T B\,. $$
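A sketch of the centering and covariance step, assuming the \(A\) tensor from above (the usual statistical definition divides by \(n-1\); the unnormalized \(S = B^T B\) used here only rescales the Eigenvalues):

```python
# Center the data: subtract the mean face (the column-wise mean over all 400 images).
mu = A.mean(dim=0)    # shape: (4096,)
B = A - mu            # centered data, shape: (400, 4096)

# Covariance-like matrix over the pixel features, shape: (4096, 4096).
# It is real and symmetric by construction, so an Eigendecomposition exists.
S = B.T @ B
```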

Now that we have a real symmetric matrix, the spectral theorem guarantees that real Eigenvalues and an orthogonal set of Eigenvectors exist.

$$ \vec{\lambda},\,\vec{V} = \texttt{torch.linalg.eig(}S\texttt{)}\;. $$

Note that \(\texttt{PyTorch}\) does not guarantee any particular ordering of the Eigenvectors, so we sort them in descending order of their Eigenvalues, which measure how much variance each one captures. We can then do a first analysis of the situation by calculating the Total Variance of the dataset \(T\) and how much of it is represented by each of the Principal Components. The first Principal Component represents 23.81% of the total information.

$$ T=\mathrm{Tr}(\Lambda)=\sum_i \lambda_i = 79.11\;,\quad \frac{\lambda_1}{T} = 23.81\%\;. $$
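A hedged sketch of this step, using \(\texttt{torch.linalg.eigh}\) (the variant specialized for symmetric matrices) instead of \(\texttt{torch.linalg.eig}\); the exact numbers above come from the original code and are not guaranteed to be reproduced by this snippet:

```python
# Eigendecomposition of the real symmetric matrix S.
# torch.linalg.eigh returns real Eigenvalues in ascending order,
# so we flip both outputs to put the largest Eigenvalue first.
eigvals, eigvecs = torch.linalg.eigh(S)
eigvals = eigvals.flip(0)   # shape: (4096,), descending
V = eigvecs.flip(1)         # shape: (4096, 4096), columns are the sorted Eigenvectors

# Total variance and the share explained by the first Principal Component.
T = eigvals.sum().item()
ratio = 100.0 * eigvals[0].item() / T
print(f"Total variance: {T:.2f}")
print(f"PC1 explains {ratio:.2f}% of the total variance")
```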

Now we keep the first \(k\) Principal Components and perform the PCA reconstruction by building our projection matrix \(P\).

$$ Z = \vec{V}(:\,,\,:\!k)\;, $$ $$ P = B Z Z^{T}\;. $$
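A sketch of the reconstruction, wrapped in a small hypothetical helper (`reconstruct` is not part of the original code) that keeps the first \(k\) Eigenvectors and adds the mean face back so the result is a viewable image:

```python
def reconstruct(B: torch.Tensor, V: torch.Tensor, mu: torch.Tensor, k: int) -> torch.Tensor:
    """Project the centered data onto the first k Eigenvectors and map it back."""
    Z = V[:, :k]        # (4096, k) -- the top-k Principal Components
    P = B @ Z @ Z.T     # (400, 4096) -- reconstruction in the centered space
    return P + mu       # add the mean face back

# Example: keep k = 100 components and reshape the first face for display.
recon = reconstruct(B, V, mu, k=100)
first_face = recon[0].reshape(64, 64)
```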

With all of this, we have a framework in which, given the Covariance matrix of the dataset, we can express each picture in terms of the principal Eigenvectors and reconstruct the image. As you can see in the last case: we can use 25% of the information while barely losing any quality.
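As a rough check (reusing the hypothetical `reconstruct` helper from above, and reading "25% of the information" as keeping a quarter of the 4096 components), one can compare the relative reconstruction error for a few values of \(k\):

```python
# Relative Frobenius-norm reconstruction error, a crude proxy for visual quality.
for k in (10, 50, 100, 400, 1024):
    recon = reconstruct(B, V, mu, k)
    err = torch.linalg.norm(recon - A) / torch.linalg.norm(A)
    print(f"k = {k:4d}  relative error = {err.item():.4f}")
```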

Code

Available at github.com/ggcr/Eigenfaces-PCA.

References