Why Principal Component Analysis (PCA) works

Reading time: 15 minutes

Principal component analysis (PCA) is a technique to bring out strong patterns in a dataset by suppressing minor variations. It is used to clean data sets to make them easy to explore and analyse. The algorithm of Principal Component Analysis is based on a few mathematical ideas, namely:

  • Variance and Covariance
  • Eigenvectors and Eigenvalues

You need to understand the philosophical aspects of the associated mathematical operations to understand why Principal Component Analysis works the way it does.

While PCA is a very technical method that relies on in-depth linear algebra, it is a relatively intuitive method once you think it through.

Intuition behind the Covariance Matrix

If you remember, we calculate the matrix ZᵀZ for the standardized data matrix Z. Scaled by 1/(n − 1), where n is the number of observations, this is exactly the sample covariance matrix; the scaling changes neither the eigenvectors nor the relative sizes of the eigenvalues, so it is often left out.

The covariance matrix contains estimates of how every variable in Z relates to every other variable in Z. Understanding how one variable is associated with another is quite powerful.
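
To make this concrete, here is a minimal NumPy sketch with made-up data: after standardizing the columns of X to get Z, ZᵀZ scaled by n − 1 matches the usual covariance estimate.

```python
import numpy as np

# Made-up data for illustration: 100 samples, 3 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Standardize each column (zero mean, unit variance) to get Z.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Covariance matrix estimate: Z^T Z scaled by (n - 1).
n = Z.shape[0]
cov = (Z.T @ Z) / (n - 1)

# Sanity check against NumPy's built-in estimator.
assert np.allclose(cov, np.cov(Z, rowvar=False))
```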

Intuition behind Eigenvectors

We have calculated the eigenvalues and eigenvectors of the covariance matrix.

Eigenvectors represent directions. Think of plotting your data on a multidimensional scatterplot. Then one can think of an individual eigenvector as a particular “direction” in your scatterplot of data.

Eigenvalues represent magnitude, or importance: the eigenvalue attached to an eigenvector measures the variance of the data along that direction, so bigger eigenvalues mark more important directions.
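
As a sketch (the covariance matrix below is just an illustrative example), NumPy's `eigh` recovers these directions and magnitudes for a symmetric matrix:

```python
import numpy as np

# A small, made-up covariance matrix (symmetric, for illustration).
cov = np.array([[2.0, 0.8, 0.3],
                [0.8, 1.0, 0.2],
                [0.3, 0.2, 0.5]])

# eigh is for symmetric matrices; eigenvalues come back in ascending order.
vals, vecs = np.linalg.eigh(cov)

# Reorder so the largest eigenvalue (most important direction) comes first.
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

print(vals)        # magnitudes: variance captured along each direction
print(vecs[:, 0])  # columns are eigenvectors; this is the top direction
```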

Finally, we make the assumption that more variability in a particular direction correlates with explaining the behavior of the dependent variable. Lots of variability usually indicates signal, whereas little variability usually indicates noise. Thus, more variability in a particular direction is, in theory, indicative of something important we want to detect.
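
One common way to make "importance" concrete is the share of total variance each eigenvalue accounts for. A small sketch with made-up eigenvalues:

```python
import numpy as np

# Made-up eigenvalues of a covariance matrix, sorted descending.
vals = np.array([2.4, 0.8, 0.3])

# Each eigenvalue's share of total variance: a common proxy for
# how much "signal" its direction carries.
explained = vals / vals.sum()
print(explained)  # approximately [0.686, 0.229, 0.086]
```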

Thus, PCA is a method that brings together the following key ideas:

  • A measure of how each variable is associated with every other variable. (Covariance matrix.)
  • The directions in which our data are dispersed. (Eigenvectors.)
  • The relative importance of these different directions. (Eigenvalues.)

PCA combines our predictors along these directions and allows us to drop the eigenvectors that are relatively unimportant; a minimal end-to-end sketch follows.
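
Putting the pieces together, here is a minimal end-to-end sketch. The function name `pca_reduce` and the data are made up for illustration; this is one straightforward way to implement the steps above, not the only one.

```python
import numpy as np

def pca_reduce(X, k):
    """A minimal PCA sketch: standardize, eigendecompose the covariance
    matrix, keep the top-k eigenvectors, and project the data onto them."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    cov = (Z.T @ Z) / (Z.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:k]  # indices of the k largest eigenvalues
    return Z @ vecs[:, order]           # new, combined predictors

# Hypothetical usage: compress 5 correlated variables down to 2 components.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)  # inject correlation
reduced = pca_reduce(X, k=2)
print(reduced.shape)  # (200, 2)
```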
