# Why does Principal Component Analysis (PCA) work?

Principal component analysis (PCA) is a technique for bringing out the strong patterns in a dataset by suppressing minor variations. It is used to simplify datasets so that they are easier to explore and analyse. The PCA algorithm is based on a few mathematical ideas, namely:

• Variance and covariance
• Eigenvectors and eigenvalues

You need to understand the philosophical aspects of the associated mathematical operations to understand why Principal Component Analysis works the way it does.

While PCA is a very technical method relying on in-depth linear algebra algorithms, it’s a relatively intuitive method when you think about it.

### Intuition behind Covariance matrix

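Recall that we calculate the covariance matrix ZᵀZ for the data set, where Z is the data matrix with each column centered. Strictly speaking, the covariance estimate is ZᵀZ divided by n − 1 (n being the number of observations), but this scaling does not change the eigenvectors.

The covariance matrix contains estimates of how every variable in Z relates to every other variable in Z. Understanding how one variable is associated with another is quite powerful.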
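To make this concrete, here is a minimal sketch of building the covariance matrix from a centered data matrix. The toy data `X` is made up for illustration, and the sketch assumes Z is obtained by centering each column (if the columns were also scaled to unit variance, the result would be the correlation matrix instead).

```python
import numpy as np

# Hypothetical toy data: 5 observations (rows) of 3 variables (columns).
X = np.array([
    [2.0, 4.1, 1.0],
    [3.0, 6.2, 0.5],
    [4.0, 7.9, 1.5],
    [5.0, 10.1, 0.8],
    [6.0, 12.0, 1.2],
])

# Center each column so that every variable in Z has zero mean.
Z = X - X.mean(axis=0)

# Z^T Z divided by (n - 1) is the sample covariance matrix.
n = Z.shape[0]
cov = Z.T @ Z / (n - 1)

# Cross-check against NumPy's estimator (rowvar=False: columns are variables).
cov_np = np.cov(X, rowvar=False)
print(np.allclose(cov, cov_np))
```

Entry (i, j) of `cov` estimates how variable i varies together with variable j, and the diagonal holds each variable's own variance.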

### Intuition behind Eigenvectors

We have calculated the eigenvalues and eigenvectors of the covariance matrix.

Eigenvectors represent directions. Think of plotting your data on a multidimensional scatterplot. Then one can think of an individual eigenvector as a particular “direction” in your scatterplot of data.

Eigenvalues represent magnitude, or importance. Bigger eigenvalues correspond to more important directions: each eigenvalue equals the variance of the data along its eigenvector.
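The relationship between directions and magnitudes can be checked numerically. The following is a sketch on synthetic 2-D data (the data and the mixing matrix are assumptions chosen only to stretch the point cloud along one direction). It uses `np.linalg.eigh`, which is meant for symmetric matrices such as a covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D data, stretched so that one direction dominates.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])
Z = X - X.mean(axis=0)
cov = Z.T @ Z / (len(Z) - 1)

# eigh returns eigenvalues in ascending order and eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Reorder so the most important direction comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Project the data onto each eigenvector and measure the spread there.
projected = Z @ eigenvectors
variances = projected.var(axis=0, ddof=1)

# The variance along each eigenvector matches its eigenvalue.
print(np.allclose(variances, eigenvalues))
```

Each column of `eigenvectors` is a unit-length direction in the scatterplot, and the matching entry of `eigenvalues` says how much the data spreads along it.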

Finally, we make the assumption that more variability in a particular direction correlates with explaining the behavior of the dependent variable. Lots of variability usually indicates signal, whereas little variability usually indicates noise. Thus, the more variability there is in a particular direction, the more likely it is, in theory, to reflect something important that we want to detect.
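One way to act on this assumption is to look at the share of total variance each direction carries and keep only the leading ones. The sketch below uses synthetic data with a single strong signal plus small noise. The 0.95 threshold is an arbitrary illustrative choice, not a rule from this article.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: one underlying signal spread over 3 variables, plus small noise.
signal = rng.normal(size=(300, 1))
X = signal @ np.array([[2.0, -1.0, 0.5]]) + 0.1 * rng.normal(size=(300, 3))
Z = X - X.mean(axis=0)

cov = Z.T @ Z / (len(Z) - 1)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort directions from most to least variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Fraction of the total variance carried by each direction.
explained_ratio = eigenvalues / eigenvalues.sum()

# Keep the smallest number of directions whose cumulative share reaches 0.95
# (the threshold is an assumption made for this example).
k = int(np.searchsorted(np.cumsum(explained_ratio), 0.95) + 1)

# Project onto the kept directions; the low-variance ones are dropped.
Z_reduced = Z @ eigenvectors[:, :k]
print(k, Z_reduced.shape)
```

Here the dropped directions hold mostly the added noise, which is the situation the assumption describes. On real data the split between signal and noise is rarely this clean, so the threshold is a judgment call.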

Thus, PCA is a method that brings together the following key ideas:

• A measure of how each variable is associated with every other variable. (Covariance matrix.)
• The directions in which our data are dispersed. (Eigenvectors.)
• The relative importance of these different directions. (Eigenvalues.)
• PCA combines our predictors and allows us to drop the eigenvectors that are relatively unimportant.

#### OpenGenus Foundation

The official account of OpenGenus IQ backed by GitHub, DigitalOcean and Discourse
