1

Does principal component analysis (PCA) need standardization or normalization? After some googling I am confused. PCA needs the variables to be on the same scale, so which should I use?

Which technique should be applied before PCA?

Does PCA need standardization? The mean of standardized values will always be zero, and the standard deviation will always be one.

Does PCA need normalization, i.e. rescaling to the range zero to one?

Or both?

andy

2 Answers

1

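I believe normalization refers to rescaling a variable to lie between 0 and 1. Standardization refers to rescaling a variable so that its empirical distribution has mean 0 and standard deviation 1; it does not make the variable normally distributed. Principal component analysis, and similar methods such as ridge regression and partial least squares regression, are not invariant to the scale of the inputs, so the variables are normally standardized before training, i.e. $z_i=\frac{y_i-\mu_y}{\sigma_y}$, where $\mu_y$ and $\sigma_y$ are the mean and standard deviation of $y$. Reference: Elements of Statistical Learning, Ch. 3.4.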
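As a rough illustration of the two rescalings described above, here is a minimal NumPy sketch. The function names `standardize` and `normalize` and the sample data are made up for this example; they are not from any particular library.

```python
import numpy as np

def standardize(x):
    """Z-score each column: subtract its mean, divide by its standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)

def normalize(x):
    """Min-max scale each column onto the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)

# Hypothetical data: two columns measured on very different scales.
data = np.array([[1.0, 2000.0],
                 [2.0, 1000.0],
                 [3.0, 1500.0],
                 [10.0, 1200.0]])

z = standardize(data)
m = normalize(data)

print("standardized mean:", z.mean(axis=0))  # approximately zero per column
print("standardized std: ", z.std(axis=0))   # one per column
print("normalized range: ", m.min(axis=0), m.max(axis=0))
print("normalized var:   ", m.var(axis=0))   # generally differs between columns
```

Both functions assume every column has at least two distinct values; a constant column would cause a division by zero.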

0

The purpose of PCA is to find the directions that maximize the variance. If the variance of one variable is much higher than that of the others, the principal components become biased towards that variable.

So the best thing to do is to make the variance of all variables the same. One way of doing this is to standardize all the variables.

Normalization (rescaling to the range zero to one) does not give all variables the same variance.
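To make the bias concrete, here is a small sketch that computes the first principal component from the covariance matrix, once on raw data and once on standardized data. The helper `first_component` and the simulated variables are assumptions made up for this illustration.

```python
import numpy as np

def first_component(x):
    """Return the unit-length direction of maximum variance of the columns of x."""
    centered = x - x.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    return eigvecs[:, -1]                   # eigenvector for the largest eigenvalue

rng = np.random.default_rng(0)

# Hypothetical data: two correlated variables, the second on a scale 1000x larger.
small = rng.normal(0.0, 1.0, size=500)
large = 1000.0 * (0.8 * small + 0.6 * rng.normal(0.0, 1.0, size=500))
x = np.column_stack([small, large])

raw_pc = first_component(x)
std_pc = first_component((x - x.mean(axis=0)) / x.std(axis=0))

print("raw loadings:         ", np.abs(raw_pc))  # almost all weight on the large-scale column
print("standardized loadings:", np.abs(std_pc))  # both columns contribute equally
```

On the raw data the first component is essentially the large-scale variable on its own. After standardization both variables have the same variance, so the component reflects their correlation instead of their units.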

kangaroo_cliff