Your emphasis on using a validation set rather than the training set to select $k$ is good practice and worth keeping. However, we can do even better!
The parameter $k$ in $\text{PCA}$ is more special than a generic hyper-parameter, because the solution to $\text{PCA}(k)$ is already contained in $\text{PCA}(K)$ for any $K > k$: it consists of the first $k$ eigenvectors (those corresponding to the $k$ largest eigenvalues) found by $\text{PCA}(K)$. Therefore, instead of running $\text{PCA}(1)$, $\text{PCA}(2)$, ..., $\text{PCA}(K)$ separately on the training data, as we would for a hyper-parameter in general, we only need to run $\text{PCA}(K)$ once to obtain the solution for every $k \in \{1, \dots, K\}$.
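To make this nesting concrete, here is a minimal sketch using scikit-learn's `PCA` (the toy data, sizes, and variable names are purely illustrative): fitting with $K$ components and keeping the first $k$ rows of `components_` gives the same directions as fitting with $k$ components directly.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # illustrative toy data

K, k = 10, 3
pca_K = PCA(n_components=K).fit(X)      # one big fit
pca_k = PCA(n_components=k).fit(X)      # separate small fit, for comparison only

# The first k principal directions of PCA(K) coincide with those of PCA(k) (up to sign).
print(np.allclose(np.abs(pca_K.components_[:k]), np.abs(pca_k.components_)))
```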
As a result, the process would be as follows (a code sketch is given after the list):
- Run $\text{PCA}(K)$ for the largest acceptable $K$ on the training set,
- Plot, or tabulate, the pairs ($k$, explained variance on the validation set) for $k = 1, \dots, K$,
- Select the smallest $k$ whose explained variance reaches the acceptable threshold, e.g. 90% or 99%.
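A minimal sketch of these steps, assuming scikit-learn and a toy correlated dataset (the split sizes, the 90% threshold, and the variable names are illustrative): a single $\text{PCA}(K)$ fit on the training set is enough to compute the validation-set explained variance for every $k$.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20)) @ rng.normal(size=(20, 20))   # correlated toy data
X_train, X_val = X[:200], X[200:]

K = 20                                         # largest acceptable K
pca = PCA(n_components=K).fit(X_train)         # a single fit covers every k <= K

Xc = X_val - pca.mean_                         # center validation data with the training mean
proj = Xc @ pca.components_.T                  # validation scores on all K directions
# fraction of validation-set variance captured by the first k directions, k = 1..K
explained = np.cumsum(np.sum(proj ** 2, axis=0)) / np.sum(Xc ** 2)

best_k = int(np.argmax(explained >= 0.90)) + 1   # smallest k reaching the 90% threshold
print(best_k, explained[best_k - 1])
```

Note that the validation data is centred with the training mean, so the score measures how well the subspace learned on the training set captures new data, rather than re-estimating anything on the validation set.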
And N-fold cross-validation would be as follows (a code sketch follows the list):
- Run $\text{PCA}(K)$ for the largest acceptable $K$ on each of the N training folds,
- Plot, or tabulate, the pairs ($k$, explained variance averaged over the N held-out folds),
- Select the smallest $k$ whose average explained variance reaches the acceptable threshold, e.g. 90% or 99%.
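The same calculation wrapped in a cross-validation loop, again only as a sketch (scikit-learn's `KFold`, the toy data, N = 5, and the 90% threshold are all illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20)) @ rng.normal(size=(20, 20))   # correlated toy data

N, K = 5, 20                                   # number of folds, largest acceptable K
fold_scores = np.zeros((N, K))

for i, (tr, va) in enumerate(KFold(n_splits=N, shuffle=True, random_state=0).split(X)):
    pca = PCA(n_components=K).fit(X[tr])       # one PCA(K) fit per training fold
    Xc = X[va] - pca.mean_                     # center the held-out fold with the training mean
    proj = Xc @ pca.components_.T              # held-out scores on all K directions
    # cumulative variance captured by the first k directions, k = 1..K
    fold_scores[i] = np.cumsum(np.sum(proj ** 2, axis=0)) / np.sum(Xc ** 2)

avg = fold_scores.mean(axis=0)                 # average over the N held-out folds
best_k = int(np.argmax(avg >= 0.90)) + 1       # smallest k whose average reaches 90%
print(best_k, avg[best_k - 1])
```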
Also, here is a related post that asks "why do we choose principal components based on maximum variance explained?".