Your emphasis on using a validation set rather than the training set to select $k$ is good practice and worth keeping. However, we can do even better!
The parameter $k$ in $\text{PCA}$ is more special than a generic hyper-parameter, because the solution to $\text{PCA}(k)$ is already contained in $\text{PCA}(K)$ for any $K > k$: it consists of the first $k$ eigenvectors (those corresponding to the $k$ largest eigenvalues) found by $\text{PCA}(K)$. Therefore, instead of running $\text{PCA}(1)$, $\text{PCA}(2)$, ..., $\text{PCA}(K)$ separately on the training data, as we would for a hyper-parameter in general, we only need to run $\text{PCA}(K)$ once to obtain the solution for every $k \in \{1, \dots, K\}$.
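To make this nesting concrete, here is a minimal sketch using scikit-learn's `PCA` (the toy data, sizes, and variable names are purely illustrative): fitting with $K$ components and keeping the first $k$ rows of `components_` gives the same directions as fitting with $k$ components directly.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # illustrative toy data

K, k = 10, 3
pca_K = PCA(n_components=K).fit(X)      # one big fit
pca_k = PCA(n_components=k).fit(X)      # separate small fit, for comparison only

# The first k principal directions of PCA(K) coincide with those of PCA(k) (up to sign).
print(np.allclose(np.abs(pca_K.components_[:k]), np.abs(pca_k.components_)))
```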
As a result, the process would be as follows (a code sketch is given after the list):
- Run $\text{PCA}(K)$ for the largest acceptable $K$ on the training set,
- Plot, or tabulate, the pairs ($k$, explained variance on the validation set) for $k = 1, \dots, K$,
- Select the smallest $k$ whose explained variance reaches the acceptable threshold, e.g. 90% or 99%.
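A minimal sketch of these steps, assuming scikit-learn and a toy correlated dataset (the split sizes, the 90% threshold, and the variable names are illustrative): a single $\text{PCA}(K)$ fit on the training set is enough to compute the validation-set explained variance for every $k$.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20)) @ rng.normal(size=(20, 20))   # correlated toy data
X_train, X_val = X[:200], X[200:]

K = 20                                         # largest acceptable K
pca = PCA(n_components=K).fit(X_train)         # a single fit covers every k <= K

Xc = X_val - pca.mean_                         # center validation data with the training mean
proj = Xc @ pca.components_.T                  # validation scores on all K directions
# fraction of validation-set variance captured by the first k directions, k = 1..K
explained = np.cumsum(np.sum(proj ** 2, axis=0)) / np.sum(Xc ** 2)

best_k = int(np.argmax(explained >= 0.90)) + 1   # smallest k reaching the 90% threshold
print(best_k, explained[best_k - 1])
```

Note that the validation data is centred with the training mean, so the score measures how well the subspace learned on the training set captures new data, rather than re-estimating anything on the validation set.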
And N-fold cross-validation would be as follows (a code sketch follows the list):
- Run $\text{PCA}(K)$ for the largest acceptable $K$ on each of the N training folds,
- Plot, or tabulate, the pairs ($k$, explained variance averaged over the N held-out folds),
- Select the smallest $k$ whose average explained variance reaches the acceptable threshold, e.g. 90% or 99%.
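The same calculation wrapped in a cross-validation loop, again only as a sketch (scikit-learn's `KFold`, the toy data, N = 5, and the 90% threshold are all illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20)) @ rng.normal(size=(20, 20))   # correlated toy data

N, K = 5, 20                                   # number of folds, largest acceptable K
fold_scores = np.zeros((N, K))

for i, (tr, va) in enumerate(KFold(n_splits=N, shuffle=True, random_state=0).split(X)):
    pca = PCA(n_components=K).fit(X[tr])       # one PCA(K) fit per training fold
    Xc = X[va] - pca.mean_                     # center the held-out fold with the training mean
    proj = Xc @ pca.components_.T              # held-out scores on all K directions
    # cumulative variance captured by the first k directions, k = 1..K
    fold_scores[i] = np.cumsum(np.sum(proj ** 2, axis=0)) / np.sum(Xc ** 2)

avg = fold_scores.mean(axis=0)                 # average over the N held-out folds
best_k = int(np.argmax(avg >= 0.90)) + 1       # smallest k whose average reaches 90%
print(best_k, avg[best_k - 1])
```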
Also, here is a related post that asks "why do we choose principal components based on maximum variance explained?".