I am doing a PCA as a data exploration step and I realize that the first two principal components capture only 25% of the variance, while the first ten principal components capture about 60% of the information. Is it worth interpreting those axes, knowing that they do not capture enough information?

- How many dimensions are you starting with? Is your data scaled? – TBSRounder Feb 08 '18 at 19:05
- Hi, my data are scaled; the original dataset has 36 variables for 1599 individuals – raton roguille Feb 08 '18 at 19:15
- Enough for what? – Paul Feb 08 '18 at 19:33
- It means that your features are highly uncorrelated. – Green Falcon Feb 08 '18 at 19:33
- @Paul Interpreting the first two axes, for instance, given the fact that they only explain 27% of the variance – raton roguille Feb 08 '18 at 19:41
1 Answer
You don't state how many original features there are: were there 10, 50, 50 million? The first ten principal components above capture 62.3% of the variance, so we can be sure that there are more than ten original features. Let us assume, however, that there aren't many more (say circa twenty), and consider what is going on.
With PCA you are hoping to separate the useful information from the noise. By taking only the first two components in your example, you are discarding all other information as noise.
It depends upon the application, but I would say that 25% is never enough (I think serious questions would be raised as to why). So the question becomes: what is going on? Let us consider the two extreme cases: a set of uncorrelated features and a set of fully correlated features. Consider first three uncorrelated features, with correlation matrix:
1.000 0.000 0.000
0.000 1.000 0.000
0.000 0.000 1.000
The total variance is 3. The eigenvalues are
1.000
1.000
1.000
Each component therefore represents 1/3 of the variance and they are all equal. Now let us consider the other extreme, where all features are fully correlated, i.e. a correlation matrix of
1.000 1.000 1.000
1.000 1.000 1.000
1.000 1.000 1.000
The eigenvalues are
3.000
0.000
0.000
Therefore the first principal component contains all the information, and these three features could be characterised by a single latent variable.
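If you want to check these numbers yourself, here is a minimal sketch in NumPy (not part of the original answer) that computes the eigenvalues and explained-variance fractions for both extreme correlation matrices:

```python
# A minimal sketch (not from the original answer): eigenvalues of the
# two extreme 3x3 correlation matrices discussed above.
import numpy as np

uncorrelated = np.eye(3)             # identity: three uncorrelated features
fully_correlated = np.ones((3, 3))   # all pairwise correlations equal to 1

for name, corr in [("uncorrelated", uncorrelated),
                   ("fully correlated", fully_correlated)]:
    # eigvalsh handles symmetric matrices and returns ascending eigenvalues
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]
    print(name, eigenvalues, eigenvalues / eigenvalues.sum())

# uncorrelated      [1. 1. 1.] -> each component explains 1/3 of the variance
# fully correlated  [3. 0. 0.] -> the first component explains everything
```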
So why is this relevant? Well, in your data set the eigenvalues are all of similar magnitude. This suggests that the features in your original data are not very correlated (like the first example above: look at the off-diagonal elements of your original correlation matrix, they are likely close to zero), and it is therefore expected that many principal components would be required, since there is little shared information in the original feature space.
This of course assumes that we only have a small number of original features. If you have 50 million features, then capturing 62.3% in only ten latent variables is pretty good.
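To tie this back to the question, a rough sketch of how one might check both the cumulative explained variance and the off-diagonal correlations is below. The random matrix is only a stand-in for the poster's scaled 36-variable, 1599-row dataset, and the variable names are illustrative rather than taken from the original post.

```python
# A rough sketch, assuming X is a scaled feature matrix of shape
# (1599 individuals, 36 variables); random data stands in for it here.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.standard_normal((1599, 36))   # placeholder for the real scaled data

# Cumulative variance explained by the leading principal components
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("first 2 PCs:", cumulative[1], "first 10 PCs:", cumulative[9])

# Mean absolute off-diagonal correlation between the original features;
# values near zero mean many components will be needed, as argued above.
corr = np.corrcoef(X, rowvar=False)
off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
print("mean |off-diagonal correlation|:", np.abs(off_diag).mean())
```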

- A dimensionality reduction factor of three for two thirds of the variance is not that bad. You might be able to obtain a more compressed representation with kernel (nonlinear) PCA. – Emre Feb 08 '18 at 20:35