Questions tagged [pca]

Principal component analysis, a technique for dimensionality reduction.

Principal component analysis (PCA) is a statistical technique for dimension reduction, often used in clustering or factor analysis. Given any number of possibly correlated variables, PCA constructs orthogonal linear combinations of them (the principal components) and ranks these components by how much of the variation in the data they explain. It is this property that allows PCA to be used for dimension reduction, i.e. to keep only the few components that capture most of the variation from amongst a large set of possible influences.

Mathematically, principal component analysis (PCA) amounts to an orthogonal transformation of possibly correlated variables (vectors) into uncorrelated variables called principal component vectors.
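As a quick illustration of that definition, here is a minimal NumPy sketch (synthetic data, chosen purely for the example) showing that the orthogonal transformation turns correlated variables into uncorrelated scores:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated variables: the second is a noisy copy of the first.
x = rng.normal(size=200)
X = np.column_stack([x, x + 0.1 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)                 # center each variable
C = np.cov(Xc, rowvar=False)            # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)    # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]       # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = Xc @ eigvecs                        # principal component scores
# The scores are uncorrelated: the off-diagonal covariance is ~0.
print(np.cov(Z, rowvar=False).round(6))
```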

336 questions
13 votes · 2 answers

How many dimensions to reduce to when doing PCA?

How to choose K for PCA? K is the number of dimensions to project down to. The only requirement is to not lose too much information. I understand it depends on the data, but I'm looking more for a simple general overview about what characteristics…
pr338
  • 385
  • 2
  • 7
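One common heuristic discussed under this question is to keep the smallest K whose components cover a target fraction of the variance. A minimal scikit-learn sketch on synthetic data (the 95% threshold is just an example, not a rule):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# 100 samples in 10 dimensions, but only ~3 directions carry signal.
signal = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 10))
X = signal + 0.05 * rng.normal(size=(100, 10))

pca = PCA().fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cumvar, 0.95) + 1)   # smallest k reaching 95%
print(k, cumvar[:k].round(3))
```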
5 votes · 1 answer

Intuition behind PCA eigenvectors

For undergraduate students who understand the definition of eigenvectors and eigenvalues, $$A v = \lambda v \;,$$ what is the intuition behind why the eigenvectors of the covariance (or correlation) matrix correspond to the axes of maximal…
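The intuition can be checked numerically: the variance of the data projected onto the top eigenvector of the covariance matrix equals the largest eigenvalue, and no other unit direction beats it. A small sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic 2-D data stretched along one direction.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 1.0], [0.0, 1.0]])
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
v_top = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue

# Variance along the top eigenvector equals the largest eigenvalue.
proj_var = np.var(Xc @ v_top, ddof=1)
print(np.isclose(proj_var, eigvals[-1]))
```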
4 votes · 1 answer

What are the model parameters in PCA?

I've been asked to report the number of parameters to be learned in a PCA model. This answer implies that parameters do exist in PCA, but does not explain. Software packages often report the number of parameters, but do not document what those…
Chris Keefe
  • 143
  • 3
2 votes · 3 answers

Interpreting the new dimensions after PCA

I have telecom data with a large number of dimensions. Now if I apply dimensionality reduction like PCA, then for the resulting dimensions, say PC1 and PC2, I would lose the meaning or would not understand what they represent. Are there any techniques other…
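One way the answers usually approach this is to inspect the loadings (`components_` in scikit-learn), which say how strongly each original feature contributes to each PC. A sketch with made-up telecom-style feature names (hypothetical, for illustration only):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
# Hypothetical telecom-style features; names are invented for the example.
names = ["calls", "minutes", "sms", "data_mb"]
calls = rng.normal(size=300)
minutes = calls * 2 + 0.1 * rng.normal(size=300)   # correlated with calls
sms = rng.normal(size=300)
data_mb = rng.normal(size=300)
X = np.column_stack([calls, minutes, sms, data_mb])

pca = PCA(n_components=2).fit(X)
# Each row of components_ holds the loadings of one principal component
# on the original features, so the largest |loading| names the dominant one.
for i, row in enumerate(pca.components_):
    top = names[int(np.argmax(np.abs(row)))]
    print(f"PC{i + 1} is dominated by {top}")
```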
2 votes · 1 answer

Difference between Factorization Machines and PCA?

Factorization Machines (FMs) are a means to express high-dimensional data in lower dimensions, despite the original data being sparse. How does this differ from PCA, which is itself a dimensionality reduction technique? Are there pros-cons of…
sandyp
  • 224
  • 2
  • 8
2 votes · 1 answer

Scale of the data after PCA

I have 4 standard normal features on which I perform PCA. I then take the first principal component (with all of the components). Is it possible to a priori say what is the max and the min value that the transformed series will have? I guess, if we…
Naz
  • 163
  • 6
2 votes · 1 answer

PCA: projection of positive data on negative side of plane

I did PCA on my data and projected the data onto the first two eigenvectors. After projection I see that the scatter plot of the data starts from [-1, -1]. My data is all positive. Is it correct for the data to be negative in the projected space?
shaifali Gupta
  • 420
  • 4
  • 17
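Yes: PCA centers the data before projecting, so scores spread around zero and negative values are expected even for all-positive input. A minimal demonstration on synthetic strictly positive data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
X = rng.uniform(1.0, 2.0, size=(200, 3))   # strictly positive data

Z = PCA(n_components=2).fit_transform(X)
# fit_transform subtracts the mean first, so each score column has
# mean zero and therefore both negative and positive values.
print(Z.min() < 0 < Z.max())
```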
2 votes · 1 answer

First PCA dimensions do not capture enough variance

I am doing a PCA as a data exploration step and I realize that the first two principal components capture only 25% of the variance, and the first ten capture about 60% of the information. Is it worth interpreting those axes knowing…
2 votes · 1 answer

Can PCA be applied to reduce dimensionality of only a subset of features?

Let's say I have a feature set of f0 to f1000. I am thinking of applying PCA to f500 to f1000 to reduce their dimensionality. Can I combine this reduced set with the features f0 to f499 as the feature space for training a learning algorithm?
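This "reduce only some columns, pass the rest through" pattern is exactly what scikit-learn's `ColumnTransformer` does. A small sketch using 10 stand-in features instead of 1000 (the column indices are illustrative):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 10))   # stand-in for f0..f9 instead of f0..f1000

# Keep columns 0-4 untouched; reduce columns 5-9 to 2 components.
ct = ColumnTransformer(
    [("keep", "passthrough", list(range(5))),
     ("pca", PCA(n_components=2), list(range(5, 10)))]
)
X_new = ct.fit_transform(X)
print(X_new.shape)   # (50, 7): 5 passthrough columns + 2 PCA columns
```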
2 votes · 2 answers

How do you know PCA would work on your dataset?

From my understanding, PCA assumes that redundancy in features can be explained by linear relationships. It also finds orthogonal bases, so when the variance of your data is maximized along non-orthogonal directions PCA isn't going to give you what…
1 vote · 0 answers

Do I need to sort the data by the eigenvalues in descending order (PCA)?

After computing the eigenvectors and eigenvalues, the eigenvectors are sorted by eigenvalue in descending order, and data * eigenvectors = transformed data. What about the standardized raw data: do I need to sort it by the eigenvalues too, before calculating the…
andy
  • 35
  • 1
  • 4
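Only the eigenvector columns are reordered; the rows of the data keep their original order. A minimal NumPy sketch of that step on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(7)
# Three features with different scales, so eigenvalues are well separated.
X = rng.normal(size=(100, 3)) * np.array([1.0, 3.0, 2.0])
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]   # descending eigenvalues
W = eigvecs[:, order]               # reorder eigenvector COLUMNS only

Z = Xc @ W                          # data rows stay in their original order
# Column variances of Z now decrease from left to right.
print(Z.var(axis=0).round(3))
```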
1 vote · 2 answers

Does principal component analysis need standardization or normalization?

Does principal component analysis need standardization or normalization? After some googling, I am confused. PCA needs the scales to be the same, so which should I use? Which technique should be applied before PCA? Does PCA need standardization? standardized values…
andy
  • 35
  • 1
  • 4
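The usual demonstration of why scaling matters: without it, the feature with the largest raw variance dominates PC1. A sketch comparing raw and standardized input (synthetic data, scales chosen to exaggerate the effect):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
# Two independent features on wildly different scales.
X = np.column_stack([rng.normal(size=200),           # ~unit scale
                     1000 * rng.normal(size=200)])   # ~1000x larger

raw = PCA().fit(X).explained_variance_ratio_[0]
std = PCA().fit(StandardScaler().fit_transform(X)).explained_variance_ratio_[0]

# Without scaling, the large feature dominates PC1 almost entirely;
# after standardization the two features contribute comparably.
print(round(raw, 3), round(std, 3))
```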
1 vote · 1 answer

Is it possible to apply PCA on different subsets independently?

I need to apply PCA to a rather big set of data, but my machine cannot handle the workload. So I was considering randomly splitting my original set into 4 subsets, applying PCA independently to each subset, and finally joining the 4 subsets to have…
Ignacio Alorre
  • 165
  • 1
  • 7
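Rather than fitting four independent PCAs (which would produce four incompatible sets of axes), scikit-learn's `IncrementalPCA` fits one shared model from mini-batches. A sketch with four batches of synthetic data:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(9)
X = rng.normal(size=(400, 20))

# Fit in 4 mini-batches without holding everything in memory at once;
# every batch updates ONE shared model, unlike 4 independent PCAs.
ipca = IncrementalPCA(n_components=5)
for batch in np.array_split(X, 4):
    ipca.partial_fit(batch)

Z = ipca.transform(X)   # all rows projected onto the same 5 axes
print(Z.shape)
```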
1 vote · 2 answers

Original Features Identification After PCA Analysis

I have performed PCA on my original dataset, and from the compressed dataset transformed by the PCA I have also selected the number of PCs I want to keep. Now I am struggling with the identification of the original features that are…
DataP
  • 31
  • 2
1 vote · 2 answers

How to decide the number of principal components for PCA

Background: trying to identify the number of principal components (k) to use for PCA on MNIST, aiming at 95%. from sklearn.datasets import fetch_openml mnist = fetch_openml('mnist_784', version=1) # Split data into training and test X, y =…
mon
  • 711
  • 2
  • 10
  • 19
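For a 95% target, scikit-learn lets you pass the variance fraction directly as `n_components` and it picks k for you. A sketch using the small built-in digits dataset as a stand-in for MNIST (to keep the example offline):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# load_digits (8x8 digits, 64 features) stands in for mnist_784 here.
X, y = load_digits(return_X_y=True)

pca = PCA(n_components=0.95)   # keep enough PCs for >= 95% of the variance
X_reduced = pca.fit_transform(X)
print(pca.n_components_, X_reduced.shape)
```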