SVD is the operation of decomposing a matrix $M$ into the matrix product $M=U\Lambda V$, where $U$ and $V$ are unitary matrices ($U^TU=UU^T=I$, etc.), and $\Lambda$ is diagonal.
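Here is a minimal numpy sketch checking this definition on an arbitrary random matrix (the matrix and variable names are mine; `np.linalg.svd` returns the diagonal of $\Lambda$ as a vector):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 3))                        # an arbitrary 5x3 matrix

U, s, Vt = np.linalg.svd(M, full_matrices=False)   # thin SVD
Lam = np.diag(s)                                   # Λ as a diagonal matrix

assert np.allclose(M, U @ Lam @ Vt)                # M = U Λ V
assert np.allclose(U.T @ U, np.eye(3))             # columns of U are orthonormal
assert np.allclose(Vt @ Vt.T, np.eye(3))           # rows of V are orthonormal
```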
So the formal answer is:
- No, there are no assumptions about the distribution of $M$. It does not need to be random at all; it is just a matrix of any nature.
- The condition $n>m$, where $(n, m)$ is the shape of $M$, is not necessary. Indeed, if you have $n<m$, you can transpose the equation $M=U\Lambda V$ to get $M^T=V^T\Lambda U^T$, and it will also be a valid SVD decomposition (a quick numpy check is sketched below).
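A minimal sketch of that point (the random matrix and its shape are arbitrary, chosen only for illustration): the singular values of $M$ and $M^T$ coincide, so nothing special is needed for the $n<m$ case.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(3, 7))                   # a "wide" matrix: n = 3 < m = 7

s_M  = np.linalg.svd(M,   compute_uv=False)   # singular values of M
s_Mt = np.linalg.svd(M.T, compute_uv=False)   # singular values of M^T

assert np.allclose(s_M, s_Mt)                 # identical either way
```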
Now the less formal part.
In data mining, SVD has two main applications: computations (matrix inversion, least squares, etc.) and dimensionality reduction (e.g. compressing the user-item matrix in recommender systems). For computations, only the algebraic properties of SVD (shown above) matter.
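As an illustration of the computational use (a hedged sketch; `X`, `y` and the coefficients are made-up example data), the least-squares solution can be obtained from the pseudoinverse built out of the SVD, with no distributional assumptions at all:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))                            # design matrix
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=100)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
beta = Vt.T @ ((U.T @ y) / s)                            # X^+ y, the least-squares solution

assert np.allclose(beta, np.linalg.lstsq(X, y, rcond=None)[0])
```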
For dimensionality reduction, truncated SVD is used: the small elements of $\Lambda$ are discarded, and only the $k$ largest are kept. This operation is equivalent to finding the $k$-dimensional hyperplane such that the projection of the data ($M$) onto this hyperplane is closest (in Euclidean distance) to the original data.
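A minimal numpy sketch of the truncation (the matrix and $k=2$ are arbitrary; by the Eckart-Young theorem the rank-$k$ reconstruction below is the closest rank-$k$ matrix to $M$ in Frobenius norm):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(50, 10))
k = 2                                            # number of singular values to keep

U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # projection of the rows of M onto the
                                                 # k-dimensional hyperplane spanned by Vt[:k]
scores = M @ Vt[:k, :].T                         # the compressed k-dimensional representation
```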
In the latter case, the analyst would like the decomposition to generalize well to unseen data, and this problem can be formulated in probabilistic language. If we state that $M$ consists of $n$ IID $m$-dimensional random variables, then it turns out that SVD works best if they are jointly normal. That's because:
a. a multivariate normal distribution is indeed concentrated near a hyperplane (whenever some directions have much smaller variance than the rest), and
b. for a multivariate normal, Euclidean distance is tightly connected with probability density. Therefore truncated SVD (or PCA, which is mathematically identical) can be viewed as maximum likelihood estimation of a multivariate normal distribution with $k$ independent components. For more details, see the article by Bishop.
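For concreteness, here is a sketch of the latent-variable model behind that view (the notation is mine, loosely following Tipping & Bishop's probabilistic PCA): each row $x_i$ of $M$ is modelled as

$$x_i = W z_i + \mu + \varepsilon_i, \qquad z_i \sim \mathcal{N}(0, I_k), \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2 I_m),$$

so that marginally $x_i \sim \mathcal{N}(\mu,\ W W^T + \sigma^2 I_m)$. The maximum-likelihood $W$ spans the subspace of the top $k$ eigenvectors of the sample covariance, i.e. the top $k$ right singular vectors of the centered data, which is exactly the subspace returned by truncated SVD/PCA; as $\sigma^2 \to 0$, fitting this model reduces to the least-squares projection described above.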