
I'm having trouble understanding why SVMs work well with high-dimensional data, i.e. the case where p >> n.

I read the following: SVM is automatically regularized. You don't have to pick a regularization parameter because picking the widest separation margin is a way to automatically regularize.

However, I don't understand why this implies that an SVM works well on this type of data. I also read that the optimization problem for maximizing the margin doesn't depend on the number of dimensions, so what does it depend on?

Norhther
  • The curse of dimensionality affects SVMs as well, I assure you of this. On the other hand, SVMs can use kernel mappings, which map high-dimensional features to more informative spaces that are more easily separated linearly – Nikos M. Oct 28 '21 at 14:31
  • @NikosM. Indeed, but it works better than most classifiers. Having a kernel function alone does not address the issues. – Norhther Oct 28 '21 at 20:00
  • I doubt SVMs work better unconditionally and in all cases, since both experience and the no-free-lunch theorems inform us otherwise – Nikos M. Oct 28 '21 at 22:08
  • related https://stats.stackexchange.com/questions/484289/does-svm-suffer-from-curse-of-high-dimensionality-if-no-why – Nikos M. Oct 29 '21 at 15:33
  • also related https://stats.stackexchange.com/questions/64053/svm-has-relatively-low-classification-rate-for-high-dimensional-data-even-though – Nikos M. Oct 29 '21 at 15:39

2 Answers


The ever-present danger with high-dimensional data is overfitting. When there are a lot of features (p) and relatively few examples (n), it is easy for models to find spurious relationships between features and target.

There are two generic solutions to this problem: dimensionality reduction and regularization. Dimensionality reduction reduces the number of features prior to training. Regularization penalizes the model for added complexity. For example, L1 and L2 penalties are commonly used in linear models to penalize the size of the coefficients. This encourages the model to effectively "ignore" certain features by shrinking their coefficients toward zero (an L1 penalty can drive them to exactly zero).
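
To make this concrete, here is a minimal sketch (my own illustration, not part of the original answer, assuming scikit-learn) of how an L1 penalty behaves on a p >> n problem: most coefficients are driven to exactly zero, so the model effectively ignores the uninformative features. The dataset sizes and the C value are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic p >> n problem: 50 samples, 500 features, only 5 informative
X, y = make_classification(n_samples=50, n_features=500, n_informative=5,
                           n_redundant=0, random_state=0)

# L1-penalized logistic regression; smaller C means stronger regularization
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)

# Count how many of the 500 coefficients the penalty left non-zero
print("non-zero coefficients:", np.sum(clf.coef_ != 0), "out of", X.shape[1])
```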

So to your question directly: the reason that SVMs work well with high-dimensional data is that they are automatically regularized, and regularization is a way to prevent overfitting with high-dimensional data.
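
As a rough illustration (again my own sketch, not part of the original answer, assuming scikit-learn; the dataset and the C values are arbitrary), a linear SVM can be fit to a dataset with far more features than samples, and the regularization strength C is what mainly governs how well it generalizes to held-out data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic p >> n problem: 100 samples, 2000 features, 10 informative
X, y = make_classification(n_samples=100, n_features=2000, n_informative=10,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Smaller C = stronger margin-based regularization
for C in [0.001, 0.01, 0.1, 1.0]:
    svm = LinearSVC(C=C, max_iter=10000).fit(X_tr, y_tr)
    print(f"C={C:<6} train acc={svm.score(X_tr, y_tr):.2f} "
          f"test acc={svm.score(X_te, y_te):.2f}")
```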

zachdj
  • By that logic LASSO/RIDGE/ELASTIC NET should also work very well on high-dimensional data since they are automatically regularized. I am not saying they are not, but the OP's question is specifically about SVM. – user2974951 Oct 29 '21 at 05:44
  • Regularization in SVM is going to select just a subset of the training points to be used as support vectors (correct me if I'm wrong). However, the dimension of those points is still always p, so I still don't get how this regularization penalizes the model in a dimensional sense. – Norhther Oct 29 '21 at 10:39
  • @Norhther, I think what the answerer wants to say is that the maximum margin of separation (a feature of the SVM algorithm) can lead to better generalisation, thus bypassing the overfitting trap. This has nothing to do with limiting dimensions to regularize. Also, in high dimensions linear separation is often more feasible. However, this should not be a sweeping statement, as my other comments above point out – Nikos M. Oct 29 '21 at 16:45
  • @user2974951 I understood OP's question to be "I know that SVMs are automatically regularized, but why does that make them suitable for high-dimensional data?" My answer is that regularization is one of the remedies for high dimensionality. – zachdj Oct 29 '21 at 17:18
  • OP didn't ask in what way are SVMs regularized or why the max-margin criterion leads to regularization. So IMO the question is only superficially about SVMs. It really asks "why do automatically regularized models work well with high-dimensional data?", using SVM as an example – zachdj Oct 29 '21 at 17:18
  • @Norhther I used L1 and L2 regularization as examples of regularization, but I didn't claim that SVMs reduce the dimensions of the model. Penalizing high dimensional models is one form of regularization, but there are many others. As pointed out by Nikos M., the regularization in SVMs comes from the maximum margin criterion – zachdj Oct 29 '21 at 17:23
  • @NikosM. Of course, there will always be a high-dimensional space where the data is linearly separable – Norhther Oct 30 '21 at 02:49
  • @zachdj No, my question is about why the regularization implies that this can work well with high dimensional data, and also why the quadratic optimization problem does not depend on the dimension – Norhther Oct 30 '21 at 02:50

Not sure where your statement comes from, but you need to consider the ideas behind the SVM and how it works to answer your question.

Here is the summary:

  1. The SVM approach is to effectively map the data to a higher-dimensional space than the dataset has, to achieve better separability. You can refer to the kernel trick article. SVM's advantage is that this is computed efficiently, and only samples near the boundary affect the separating hyperplane. See the sketch after this list.

  2. The success of any model depends on proper parametrization, and in the case of SVM, on a proper kernel choice. An example of under- and overfitting of an SVM can be found in the ISL book; you can read about regularization there too.

  3. As noted in the comments, there is no free lunch: the same SVM setup can work for some problems and fail for others. The only way to know is to perform several experiments and observe the quality yourself on the specific datasets you've got.
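
As a concrete illustration of point 1 (my own sketch, not from the answer; scikit-learn and the toy dataset are assumptions), a linear kernel cannot separate two concentric circles, while an RBF kernel, which implicitly maps the points into a higher-dimensional space, separates them easily:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale")

# 5-fold cross-validated accuracy for each kernel
print("linear kernel:", cross_val_score(linear_svm, X, y, cv=5).mean())
print("rbf kernel:   ", cross_val_score(rbf_svm, X, y, cv=5).mean())
```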

mikalai
  • I'm sorry, but your answer is not telling me anything that I didn't know, and it's not addressing the question. – Norhther Nov 03 '21 at 23:53
  • @Norhther Probably your question can be better formulated or put into some specific context. – mikalai Nov 04 '21 at 07:45