
I've recently started learning to work with sklearn and have just come across this peculiar result.

I used the digits dataset available in sklearn to try different models and estimation methods.

When I tested a Support Vector Machine model on the data, I found out there are two different classes in sklearn for SVM classification: SVC and LinearSVC, where the former uses the one-against-one approach and the latter uses the one-against-rest approach.

I didn't know what effect that could have on the results, so I tried both. I did a Monte Carlo-style estimation where I ran both models 500 times, each time splitting the sample randomly into 60% training and 40% test and calculating the prediction error on the test set. A sketch of the loop is below.
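This isn't my exact code, just a minimal sketch of the simulation (the variable names are made up, and I'm using the current sklearn.model_selection import; at the time, train_test_split lived in sklearn.cross_validation):

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC, LinearSVC

    X, y = load_digits(return_X_y=True)

    svc_errors, linsvc_errors = [], []
    for i in range(500):
        # Fresh random 60/40 split on every iteration
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, train_size=0.6, test_size=0.4, random_state=i)

        # SVC with default settings (rbf kernel, one-vs-one multiclass)
        svc = SVC().fit(X_train, y_train)
        svc_errors.append(1 - svc.score(X_test, y_test))

        # LinearSVC (liblinear, one-vs-rest multiclass)
        lin = LinearSVC().fit(X_train, y_train)
        linsvc_errors.append(1 - lin.score(X_test, y_test))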

The regular SVC estimator produced the following histogram of errors:

[Histogram: SVC error rate]

while the linear SVC estimator produced the following histogram:

[Histogram: LinearSVC error rate]

What could account for such a stark difference? Why does the linear model have so much higher accuracy most of the time?

And, relatedly, what could be causing the stark polarization in the results: an accuracy either close to 1 or close to 0, with nothing in between?

For comparison, a decision tree classifier, evaluated the same way, produced a much more normally distributed error rate, with an accuracy of around 0.85.
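The tree baseline was run with the same loop (again only a sketch; a DecisionTreeClassifier with default settings, reusing X, y and the split from above):

    from sklearn.tree import DecisionTreeClassifier

    tree_errors = []
    for i in range(500):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, train_size=0.6, test_size=0.4, random_state=i)
        tree = DecisionTreeClassifier().fit(X_train, y_train)
        tree_errors.append(1 - tree.score(X_test, y_test))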

metjush
  • I assume the scikit-learn documentation does not highlight the difference? Did you check? – Rohit Sep 02 '15 at 15:06
  • What kernel did you use in SVC? Default settings = "rbf"? One-against-one and one-against-all are different approaches. – kpb Sep 02 '15 at 15:10
  • the documentation is kinda sparse/vague on the topic. It mentions the difference between one-against-one and one-against-rest, and that LinearSVC is "Similar to SVC with parameter kernel='linear', but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better (to large numbers of samples)". – metjush Sep 02 '15 at 15:11
  • for regular SVC, I used the default kernel. I know 1v1 and 1vR are different approaches, but I guess that's what I want to know: why do they produce such different results? Is it the kernel choice or the different approach to multiple-category classification? – metjush Sep 02 '15 at 15:12
  • Titles for X and Y axis would help :-) – dzieciou Oct 08 '20 at 11:28
  • check https://scikit-learn.org/stable/modules/svm.html#svm-mathematical-formulation – Ferroao May 09 '21 at 20:16