Recently, in a machine learning class taught by professor Oriol Pujol at UPC/Barcelona, he described the most common algorithms, principles, and concepts to use for a wide range of machine-learning tasks. Here I share them with you and ask:
- Is there any comprehensive framework that matches tasks with approaches or methods for the different types of machine learning problems?
How do I learn a simple Gaussian? Probability, random variables, distributions; estimation, convergence and asymptotics, confidence intervals.
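As a concrete sketch of this first item (assuming NumPy and SciPy; the data below is synthetic and the parameters are just illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)   # samples from an "unknown" Gaussian

mu_hat = x.mean()                               # maximum likelihood estimate of the mean
sigma_hat = x.std(ddof=1)                       # unbiased estimate of the standard deviation

# 95% confidence interval for the mean (t-based)
ci = stats.t.interval(0.95, len(x) - 1, loc=mu_hat, scale=stats.sem(x))
print(mu_hat, sigma_hat, ci)
```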
How do I learn a mixture of Gaussians (MoG)? Likelihood, Expectation-Maximization (EM); generalization, model selection, cross-validation; k-means, hidden Markov models (HMMs)
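A minimal sketch of fitting a MoG with EM, assuming scikit-learn is available (the two 1-D clusters are synthetic):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(x)   # EM under the hood
print(gmm.means_.ravel(), gmm.weights_)
labels = gmm.predict(x)                                        # hard assignments, k-means-like
```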
How do I learn any density? Parametric vs. non-parametric estimation, Sobolev and other functional spaces; L2 error; kernel density estimation (KDE), optimal kernel, KDE theory
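For instance, a non-parametric density estimate via KDE (a small sketch assuming SciPy; the bandwidth rule is the library default, not a tuned choice):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

kde = gaussian_kde(x)                 # bandwidth set by Scott's rule by default
grid = np.linspace(-4, 4, 200)
density = kde(grid)                   # estimated density evaluated on a grid
```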
How do I predict a continuous variable (regression)? Linear regression, regularization, ridge regression, and LASSO; local linear regression; conditional density estimation.
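A short regression sketch contrasting plain least squares with ridge and LASSO, assuming scikit-learn; the data and the regularization strengths are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.normal(size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)      # shrinks all coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)      # drives some coefficients exactly to zero
print(ols.coef_, ridge.coef_, lasso.coef_, sep="\n")
```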
How do I predict a discrete variable (classification)? Bayes classifier, naive Bayes, generative vs. discriminative; perceptron, weight decay, linear support vector machine; nearest neighbor classifier and theory
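A classification sketch comparing a generative model (naive Bayes) with a discriminative one (linear SVM), assuming scikit-learn; the iris data set is just a convenient stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_tr, y_tr)
svm = LinearSVC(C=1.0, max_iter=10000).fit(X_tr, y_tr)
print(nb.score(X_te, y_te), svm.score(X_te, y_te))
```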
Which loss function should I use? Maximum likelihood estimation theory; L2 estimation; Bayesian estimation; minimax and decision theory; Bayesianism vs. frequentism
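A tiny illustration of how the loss shapes the estimator: under squared (L2) loss the best constant prediction is the sample mean, under absolute (L1) loss it is the median (plain NumPy, outliers injected artificially):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 95), np.full(5, 50.0)])  # a few gross outliers

print(x.mean())       # minimizes squared error, pulled by the outliers
print(np.median(x))   # minimizes absolute error, robust to them
```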
Which model should I use? AIC and BIC; Vapnik-Chervonenkis theory; cross-validation theory; bootstrapping; Probably Approximately Correct (PAC) theory; Hoeffding-derived bounds
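A minimal model-selection sketch with k-fold cross-validation, assuming scikit-learn; the candidate models differ only in the regularization parameter C, which is an arbitrary grid:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
for C in [0.01, 0.1, 1.0, 10.0]:
    scores = cross_val_score(SVC(C=C), X, y, cv=5)   # 5-fold CV estimate of generalization
    print(C, scores.mean(), scores.std())
```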
How can I learn fancier (combined) models? Ensemble learning theory; boosting; bagging; stacking
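A sketch of bagging vs. boosting over tree-based base learners, assuming scikit-learn (the defaults stand in for any ensemble configuration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

bag = BaggingClassifier(n_estimators=100)      # bootstrap resampling of base trees
boost = AdaBoostClassifier(n_estimators=100)   # sequentially reweighted weak learners
print(cross_val_score(bag, X, y, cv=5).mean())
print(cross_val_score(boost, X, y, cv=5).mean())
```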
How can I learn fancier (nonlinear) models? Generalized linear models, logistic regression; Kolmogorov theorem, generalized additive models; kernelization, reproducing kernel Hilbert spaces, non-linear SVM, Gaussian process regression
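A sketch contrasting a linear model (logistic regression) with a kernelized, non-linear SVM, assuming scikit-learn; the two-moons data is synthetic and deliberately not linearly separable:

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = LogisticRegression().fit(X_tr, y_tr)
rbf_svm = SVC(kernel="rbf", gamma=1.0).fit(X_tr, y_tr)   # implicit feature map via an RKHS
print(linear.score(X_te, y_te), rbf_svm.score(X_te, y_te))
```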
How can I learn fancier (compositional) models? Recursive models, decision trees, hierarchical clustering; neural networks, backpropagation, deep belief networks; graphical models, mixtures of HMMs, conditional random fields, max-margin Markov networks; log-linear models; grammars
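Two of these compositional models on the same data, as a rough sketch assuming scikit-learn: a decision tree (recursive partitioning) and a small feed-forward neural network trained by backpropagation; the architecture sizes are arbitrary:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=5).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000).fit(X_tr, y_tr)
print(tree.score(X_te, y_te), mlp.score(X_te, y_te))
```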
How do I reduce or relate features? Feature selection vs dimensionality reduction, wrapper methods for feature selection; causality vs correlation, partial correlation, Bayes net structure learning
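A minimal wrapper-style feature-selection sketch with recursive feature elimination (RFE), assuming scikit-learn; keeping 5 features is an arbitrary illustrative choice:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
selector = RFE(LogisticRegression(max_iter=5000), n_features_to_select=5).fit(X, y)
print(selector.support_)     # boolean mask of the retained features
print(selector.ranking_)     # 1 = selected, higher = eliminated earlier
```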
How do I create new features? principal component analysis (PCA), independent component analysis (ICA), multidimensional scaling, manifold learning, supervised dimensionality reduction, metric learning
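A short sketch of creating new features with PCA and ICA, assuming scikit-learn; two components is just an illustrative choice:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FastICA

X, _ = load_iris(return_X_y=True)
X_pca = PCA(n_components=2).fit_transform(X)                       # directions of maximal variance
X_ica = FastICA(n_components=2, random_state=0).fit_transform(X)   # statistically independent components
print(X_pca.shape, X_ica.shape)
```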
How do I reduce or relate the data? Clustering, bi-clustering, constrained clustering; association rules and market basket analysis; ranking/ordinal regression; link analysis; relational data
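For the clustering part, a minimal k-means sketch assuming scikit-learn; the blobs are synthetic and the number of clusters is assumed known:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=600, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # learned prototypes
print(km.labels_[:10])       # cluster assignment of the first few points
```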
How do I treat time series? ARMA; Kalman filter and state-space models, particle filter; functional data analysis; change-point detection; cross-validation for time series
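A minimal 1-D Kalman filter sketch in plain NumPy, assuming a random-walk state observed under Gaussian noise; the noise variances are made-up illustration values:

```python
import numpy as np

rng = np.random.default_rng(0)
T, q, r = 100, 0.01, 1.0                            # steps, process and observation noise variances
x_true = np.cumsum(rng.normal(0, np.sqrt(q), T))    # latent random walk
y = x_true + rng.normal(0, np.sqrt(r), T)           # noisy observations

x_hat, P = 0.0, 1.0                                 # initial state estimate and variance
estimates = []
for t in range(T):
    P = P + q                                       # predict: variance grows by process noise
    K = P / (P + r)                                 # Kalman gain
    x_hat = x_hat + K * (y[t] - x_hat)              # update with the innovation
    P = (1 - K) * P
    estimates.append(x_hat)
```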
How do I treat non-ideal data? covariate shift; class imbalance; missing data, irregularly sampled data, measurement errors; anomaly detection, robustness
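For the missing-data part, a small imputation sketch assuming scikit-learn's SimpleImputer; the NaNs are injected artificially:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [4.0, 5.0]])
X_filled = SimpleImputer(strategy="mean").fit_transform(X)   # column means replace the NaNs
print(X_filled)
```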
How do I optimize the parameters? Unconstrained vs. constrained/convex optimization, derivative-free methods, first- and second-order methods, backfitting; natural gradient; bound optimization and EM
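A toy first-order example: gradient descent on a least-squares objective in plain NumPy; the step size and iteration count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)
lr = 0.01
for _ in range(1000):
    grad = X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
    w -= lr * grad                      # first-order update
print(w)
```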
How do I optimize linear functions? computational linear algebra, matrix inversion for regression, singular value decomposition (SVD) for dimensionality reduction
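A sketch of using the SVD both to solve least squares without an explicit matrix inversion and to build a low-rank approximation, in plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + 0.1 * rng.normal(size=100)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
w = Vt.T @ ((U.T @ y) / s)               # pseudo-inverse solution to the regression
X_rank2 = (U[:, :2] * s[:2]) @ Vt[:2]    # best rank-2 approximation of X
print(w)
```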
How do I optimize with constraints? Convexity, Lagrange multipliers, Karush-Kuhn-Tucker conditions, interior point methods, SMO algorithm for SVM
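A small constrained-optimization sketch with SciPy, using SLSQP as a stand-in for the solvers listed above; the objective and equality constraint are toy examples:

```python
import numpy as np
from scipy.optimize import minimize

objective = lambda w: (w[0] - 1) ** 2 + (w[1] - 2) ** 2
constraint = {"type": "eq", "fun": lambda w: w[0] + w[1] - 1}   # enforce w0 + w1 = 1

result = minimize(objective, x0=np.zeros(2), method="SLSQP", constraints=[constraint])
print(result.x)   # the KKT conditions hold (approximately) at this point
```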
How do I evaluate deeply-nested sums? Exact graphical model inference, variational bounds on sums, approximate graphical model inference, expectation propagation
How do I evaluate large sums and searches? Generalized N-body problems (GNP), hierarchical data structures, nearest neighbor search, fast multipole method; Monte Carlo integration, Markov chain Monte Carlo, Monte Carlo SVD
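For the Monte Carlo part, a minimal integration sketch in NumPy: estimating E[f(X)] for X ~ N(0, 1), where f is any integrand of interest:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.exp(-np.abs(x))          # example integrand
samples = rng.standard_normal(100_000)
estimate = f(samples).mean()              # converges at the usual O(1/sqrt(n)) rate
stderr = f(samples).std(ddof=1) / np.sqrt(len(samples))
print(estimate, stderr)
```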
How do I treat even larger problems? Parallel/distributed EM, parallel/distributed GNP; stochastic subgradient methods, online learning
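A sketch of online learning with stochastic (sub)gradient updates, assuming scikit-learn's SGDClassifier; the mini-batch stream is simulated from one synthetic data set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
clf = SGDClassifier(loss="hinge", alpha=1e-4)          # a linear SVM trained by SGD

classes = np.unique(y)
for start in range(0, len(X), 1000):                   # process the data in mini-batches
    sl = slice(start, start + 1000)
    clf.partial_fit(X[sl], y[sl], classes=classes)     # incremental update, no full-data refit
print(clf.score(X, y))
```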
How do I apply all this in the real world? Overview of the parts of ML, choosing among the methods to use for each task, prior knowledge and assumptions; exploratory data analysis and information visualization; evaluation and interpretation using confidence intervals, hypothesis tests, and ROC curves; where the research problems in ML are
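Finally, an evaluation sketch with a ROC curve and its AUC, assuming scikit-learn; the data, split, and model are placeholders for whatever method was chosen above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]          # predicted probabilities of the positive class
fpr, tpr, thresholds = roc_curve(y_te, scores)    # one (FPR, TPR) point per threshold
print(roc_auc_score(y_te, scores))
```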