32

At first glance, this is once again a reference request for "How to start machine learning".

However, my mathematical background is relatively strong and I am looking for an introduction to machine learning using mathematics and actually proving things.

Most references are relatively imprecise and pad the text with filler where a few simple formulae and a single example would convey the same content. Proofs, too, appear only in rare instances.

Starting from standard hand-waving literature (e.g. first Amazon results), I discovered Andrew Ng's Coursera course, then Bishop's book on pattern recognition and finally Smola's book on Machine Learning. The latter seems to be the first that suits my expectations. Unfortunately, the book is only in draft state.

Are there other references that provide a similar level of rigor as Smola's book? Potentially with different or additional content?

Maybe I should add a little bit more about my background:

I have a (German) PhD in mathematics (in the field of PDEs). In particular, I am used to applied analysis, optimal control theory, calculus of variations, some measure and probability theory, numerics and differential geometry. During my diploma, my minor subject was computer science. Hence, somewhere inside my head, I still have some knowledge of algorithms, computational geometry and geometric modelling.

Edit: Would it potentially be better to ask this question at Data Science Stack Exchange? I don't want to spam the board with the same question, but if you think that I have a higher chance to obtain an answer there, I would post the question there. Of course, I would link those questions and answers. Any comment on that?

Quickbeam2k1
  • 2,139
  • Or perhaps http://stats.stackexchange.com/ - I've read a few questions there regarding machine learning techniques. I'm currently learning from Ng's course and Bishop's book - so will be interesting to see if you find something nice! – Mike Miller Mar 25 '15 at 16:13
  • @Mike Miller, actually I'm currently doing the same :) I'll also ask there. Maybe we are lucky. – Quickbeam2k1 Mar 25 '15 at 16:53
  • 5
    In the now-deleted crosspost, the book of Schölkopf and Smola was suggested. Additionally, the book of Hastie et al., "The Elements of Statistical Learning", was recommended too. One thing I find suboptimal in the latter book: the term unbiased is not defined precisely (from a mathematician's point of view). However, this lack of rigor is common to a lot of statistics texts and courses I have found so far. To me it's astonishing that there seems to be no book on machine learning that is a bit more rigorous. – Quickbeam2k1 Mar 26 '15 at 19:26
  • For completeness: The book of Schölkopf and Smola defines what an unbiased estimator is. Hence, I think I am going to stick to this and Bishop's book unless there are additional recommendations. – Quickbeam2k1 Mar 26 '15 at 19:34
  • Many thanks :-) I'll have a read of Smola's text. I'm in a fairly similar position where I've gone from a maths degree to a PhD in pattern recognition and machine learning. I only did one basic stats course so would like a more mathematical text to supplement Ng's course and Bishop's book. – Mike Miller Mar 26 '15 at 19:35
  • Just for clarification: you have a degree in math and, currently, you are doing your PhD in pattern recognition and machine learning? – Quickbeam2k1 Mar 26 '15 at 20:06
  • Yes, that's right. – Mike Miller Mar 26 '15 at 20:07
  • Probably a good decision :) If I started my PhD over again, I would make a similar choice. Nevertheless, I've got my PhD and now a new chapter starts :) – Quickbeam2k1 Mar 26 '15 at 20:41
  • It's funny you say that because I was actually looking to go into fluids or pdes and due to circumstances I had to accept a phd closer to where I was living at the time! I didn't think it would be as interesting as I'm not a huge fan of stats, but some of the neural networks and Fourier analysis is actually very interesting. You'll fly through the material with your background :-) – Mike Miller Mar 26 '15 at 20:47
  • 1
    Maybe you should try reading this book about Neural Networks by MIT Press, seems very mathematical, https://www.amazon.com/Fundamentals-Artificial-Neural-Networks-Press/dp/0262514672/ref=sr_1_2?s=books&ie=UTF8&qid=1489529641&sr=1-2&keywords=MIT+neural+networks – MathNerd Mar 14 '17 at 22:20
  • 1
    Maybe you may be interested in this book too: https://www.amazon.com/Evolutionary-Optimization-Algorithms-Dan-Simon/dp/0470937416/ref=sr_1_1?s=books&ie=UTF8&qid=1489530070&sr=1-1&keywords=evolutionary+optimization+algorithms – MathNerd Mar 14 '17 at 22:22
  • 2
    Btw, the Stanford Engineering Everywhere course of Andrew Ng is a lot more mathematical than its Coursera counterpart – Quickbeam2k1 Mar 15 '17 at 06:52
  • 1
    You can also check this mathematically heavy book: https://www.amazon.com/Neural-Networks-Statistical-Learning-Ke-Lin/dp/144715570X/ref=sr_1_2?s=books&ie=UTF8&qid=1490132968&sr=1-2&keywords=Springer+neural+networks+learning – MathNerd Mar 21 '17 at 21:55
  • The crosspost of this question seems to have been undeleted. – user3658307 Mar 20 '19 at 23:59
  • 1
    @Quickbeam2k1 Have you been able to find proper books? I am in a similar situation and don't know where I should start. – Edwin Apr 24 '20 at 13:06
  • Actually, I'd say that most books are not very detailed from a mathematical perspective. There is "Learning from Data", where you can find some proofs in the first chapters. Apart from that, I haven't found the book for me yet. Murphy's book probably gives a good general overview, I'd say. – Quickbeam2k1 Apr 27 '20 at 18:47

2 Answers

12

My opinion is that it depends on which subarea of machine learning interests you. Unfortunately, at this point, much of the relevant literature (especially for theory) exists only in publications, rather than books. But this question is just about where to start, I suppose.

The more popular, "practically oriented" books aimed at undergraduates, like Hastie et al, The Elements of Statistical Learning, or Bishop, Pattern Recognition and Machine Learning, are essentially non-mathematical. Books that take the probabilistic modelling point of view, such as Murphy (Machine Learning: A Probabilistic Perspective) and Koller and Friedman (Probabilistic Graphical Models: Principles and Techniques), have a bit more mathematical content, mostly in the area of Bayesian modelling and applied probability (e.g., MCMC, variational inference). I think books in these categories are great introductions to ML, but perhaps not to its mathematics.

The most popular book as of writing, Goodfellow et al's Deep Learning, is also non-rigorous and generally mathematically light. However, it does cover more advanced subjects at the end and it is such a comprehensive introduction to the subject that I still recommend it as a starting point.

Classical ML theory is (to a decent extent) concerned with the Probably Approximately Correct (PAC) framework. Two lovely, mathematically oriented books that focus on the basic theory of introductory ML are Shalev-Shwartz and Ben-David, Understanding Machine Learning, and Mohri et al, Foundations of Machine Learning. These are probably good starting points for people interested in ML theory in terms of error bounds, sample complexities, etc., with plenty of theorems.
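
To give a flavour of the kind of statement these books prove, here is a minimal sketch of the textbook sample complexity bound for a finite hypothesis class in the realizable case: if m >= ln(|H| / delta) / epsilon i.i.d. samples are drawn, then with probability at least 1 - delta every empirical risk minimizer has true error at most epsilon. The function name and the example numbers are illustrative assumptions, not taken from either book.

```python
import math


def pac_sample_size(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Smallest integer m with m >= ln(|H| / delta) / epsilon.

    Assumes a finite hypothesis class of size |H| = hypothesis_count and
    the realizable setting (some hypothesis in the class has zero true error).
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(hypothesis_count / delta) / epsilon)


# Hypothetical example: 1000 hypotheses, accuracy 0.1, confidence 0.95.
print(pac_sample_size(1000, epsilon=0.1, delta=0.05))  # prints 100
```

Note that the bound grows only logarithmically in |H| and in 1/delta, but linearly in 1/epsilon.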

Specialized books on particular ML topics can be mathematically demanding as well. Schölkopf et al's Learning with Kernels and Rasmussen et al's Gaussian Processes for Machine Learning are, in my opinion, examples of these. There's also the book Information Theory, Inference and Learning Algorithms by MacKay, which covers neural networks from an information-theoretic and compression point of view, and Graphical Models, Exponential Families, and Variational Inference by Wainwright and Jordan.

One shortcoming of the ML literature, as of writing, is the lack of introductory books helping people access the more mathematically demanding advanced literature (e.g., the game theory, information theory, and optimal transport concepts used to analyze deep generative models; differential geometry and spectral methods in manifold learning and Riemannian optimization for deep learning). Hopefully one day there will be more expository material to help introduce us to these more mathematically intensive areas.


In my answer to this question, I link to other questions on the same topic, incidentally.

user3658307
  • 10,433
3

You may be interested in Kevin P. Murphy's book: http://www.cs.ubc.ca/~murphyk/MLbook/

Good luck :)

orrymr
  • 223
  • I'll definitely check that, though the proportion of proofs seems a bit low – Quickbeam2k1 Apr 11 '15 at 19:29
  • @Quickbeam2k1: Hi. I am looking for a rigorous book on ML. Was Murphy's book helpful? – Vivek Bagaria Dec 25 '15 at 09:29
  • Hey, sorry for the late response. Unfortunately, I have not yet been able to dive deeper into that book and did not prioritize it highly because of some negative reviews I found on Amazon. However, having checked the reviews once more, maybe you should check the following one: http://www.amazon.com/review/R32N9EIEOMIPQU/ref=cm_cr_dp_title/187-7122966-0933221?ie=UTF8&ASIN=0262018020&channel=detail-glance&nodeID=283155&store=books In particular, the books by Barber and by Rasmussen and Williams are freely available online – Quickbeam2k1 Jan 01 '16 at 21:08