(Direct comments to [@obphious](https://twitter.com/obphious))

Since launching Metacademy, I've had a number of people ask,

> What should I do if I want to get 'better' at machine learning, but I don't know what I want to learn?

Excellent question! My answer: **consistently work your way through textbooks**. I then watch as they grimace in the same way an out-of-shape person grimaces when a healthy friend responds with, "Oh, I watch what I eat and consistently exercise." Progress requires consistent discipline, motivation, and an ability to work through challenges on your own. _But you already know this._

<img width="200px" class="center-image" src="http://i.imgur.com/VzwaFve.jpg" alt="My worn out math stats book.">

**But why textbooks**? Because they're one of the few learning mediums where you'll really own the knowledge. You can take a course, a MOOC, join a reading group, whatever you want. But with textbooks, it's an intimate bond. You'll spill brain-juice on every page; you'll inadvertently memorize the chapter titles, the examples, and the exercises; you'll scribble in the margins, dog-ear commonly referenced areas, and look for applications of the topics you learn -- the textbook itself becomes a part of your knowledge (the above image shows my nearest textbook). _Successful learners don't just read textbooks_. Learn to use textbooks in this way, and you can master many subjects -- certainly machine learning.

In this brief roadmap, I list a few excellent textbooks for advancing your machine learning knowledge and capabilities. __I picked these texts after consulting with fellow graduate students, postdocs, and professors at UC Berkeley -- my own experience played a role as well. This list is purposefully sparse.
Having 20 textbooks thrown at you is useless.__ Also, if you want alternative learning resources, [Metacademy](http://www.metacademy.org) is at your disposal, as are [all of these textbooks](http://www.reddit.com/r/MachineLearning/comments/1jeawf/machine_learning_books/).

# Level 0: Neophyte

<div class="block-center-text">
<a href="http://amzn.to/1mZs2DU">
<img width="200px" class="center-image" src="http://i.imgur.com/HJyPwoo.jpg" alt="Data Smart Textbook Image">
Data Smart: Using Data Science to Transform Information into Insight
</a>
</div>

My sister, an artist and writer by trade, asked me how she could understand the basics of data science in a nontrivial way. After reading several introductory and pop books in this area, I recommended [Data Smart](http://amzn.to/1mZs2DU). My sister was able to work through it, and in fact, the next time I saw her we had a delightful conversation about [[logistic regression]] =).

**Expectations**: You'll understand some common machine learning algorithms at a high level, and you'll be able to implement some simple algorithms in Excel (and a bit in R if you get through the entire book).

**Necessary Background**: basic Excel familiarity -- this book is a great starting point if you don't have a CS/math-based background. Plus, it's not nearly as dry as a typical textbook.

**Key Chapters**: It's a short read, and every chapter is fairly illuminating -- though you can skip the worksheet examples, and chapters 8 and 10, if you're interested in a basic overview.

**Capstone Project**: Using [this dataset](https://archive.ics.uci.edu/ml/datasets/Auto+MPG), see if you can predict the MPG of the car given all of its other attributes. This will test your ability to manipulate data for a desired machine learning task, and also your ability to apply the correct machine learning technique to a somewhat vague problem.

# Level 1: Apprentice

<div class="block-center-text">
<a href="http://amzn.to/1kIbPTL">
<img width="200px" class="center-image" src="http://i.imgur.com/6TpOD4N.jpg" alt="Machine Learning with R Textbook Image">
Machine Learning with R
</a>
</div>

This is an example-laden book for simultaneously learning practical machine learning techniques and the R programming language. I'm a longtime SciPy user, but after finishing the first few chapters (and remembering that R packages are so damn simple), I've mostly been turning to R for quick analyses.

**Expectations**: You'll be able to recognize when fundamental machine learning algorithms apply to certain problems and implement functioning machine learning code in R.

**Necessary Background**: No real prerequisites, though the following will help (these can be learned/reviewed as you go):

* some programming experience [in R]
* some algebra
* basic calculus
* a little bit of probability theory

**Key Chapters**: It's a short book, and I recommend all of the chapters -- be sure to actually think through the examples (and type them into R). If you're looking to shave off some time, you can safely skip chapters 8 and 12.

**Capstone Project**: Using [this dataset](http://snap.stanford.edu/data/web-FineFoods.html), see if you can predict the food ratings given all of the other attributes. Use three different machine learning techniques for this task, and justify your top choice.
Also, build a classifier that predicts whether a review is "good" or "bad" -- you should use reasonable "good/bad" thresholds. This will test your data munging capabilities, your strategy for analyzing a larger dataset, your knowledge of machine learning techniques, and your ability to write analysis code in R.

# Level 2: Journeyman

<div class="block-center-text">
<a href="http://amzn.to/UjGhfq">
<img width="200px" class="center-image" src="http://i.imgur.com/yppYN4K.jpg" alt="PRML">
Pattern Recognition and Machine Learning
</a>
</div>

This stage separates those with a surface-level understanding from those with rigorous, in-depth knowledge. It starts getting mathy at this stage, but if you plan on making machine learning a substantial part of your career, you'll have to cross this bridge. PRML is the classic bridge. Use it. Read it. Love it. But keep in mind that a Bayesian perspective isn't the only story (Bishop strongly tends towards the Bayesian approach to machine learning).

**Expectations**: You'll be able to recognize, implement, debug, and interpret the output of most off-the-shelf machine learning methods. You should also have an intuition about which advanced ML concepts to investigate for a given problem. Practicing data scientists should at least be at this level.

**Necessary Background**:

* you should be comfortable with off-the-shelf clustering and classification algorithms
* linear algebra: understand matrix algebra and determinants
* some multivariate and vector calculus experience -- know what a Jacobian is
* some machine learning implementation experience in R, Matlab, the SciPy stack, or Julia

**Key Chapters**: Know and love chapters 1-12.1. Chapters 12.2-14 can be consulted as you need them.

**Capstone Project**: Implement the [Online Variational Bayes Algorithm for Latent Dirichlet Allocation](https://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf) and analyze a large corpus of your choosing.
Verify that your LDA implementation is correct. This will test your ability to understand and interpret cutting-edge machine learning algorithms, approximate and online inference techniques, as well as your implementation chops, your data munging abilities, and your ability to define an interesting application from a vaguely defined problem.

**Note**: PRML spends quite a bit of time on Bayesian machine learning methods. If you're unfamiliar with Bayesian statistics, I recommend studying the first 5 chapters of [Doing Bayesian Data Analysis](http://amzn.to/1nqV6Kf).

# Level 3: Master

<div class="block-center-text">
<a href="http://amzn.to/1nWMyK7">
<img width="200px" class="center-image" src="http://i.imgur.com/6eD76vT.jpg" alt="PGM">
Probabilistic Graphical Models: Principles and Techniques</a>
</div>

There are a number of subjects you may want to study in depth at the master level: convex optimization, [measure-theoretic] probability theory, discrete optimization, linear algebra, differential geometry, or maybe computational neurology. But if you're at this level, you probably have a good sense of what areas you'd like to improve, so I'll stick with the single book recommendation.

<a href="http://amzn.to/1nWMyK7"> Probabilistic Graphical Models: Principles and Techniques </a> is a classic, monstrous tome that should be within arm's length of any ML researcher worth his/her salt =). PGMs pervade machine learning, and with a strong understanding of this content, you'll be able to dive into most machine learning specialties without too much pain.

**Expectations**: You'll be able to construct probabilistic models for novel problems, determine a reasonable inference technique, and evaluate your methodology. You'll also have a much deeper understanding of how various models relate, e.g. how [[deep belief networks]] can be [viewed as factor graphs](https://plus.google.com/+YannLeCunPhD/posts/51gWtf7X3Ee).

**Necessary Background**:

* you should be comfortable with most off-the-shelf ML algorithms
* linear algebra -- know how to interpret eigenvalues
* multivariate and vector calculus experience
* some machine learning implementation experience in R, Matlab, the SciPy stack, or Julia

**Key Chapters**: Chapters 1-8 cover content similar to Bishop's Pattern Recognition and Machine Learning Ch. 2 and 8, but at a much deeper level. Chapters 9-13 contain key content, and Ch. 19 on partially observed data is really helpful. Read Ch. 14 and Ch. 15 when/if they are relevant to your goals.

**Capstone Project**: At this point, you should be able to define and pursue your own machine learning projects. Perhaps plunge into the world of ["big data"](http://snap.stanford.edu/data/com-Friendster.html).

# Level 4: Grandmaster

If you've achieved master status, you'll have a strong enough ML background to pursue any ML-related specialization at a novel level. For example, maybe you're interested in pursuing novel [deep learning](http://metacademy.org/roadmaps/rgrosse/deep_learning) applications or characterizations? Maybe you should become a Metacademy contributor?