This is an oldie but a goodie from Matthew Mayo at KDNuggets. Mayo, an aspiring machine learning scientist, presented a Quora post titled ‘How Do I Learn Machine Learning?’ as a robust resource for fledgling data scientists. The post had 93 answers and more than 468,000 views, and Mayo felt that a number of the contributions from well-known personalities in the machine learning world were worth a second look.
Xavier Amatriain
Xavier says he “builds teams and machine learning algorithms to solve hard problems with impact on the world.” Amatriain was formerly Director of Netflix Recommendation Algorithms and VP of Engineering at Quora. He is “now trying to improve healthcare with AI.”
Xavier’s advice from the Quora post on ‘How Do I Learn Machine Learning?’: “Get a good ML book [his list is shown in the next paragraph], read the first intro chapters, and then jump to whatever chapter includes an algorithm you are interested in. Once you have found that algo, dive into it, understand all the details, and, especially, implement it…here I am talking about implementing an algorithm from scratch in a ‘real’ programming language. You can still start with an easy one such as L2-regularized logistic regression, or k-means, but you should also push yourself to implement more interesting ones such as latent Dirichlet allocation or SVMs.”
Xavier’s book suggestions: The Elements of Statistical Learning by Hastie, Tibshirani and Friedman; Pattern Recognition and Machine Learning by Bishop; Bayesian Reasoning and Machine Learning by Barber; Machine Learning: A Probabilistic Perspective by Murphy; and All of Statistics: A Concise Course in Statistical Inference by Wasserman.
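To make Amatriain's "implement it from scratch" advice concrete, here is a minimal sketch of one of the easy starting points he names, L2-regularized logistic regression, written in plain Python with no ML libraries. The toy data set, the function names and the hyperparameter values (`l2`, `lr`, `epochs`) are assumptions chosen for illustration and do not come from the Quora post.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, l2=0.01, lr=0.1, epochs=500):
    """Fit weights and bias with batch gradient descent.

    Minimizes mean log-loss + (l2 / 2) * ||w||^2; the bias is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one example: p(y=1 | x) - y
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Averaged log-loss gradient plus the L2 penalty gradient (l2 * w)
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Return hard 0/1 labels using a 0.5 probability threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


def make_toy_data(n_per_class=50, seed=0):
    """Hypothetical toy data: two 2-D Gaussian blobs, one per class."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


X, y = make_toy_data()
w, b = train_logistic_regression(X, y)
accuracy = sum(p == t for p, t in zip(predict(X, w, b), y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

Writing the gradient by hand, including the `l2 * wj` term, is the part of the exercise that a library call would otherwise hide. A natural next step is to compare the learned weights against a library implementation on the same data.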
Raviteja Chirala
Chirala says he is an author, data scientist and avid programmer. Judging from his Quora posts, he also really, really likes soccer! He is currently a ‘data geek’ at Centro and was previously a data scientist at Ayasdi and salesforce.com and a Big Data Engineer at Apple.
Chirala’s advice from the Quora post on ‘How Do I Learn Machine Learning?’: “Get scikit-learn or respective framework in the programming language you chose. Run algorithms for every chapter in Joel Grus’ Data Science from Scratch: First Principles with Python. Advantage with Scikit is it gives you some sample data too to test. Get a grip on statistics (academic discipline) and probability. Communities in Quora or Kaggle exercises, etc. will help you in getting up to speed. Also, you can get this book, The Elements of Statistical Learning by Hastie, Tibshirani and Friedman (also recommended by Amatriain). I haven’t seen anyone disappointed with this one. It’s a bit of math but self explanatory mostly.”
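Chirala's point about scikit-learn shipping with sample data can be sketched in a few lines. The example below assumes scikit-learn is installed and uses its bundled iris data set with a logistic regression classifier. The choice of data set, model, split size and `random_state` is an illustrative assumption, not something Chirala specifies.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load one of the small sample data sets bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# max_iter is raised from the default to give the solver room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Swapping `load_iris` for another bundled loader, or `LogisticRegression` for a different estimator, keeps the same fit-and-score pattern, which makes it easy to run an algorithm alongside each book chapter.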
Sean McClure
McClure describes himself as the Founder of Kedion, PhD, Builder and host of NonTrivial Podcast. McClure says Kedion builds AI solutions for organizations looking to leverage their data for machine learning/AI applications. He previously was with Accenture as a Senior Manager in AI, Director of Data Science at Doximity and Space-Time Insight, and Senior Data Scientist at ThoughtWorks. He worked as a Research Associate for five years for the Canadian National Institute for Nanotechnology.
McClure’s advice from the Quora post on ‘How Do I Learn Machine Learning?’: “It is easy to get lost in all the languages and technologies that allow one to practice machine learning on real-world data. They allow us to execute our ideas and build our models. When integrated into real applications they engender software with the ability to learn and distill high-dimensional problems down to focused results. But languages and technology come and go. Knowing R or Python really well might amount to building a model faster or allow you to integrate it into software better, but it says nothing about your ability to choose the right model, or build one that truly speaks to the challenge at hand. The art of being able to do machine learning well comes from seeing the core concepts inside the algorithms and how they overlap with the pain points trying to be addressed. Great practitioners start to see interesting overlaps before ever touching a keyboard.”
McClure recommends Building Machine Learning Systems with Python by Richert and Coelho, Machine Learning with Spark by Pentreath, and Learning from Data by Abu-Mostafa, Magdon-Ismail and Lin.
Mayo reinforces McClure’s recommendation of the site www.datasciencetechnology.com. Mayo says it has top-level categories such as learning algorithms, databases, data cleaning and languages, as well as a broad and deep ontology of data science terms that are presented and explained with links to relevant resources.
If you are looking for more information to broaden and deepen your skill set, KDNuggets.com has a curated list of free data science books, collections of top data science materials, and a large number of advice posts and tutorials.