Top 5 Essential Books for Python Machine Learning

We've discussed the importance of statistical modelling and machine learning in various articles on QuantStart. Machine learning is particularly important if one is interested in becoming a quantitative trading researcher. In this article I want to highlight some books that discuss machine learning from a programmatic perspective, rather than a mathematical one. This route is more appropriate for the quantitative developer or traditional software developer who wishes to eventually break into quantitative trading.

The following books all use Python as the primary programming language. Some discuss scikit-learn, which is widely considered to be the predominant machine learning library for Python.

1) Programming Collective Intelligence: Building Smart Web 2.0 Applications - Toby Segaran

This was actually my first proper introduction to machine learning in Python. I have a copy of the first edition of this book and originally used it for the consumer analytics applications it discusses. This book is really suited to those who wish to see exactly how machine learning algorithms are implemented (in pure Python) as opposed to being taught how to use a particular library.

The book covers a wide variety of topics and domains. In particular there are sections on Recommendation, Clustering, Searching/Ranking, Optimisation, Decision Trees, Support Vector Machines, Feature Detection and Genetic Programming. Although this book is less directly related to quantitative finance than the others, I believe it is one of the best here for learning the process of machine learning. It is definitely worth picking up.
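To give a flavour of what "implemented in pure Python" means, here is a minimal sketch of a similarity score of the kind used in recommendation systems. It is my own illustration rather than code taken from the book, and the function name `pearson_similarity` and the ratings dictionaries are hypothetical. It computes the Pearson correlation between two users' ratings over the items they have both rated, using only the standard library.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over the items rated by both users.

    Each argument is a dict mapping an item name to a numeric rating.
    Returns 0.0 when there is no overlap or when either user's shared
    ratings are constant (the correlation is undefined in that case).
    """
    # Only items rated by both users can be compared.
    shared = [item for item in ratings_a if item in ratings_b]
    n = len(shared)
    if n == 0:
        return 0.0

    xs = [ratings_a[item] for item in shared]
    ys = [ratings_b[item] for item in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of centred cross-products and squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0

    return cov / sqrt(var_x * var_y)


# Hypothetical ratings, purely for illustration.
alice = {"Book A": 1, "Book B": 2, "Book C": 3}
bob = {"Book A": 2, "Book B": 4, "Book C": 6, "Book D": 1}
carol = {"Book A": 3, "Book B": 2, "Book C": 1}

similar = pearson_similarity(alice, bob)
opposite = pearson_similarity(alice, carol)
```

Writing the score out by hand like this makes each step of the calculation visible, which is the main appeal of the pure-Python approach over calling a library routine.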

2) Building Machine Learning Systems with Python - Willi Richert, Luis Pedro Coelho

This book goes into significant detail on how to use scikit-learn for regression and classification tasks. In addition to extensive coverage of scikit-learn, it also considers other libraries such as gensim (for topic modelling). The book spends a reasonable amount of time looking at text-based classification and sentiment analysis, which is becoming a hot topic in quantitative trading, as individuals and funds attempt to form strategies that can trade based on social media sentiment.

The book also considers regression in a recommendation scenario which, while interesting in its own right, is probably more applicable to data scientists and consumer analytics engineers.
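For readers who have not seen text-based classification in scikit-learn, the following is a minimal sketch of the general workflow, assuming scikit-learn is installed. It is my own toy illustration, not an example from the book: the six headlines and their labels are invented, and a bag-of-words model with naive Bayes is only one of many possible choices.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training data: short headlines with hand-assigned sentiment.
train_texts = [
    "strong earnings beat expectations",
    "shares rally on record profit",
    "upgrade boosts outlook",
    "weak earnings miss expectations",
    "shares slump on heavy loss",
    "downgrade hurts outlook",
]
train_labels = [
    "positive", "positive", "positive",
    "negative", "negative", "negative",
]

# Convert each text to word counts, then fit a multinomial naive Bayes
# classifier on those counts. The pipeline applies both steps in order.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Classify two unseen headlines built from words seen during training.
predictions = list(model.predict(["record profit rally", "heavy loss slump"]))
```

A realistic sentiment model would need far more data, careful preprocessing and out-of-sample evaluation, but the fit-then-predict pattern stays the same.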

3) Learning scikit-learn: Machine Learning in Python - Raúl Garreta, Guillermo Moncecchi

This is quite a short book compared to some of the others. I would recommend this one to individuals who are comfortable coding in Python and have had some basic exposure to NumPy and Pandas, but want to get into machine learning quickly. It covers somewhat more than the scikit-learn documentation, but doesn't really differentiate between the mathematical components of each algorithm and is thus a bit more like a basic machine learning recipe book! However, this can be appealing to those who just want to "dive in".

4) Machine Learning in Action - Peter Harrington

This book is split into three main areas - supervised classification, supervised regression and unsupervised methods (such as dimensionality reduction). It goes into a lot of detail about these topics, with comparisons across many different algorithms. The book is somewhat more mathematically oriented than the books discussed above, so it may appeal to Python programmers who have an applied mathematics background.

The book also considers the emerging field of "big data" by introducing Hadoop, MapReduce and Amazon Web Services (AWS). This may be appropriate to some quant finance firms that also utilise consumer or internet-based data to drive their trading algorithms.

5) Machine Learning: An Algorithmic Perspective - Stephen Marsland

This book is on the more mathematically oriented end of the Python machine learning spectrum. It covers topics not discussed by the previous books such as Neural Networks, Hidden Markov Models and Markov Chain Monte Carlo. Despite the mathematical approach there is still plenty of Python code and thus the book can be read "at the computer".

While the book covers a lot of ground mathematically, it is likely you will need to complement it with a book on statistical methods such as The Elements of Statistical Learning. You will also need to have a basic understanding of Bayesian statistics, since a lot of the methods in this book touch on this area.