An Introduction to Machine Learning

What is Machine Learning?

And why is it useful for quant finance?

Machine learning makes use of algorithms that learn how to perform tasks such as prediction or classification without explicitly being programmed to do so. In essence, the algorithms learn from data rather than being prespecified.

Such algorithms are incredibly diverse and range from more traditional statistical models that emphasise inference through to highly complex hierarchical "deep" neural network architectures that excel at prediction and classification tasks.

Over the last ten years or so machine learning has been making steady gains in the quantitative finance sector and has aroused the interest of large quant funds including Man AHL, DE Shaw, Winton, Citadel and Two Sigma to name a few.

Machine learning algorithms can be applied in incredibly diverse ways for quantitative finance. Particular examples include:

  • Prediction of future asset price movements
  • Prediction of liquidity movements due to redemption of capital in large funds
  • Determination of mis-priced assets in niche markets
  • Natural language processing of equity analyst sentiment and forecasts
  • Image classification/recognition for use in commodity supply/demand signals

Unfortunately, much of the work on applying machine learning algorithms to trading strategies in quant finance is proprietary and thus difficult to obtain. However, with practice it becomes clearer how to take certain datasets and search them for consistent alpha.

Machine learning is a broad area and is not a field that can be mastered quickly. The sections below introduce the basics, allowing you to dive deeper into specific areas:

"Machine learning algorithms learn from data to solve problems rather than using a prespecified set of rules."

Machine Learning Domains

What are the differing areas of study in machine learning?

Machine learning tasks are generally categorised into three main areas, often depending on the type of data being analysed: supervised learning, unsupervised learning and reinforcement learning.

The methods all differ in how the machine learning algorithm is "rewarded" for being correct in its predictions or classifications.

Supervised Learning - Supervised learning algorithms involve labelled data, that is, data that has been labelled, often manually, with categories (as in supervised classification) or with numerical responses (as in supervised regression). Such algorithms are trained on the data and learn which predictors correspond to which responses. When applied to unseen data they attempt to make predictions based on their prior training experience. An example from quantitative finance would be using supervised regression to predict tomorrow's stock price from the previous month's worth of price data.
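As a minimal sketch of the supervised regression idea, the snippet below fits a one-predictor least-squares line that predicts tomorrow's return from today's return. Everything in it is an illustrative assumption rather than a recommended strategy: the returns are simulated from an AR(1) process, the function names are hypothetical, and a single lagged return stands in for a full month of price history.

```python
import random


def simulate_ar1_returns(n, phi, sigma, seed):
    """Simulate n daily returns where each return depends on the previous one."""
    rng = random.Random(seed)
    returns = [0.0]
    for _ in range(n - 1):
        returns.append(phi * returns[-1] + rng.gauss(0.0, sigma))
    return returns


def fit_simple_ols(x, y):
    """Return (intercept, slope) minimising squared error for y ~ a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic data (assumption): true dependence on yesterday's return is 0.5.
returns = simulate_ar1_returns(n=5000, phi=0.5, sigma=0.01, seed=42)

# Labelled pairs: predictor = today's return, response = tomorrow's return.
predictors = returns[:-1]
responses = returns[1:]

# Train on the first 4000 pairs, keep the rest as unseen data.
split = 4000
intercept, slope = fit_simple_ols(predictors[:split], responses[:split])

# Predict on the unseen pairs and measure the mean squared error.
predictions = [intercept + slope * x for x in predictors[split:]]
mse = sum((p - y) ** 2 for p, y in zip(predictions, responses[split:])) / len(predictions)

print(f"estimated slope: {slope:.3f}, out-of-sample MSE: {mse:.6f}")
```

The train/test split matters here: the fit only sees the first 4000 pairs, so the error on the remaining pairs gives a rough indication of how the model behaves on data it was not trained on.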

Unsupervised Learning - Unsupervised learning algorithms do not make use of labelled data. Instead they utilise the underlying structure of the data to identify patterns. The canonical method is unsupervised clustering, which attempts to partition datasets into sub-clusters that are associated in some manner. An example from quantitative finance would be to cluster assets into classes that behave similarly in order to adjust portfolio allocations.
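The clustering example can be sketched with a small K-Means implementation. The asset figures below (annualised return, annualised volatility) are made-up numbers for illustration only, and seeding the centroids with the first and last points is a simplifying assumption; practical implementations usually use randomised or smarter initialisation.

```python
import math


def k_means(points, initial_centroids, max_iterations=100):
    """Assign each point to its nearest centroid, then recompute centroids, until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = [None] * len(points)
    for _ in range(max_iterations):
        # Assignment step: index of the closest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # No assignment changed, so the algorithm has converged.
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical assets described by (annualised return, annualised volatility).
assets = [
    (0.020, 0.05), (0.030, 0.06), (0.025, 0.04),  # low-volatility, bond-like
    (0.080, 0.20), (0.100, 0.25), (0.090, 0.22),  # high-volatility, equity-like
]

labels, centroids = k_means(assets, initial_centroids=[assets[0], assets[-1]])
print(labels)
print(centroids)
```

No labels are supplied at any point: the two groups emerge purely from the distances between the feature vectors, which is what distinguishes this from the supervised case.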

Reinforcement Learning - Reinforcement learning algorithms attempt to perform a task within a certain dynamic environment, by taking actions inside the environment in order to maximise a reward mechanism. These algorithms differ from supervised learning in that there is no direct set of input/output pairs of data. Such algorithms have become famous recently as they have been used by Google DeepMind to exceed human performance in Atari games and the ancient game of Go. Such algorithms have been applied in quant finance to optimise investment portfolios.
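To make the action/reward loop concrete, here is a sketch of an epsilon-greedy multi-armed bandit, which is about the simplest reinforcement learning setting (there is no state, only a choice between actions). The three "arms" with their mean rewards, the noise level and the function name are all assumptions made for illustration; this is not the method used by DeepMind or by any particular fund.

```python
import random


def run_epsilon_greedy(true_means, steps, epsilon, seed):
    """Learn the value of each action purely from the rewards it produces."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # Running estimate of each action's mean reward.
    counts = [0] * n_arms       # How many times each action has been taken.
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_arms)
        else:
            # Exploit: take the action currently believed to be best.
            action = max(range(n_arms), key=lambda a: estimates[a])
        # The environment returns a noisy reward for the chosen action.
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the sample mean for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward


# Hypothetical environment: the third action has the highest mean reward.
estimates, counts, total_reward = run_epsilon_greedy(
    true_means=[0.0, 0.5, 1.0], steps=5000, epsilon=0.1, seed=0
)
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(f"estimated best action: {best_action}, pulls per action: {counts}")
```

Note that the algorithm is never told which action is correct. It only observes rewards for the actions it takes, which is the sense in which there is no direct set of input/output pairs.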

"Machine learning domains include supervised learning, unsupervised learning and reinforcement learning."

Machine Learning Algorithms

What are the different algorithms?

Due to its interdisciplinary nature there are a large number of differing machine learning algorithms. Most have arisen from the computer science, engineering and statistics communities.

The list of machine learning algorithms is almost endless, as they include crossover techniques and ensembles of many other algorithms. However, the algorithms frequently used within quantitative finance are listed below:

  • Linear Regression - An elementary supervised technique from classical statistics that finds an optimal linear response surface from a set of labelled predictor-response pairs.
  • Linear Classification - These supervised techniques classify data into groups, rather than predict numerical responses. Common techniques include Logistic Regression, Linear Discriminant Analysis and Naive Bayes Classification.
  • Tree-Based Methods - Decision trees are a supervised technique that partitions the predictor/feature space into axis-aligned hyperrectangular subsets. Ensembles of decision trees include Random Forests.
  • Support Vector Machines - SVMs are a supervised technique that attempts to create a linear separation boundary in a higher-dimensional space than that of the original problem in order to deal with non-linear separation.
  • Artificial Neural Networks/Deep Learning - Neural networks, most often trained in a supervised manner, create hierarchies of activation "neurons" that can approximate high-dimensional non-linear functions. "Deep" networks make use of many hidden layers of neurons to form hierarchical representations for state-of-the-art classification performance.
  • Bayesian Networks - Bayesian Networks or "Bayes Nets" are a type of probabilistic graphical model that represent probabilistic relationships between variables. They are utilised both for inference and learning applications.
  • Clustering - Clustering is an unsupervised technique that attempts to partition data into subsets according to some similarity criteria. A common technique is K-Means Clustering.
  • Dimensionality Reduction - Dimensionality reduction algorithms are unsupervised techniques that attempt to transform the space of predictors/factors into another set, with fewer dimensions, that explains most of the "variation" in the data. Principal Components Analysis is the canonical technique here.
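As one illustration from the list above, the following sketch finds the first principal component of a set of asset returns using power iteration on the sample covariance matrix. The data are simulated under the assumption that three assets share a single common factor with hypothetical loadings (1.0, 0.8, 1.2); the function names are likewise illustrative, and a production implementation would normally rely on a linear algebra library rather than hand-written loops.

```python
import math
import random


def simulate_asset_returns(loadings, n_days, noise_scale, seed):
    """Each asset's return = loading * common factor + small idiosyncratic noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_days):
        factor = rng.gauss(0.0, 1.0)
        rows.append([b * factor + rng.gauss(0.0, noise_scale) for b in loadings])
    return rows


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(cov, iterations=100):
    """Power iteration: repeatedly apply cov and renormalise to find the top eigenvector."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))
    return v, eigenvalue


returns = simulate_asset_returns([1.0, 0.8, 1.2], n_days=500, noise_scale=0.1, seed=1)
cov = covariance_matrix(returns)
component, eigenvalue = first_principal_component(cov)

# Share of total variance explained by the single leading component.
explained_ratio = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print(f"first component: {[round(x, 3) for x in component]}")
print(f"explained variance ratio: {explained_ratio:.3f}")
```

Because the simulated assets are driven almost entirely by one factor, a single component captures nearly all of the variance, so three dimensions can be summarised by one with little loss.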

Determining the "best tool for the job" is one of the trickiest aspects of machine learning as applied to quant finance. Many articles on QuantStart discuss this particular point and will guide you to applying the correct technique where appropriate.
