Beginner's Guide to Quantitative Trading
In this article I'm going to introduce you to some of the basic concepts which accompany an end-to-end quantitative trading system. This post will hopefully serve two audiences. The first will be individuals trying to obtain a job at a fund as a quantitative trader. The second will be individuals who wish to try and set up their own "retail" algorithmic trading business.
Quantitative trading is an extremely sophisticated area of quant finance. It can take a significant amount of time to gain the necessary knowledge to pass an interview or construct your own trading strategies. Not only that but it requires extensive programming expertise, at the very least in a language such as MATLAB, R or Python. However as the trading frequency of the strategy increases, the technological aspects become much more relevant. Thus being familiar with C/C++ will be of paramount importance.
A quantitative trading system consists of four major components:
- Strategy Identification - Finding a strategy, exploiting an edge and deciding on trading frequency
- Strategy Backtesting - Obtaining data, analysing strategy performance and removing biases
- Execution System - Linking to a brokerage, automating the trading and minimising transaction costs
- Risk Management - Optimal capital allocation, "bet size"/Kelly criterion and trading psychology
We'll begin by taking a look at how to identify a trading strategy.
All quantitative trading processes begin with an initial period of research. This research process encompasses finding a strategy, seeing whether the strategy fits into a portfolio of other strategies you may be running, obtaining any data necessary to test the strategy and trying to optimise the strategy for higher returns and/or lower risk. You will need to factor in your own capital requirements if running the strategy as a "retail" trader and how any transaction costs will affect the strategy.
Contrary to popular belief it is actually quite straightforward to find profitable strategies through various public sources. Academics regularly publish theoretical trading results (albeit mostly gross of transaction costs). Quantitative finance blogs will discuss strategies in detail. Trade journals will outline some of the strategies employed by funds.
You might question why individuals and firms are keen to discuss their profitable strategies, especially when they know that others "crowding the trade" may stop the strategy from working in the long term. The reason lies in the fact that they will not often discuss the exact parameters and tuning methods that they have carried out. These optimisations are the key to turning a relatively mediocre strategy into a highly profitable one. In fact, one of the best ways to create your own unique strategies is to find similar methods and then carry out your own optimisation procedure.
Here is a small list of places to begin looking for strategy ideas:
- Social Science Research Network - www.ssrn.com
- arXiv Quantitative Finance - arxiv.org/archive/q-fin
- Seeking Alpha - www.seekingalpha.com
- Elite Trader - www.elitetrader.com
- Nuclear Phynance - www.nuclearphynance.com
- Quantivity - quantivity.wordpress.com
Many of the strategies you will look at will fall into the categories of mean-reversion and trend-following/momentum. A mean-reverting strategy is one that attempts to exploit the fact that a long-term mean on a "price series" (such as the spread between two correlated assets) exists and that short term deviations from this mean will eventually revert. A momentum strategy attempts to exploit both investor psychology and big fund structure by "hitching a ride" on a market trend, which can gather momentum in one direction, and follow the trend until it reverses.
Another hugely important aspect of quantitative trading is the frequency of the trading strategy. Low frequency trading (LFT) generally refers to any strategy which holds assets longer than a trading day. Correspondingly, high frequency trading (HFT) generally refers to a strategy which holds assets intraday. Ultra-high frequency trading (UHFT) refers to strategies that hold assets on the order of seconds and milliseconds. As a retail practitioner HFT and UHFT are certainly possible, but only with detailed knowledge of the trading "technology stack" and order book dynamics. We won't discuss these aspects to any great extent in this introductory article.
Once a strategy, or set of strategies, has been identified it now needs to be tested for profitability on historical data. That is the domain of backtesting.
The goal of backtesting is to provide evidence that the strategy identified via the above process is profitable when applied to both historical and out-of-sample data. This sets the expectation of how the strategy will perform in the "real world". However, backtesting is NOT a guarantee of success, for various reasons. It is perhaps the most subtle area of quantitative trading since it entails numerous biases, which must be carefully considered and eliminated as much as possible. We will discuss the common types of bias including look-ahead bias, survivorship bias and optimisation bias (also known as "data-snooping" bias). Other areas of importance within backtesting include availability and cleanliness of historical data, factoring in realistic transaction costs and deciding upon a robust backtesting platform. We'll discuss transaction costs further in the Execution Systems section below.
Once a strategy has been identified, it is necessary to obtain the historical data through which to carry out testing and, perhaps, refinement. There are a significant number of data vendors across all asset classes. Their costs generally scale with the quality, depth and timeliness of the data. The traditional starting point for beginning quant traders (at least at the retail level) is to use the free data set from Yahoo Finance. I won't dwell on providers too much here, rather I would like to concentrate on the general issues when dealing with historical data sets.
The main concerns with historical data include accuracy/cleanliness, survivorship bias and adjustment for corporate actions such as dividends and stock splits:
- Accuracy pertains to the overall quality of the data - whether it contains any errors. Errors can sometimes be easy to identify, such as with a spike filter, which will pick out incorrect "spikes" in time series data and correct for them. At other times they can be very difficult to spot. It is often necessary to have two or more providers and then check all of their data against each other.
- Survivorship bias is often a "feature" of free or cheap datasets. A dataset with survivorship bias means that it does not contain assets which are no longer trading. In the case of equities this means delisted/bankrupt stocks. This bias means that any stock trading strategy tested on such a dataset will likely perform better than in the "real world" as the historical "winners" have already been preselected.
- Corporate actions include "logistical" activities carried out by the company that usually cause a step-function change in the raw price, that should not be included in the calculation of returns of the price. Adjustments for dividends and stock splits are the common culprits. A process known as back adjustment is necessary to be carried out at each one of these actions. One must be very careful not to confuse a stock split with a true returns adjustment. Many a trader has been caught out by a corporate action!
In order to carry out a backtest procedure it is necessary to use a software platform. You have the choice between dedicated backtest software, such as Tradestation, a numerical platform such as Excel or MATLAB or a full custom implementation in a programming language such as Python or C++. I won't dwell too much on Tradestation (or similar), Excel or MATLAB, as I believe in creating a full in-house technology stack (for reasons outlined below). One of the benefits of doing so is that the backtest software and execution system can be tightly integrated, even with extremely advanced statistical strategies. For HFT strategies in particular it is essential to use a custom implementation.
When backtesting a system one must be able to quantify how well it is performing. The "industry standard" metrics for quantitative strategies are the maximum drawdown and the Sharpe Ratio. The maximum drawdown characterises the largest peak-to-trough drop in the account equity curve over a particular time period (usually annual). This is most often quoted as a percentage. LFT strategies will tend to have larger drawdowns than HFT strategies, due to a number of statistical factors. A historical backtest will show the past maximum drawdown, which is a good guide for the future drawdown performance of the strategy. The second measurement is the Sharpe Ratio, which is heuristically defined as the average of the excess returns divided by the standard deviation of those excess returns. Here, excess returns refers to the return of the strategy above a pre-determined benchmark, such as the S&P500 or a 3-month Treasury Bill. Note that annualised return is not a measure usually utilised, as it does not take into account the volatility of the strategy (unlike the Sharpe Ratio).
Once a strategy has been backtested and is deemed to be free of biases (in as much as that is possible!), with a good Sharpe and minimised drawdowns, it is time to build an execution system.
An execution system is the means by which the list of trades generated by the strategy are sent and executed by the broker. Despite the fact that the trade generation can be semi- or even fully-automated, the execution mechanism can be manual, semi-manual (i.e. "one click") or fully automated. For LFT strategies, manual and semi-manual techniques are common. For HFT strategies it is necessary to create a fully automated execution mechanism, which will often be tightly coupled with the trade generator (due to the interdependence of strategy and technology).
The key considerations when creating an execution system are the interface to the brokerage, minimisation of transaction costs (including commission, slippage and the spread) and divergence of performance of the live system from backtested performance.
There are many ways to interface to a brokerage. They range from calling up your broker on the telephone right through to a fully-automated high-performance Application Programming Interface (API). Ideally you want to automate the execution of your trades as much as possible. This frees you up to concentrate on further research, as well as allow you to run multiple strategies or even strategies of higher frequency (in fact, HFT is essentially impossible without automated execution). The common backtesting software outlined above, such as MATLAB, Excel and Tradestation are good for lower frequency, simpler strategies. However it will be necessary to construct an in-house execution system written in a high performance language such as C++ in order to do any real HFT. As an anecdote, in the fund I used to be employed at, we had a 10 minute "trading loop" where we would download new market data every 10 minutes and then execute trades based on that information in the same time frame. This was using an optimised Python script. For anything approaching minute- or second-frequency data, I believe C/C++ would be more ideal.
In a larger fund it is often not the domain of the quant trader to optimise execution. However in smaller shops or HFT firms, the traders ARE the executors and so a much wider skillset is often desirable. Bear that in mind if you wish to be employed by a fund. Your programming skills will be as important, if not more so, than your statistics and econometrics talents!
Another major issue which falls under the banner of execution is that of transaction cost minimisation. There are generally three components to transaction costs: Commissions (or tax), which are the fees charged by the brokerage, the exchange and the SEC (or similar governmental regulatory body); slippage, which is the difference between what you intended your order to be filled at versus what it was actually filled at; spread, which is the difference between the bid/ask price of the security being traded. Note that the spread is NOT constant and is dependent upon the current liquidity (i.e. availability of buy/sell orders) in the market.
Transaction costs can make the difference between an extremely profitable strategy with a good Sharpe ratio and an extremely unprofitable strategy with a terrible Sharpe ratio. It can be a challenge to correctly predict transaction costs from a backtest. Depending upon the frequency of the strategy, you will need access to historical exchange data, which will include tick data for bid/ask prices. Entire teams of quants are dedicated to optimisation of execution in the larger funds, for these reasons. Consider the scenario where a fund needs to offload a substantial quantity of trades (of which the reasons to do so are many and varied!). By "dumping" so many shares onto the market, they will rapidly depress the price and may not obtain optimal execution. Hence algorithms which "drip feed" orders onto the market exist, although then the fund runs the risk of slippage. Further to that, other strategies "prey" on these necessities and can exploit the inefficiencies. This is the domain of fund structure arbitrage.
The final major issue for execution systems concerns divergence of strategy performance from backtested performance. This can happen for a number of reasons. We've already discussed look-ahead bias and optimisation bias in depth, when considering backtests. However, some strategies do not make it easy to test for these biases prior to deployment. This occurs in HFT most predominantly. There may be bugs in the execution system as well as the trading strategy itself that do not show up on a backtest but DO show up in live trading. The market may have been subject to a regime change subsequent to the deployment of your strategy. New regulatory environments, changing investor sentiment and macroeconomic phenomena can all lead to divergences in how the market behaves and thus the profitability of your strategy.
The final piece to the quantitative trading puzzle is the process of risk management. "Risk" includes all of the previous biases we have discussed. It includes technology risk, such as servers co-located at the exchange suddenly developing a hard disk malfunction. It includes brokerage risk, such as the broker becoming bankrupt (not as crazy as it sounds, given the recent scare with MF Global!). In short it covers nearly everything that could possibly interfere with the trading implementation, of which there are many sources. Whole books are devoted to risk management for quantitative strategies so I wont't attempt to elucidate on all possible sources of risk here.
Risk management also encompasses what is known as optimal capital allocation, which is a branch of portfolio theory. This is the means by which capital is allocated to a set of different strategies and to the trades within those strategies. It is a complex area and relies on some non-trivial mathematics. The industry standard by which optimal capital allocation and leverage of the strategies are related is called the Kelly criterion. Since this is an introductory article, I won't dwell on its calculation. The Kelly criterion makes some assumptions about the statistical nature of returns, which do not often hold true in financial markets, so traders are often conservative when it comes to the implementation.
Another key component of risk management is in dealing with one's own psychological profile. There are many cognitive biases that can creep in to trading. Although this is admittedly less problematic with algorithmic trading if the strategy is left alone! A common bias is that of loss aversion where a losing position will not be closed out due to the pain of having to realise a loss. Similarly, profits can be taken too early because the fear of losing an already gained profit can be too great. Another common bias is known as recency bias. This manifests itself when traders put too much emphasis on recent events and not on the longer term. Then of course there are the classic pair of emotional biases - fear and greed. These can often lead to under- or over-leveraging, which can cause blow-up (i.e. the account equity heading to zero or worse!) or reduced profits.
As can be seen, quantitative trading is an extremely complex, albeit very interesting, area of quantitative finance. I have literally scratched the surface of the topic in this article and it is already getting rather long! Whole books and papers have been written about issues which I have only given a sentence or two towards. For that reason, before applying for quantitative fund trading jobs, it is necessary to carry out a significant amount of groundwork study. At the very least you will need an extensive background in statistics and econometrics, with a lot of experience in implementation, via a programming language such as MATLAB, Python or R. For more sophisticated strategies at the higher frequency end, your skill set is likely to include Linux kernel modification, C/C++, assembly programming and network latency optimisation.
If you are interested in trying to create your own algorithmic trading strategies, my first suggestion would be to get good at programming. My preference is to build as much of the data grabber, strategy backtester and execution system by yourself as possible. If your own capital is on the line, wouldn't you sleep better at night knowing that you have fully tested your system and are aware of its pitfalls and particular issues? Outsourcing this to a vendor, while potentially saving time in the short term, could be extremely expensive in the long-term.
- Backtesting a Forecasting Strategy for the S&P500 in Python with pandas
- Backtesting a Moving Average Crossover in Python with pandas
- Backtesting An Intraday Mean Reversion Pairs Strategy Between SPY And IWM
- Basics of Statistical Mean Reversion Testing
- Basics of Statistical Mean Reversion Testing - Part II
- Beginner's Guide to Statistical Machine Learning - Part I
- Best Programming Language for Algorithmic Trading Systems?
- Can Algorithmic Traders Still Succeed at the Retail Level?
- Choosing a Platform for Backtesting and Automated Execution
- Continuous Futures Contracts for Backtesting Purposes
- Downloading Historical Futures Data From Quandl
- Downloading Historical Intraday US Equities From DTN IQFeed with Python
- Event-Driven Backtesting with Python - Part I
- Event-Driven Backtesting with Python - Part II
- Event-Driven Backtesting with Python - Part III
- Event-Driven Backtesting with Python - Part IV
- Event-Driven Backtesting with Python - Part V
- Event-Driven Backtesting with Python - Part VI
- Event-Driven Backtesting with Python - Part VII
- Event-Driven Backtesting with Python - Part VIII
- Forecasting Financial Time Series - Part I
- How to Identify Algorithmic Trading Strategies
- Installing a Desktop Algorithmic Trading Research Environment using Ubuntu Linux and Python
- Interactive Brokers Demo Account Signup Tutorial
- Money Management via the Kelly Criterion
- My Interview Over At OneStepRemoved.com
- My Talk At The London Financial Python User Group
- Research Backtesting Environments in Python with pandas
- Securities Master Database with MySQL and Python
- Securities Master Databases for Algorithmic Trading
- Self-Study Plan for Becoming a Quantitative Trader - Part II
- Sharpe Ratio for Algorithmic Trading Performance Measurement
- Successful Backtesting of Algorithmic Trading Strategies - Part I
- Successful Backtesting of Algorithmic Trading Strategies - Part II
- Support Vector Machines: A Guide for Beginners
- Top 5 Essential Beginner Books for Algorithmic Trading
- Using Python, IBPy and the Interactive Brokers API to Automate Trades
- Value at Risk (VaR) for Algorithmic Trading Risk Management - Part I
If so, please join QuantStart's growing list of people who receive FREE exclusive weekly algorithmic trading, quantitative finance and programming help. Just enter your name and email below: