Downloading Historical Futures Data From Quandl

Futures contracts are ubiquitous in quantitative trading and have yet to be discussed in any great detail on QuantStart. This will be the first in a series of articles explaining how to download, store, clean and stitch futures data for use in your trading systems.

Up until recently it was rather difficult and expensive to obtain consistent futures data across exchanges in frequently updated manner. That has all changed recently with the release of Quandl. It's a fantastic source of free financial data. They have a substantial library of daily futures prices in some cases going back to the 1950s!

Not only is the service a robust library of data but their API allows export in CSV, HTML, JSON and XML. In addition there are bindings for Python, R, Excel, MatLab, Stata, C++, Java and many other languages/packages. Without understatement this has made obtaining a wide range of daily financial data incredibly straightforward. In this article we are going to make use of Python (predominantly NumPy and pandas) to obtain futures data from Quandl and store it to disk.

In order to get the most out of this tutorial you will need to have pandas successfully installed. In addition you may want to utilise an interactive development environment like Canopy or the IPython console. I've written a tutorial/video on how to go about this from scratch.

Disclaimer: I have not been approached by Quandl in regard to writing this article. I am just a big fan of what they're trying to achieve and want to let you all know how useful their service is. Plus, it's completely free and works out-of-the-box with nearly any major programming language out there. What's not to like?

Signing Up For Quandl

The first thing we are going to do is sign up to Quandl so that we can increase our daily allowance of calls to their API. Sign-up grants you 500 calls per day, rather than the default 50. Let's begin by visiting the site at www.quandl.com:


The Quandl homepage

Click on the sign-up button on the top right:


The Quandl sign-up page

Once you're signed in you'll be returned to the home page:


The Quandl authorised home page

Quandl Futures Data

Now click on the "New: Futures page..." link to get to the futures homepage:


The Quandl futures contracts home page

For this tutorial we're interested in one of the most liquid financial instruments available, the E-Mini S&P500 futures contract, which has the futures symbol ES.

If you want to download any of the other contracts instead then you can obviously replace the remainder of the tutorial with your chosen symbol and everything should "just work"!

Click on the E-Mini S&P500 link (or your chosen futures symbol) and you'll be taken to the following screen:


E-Mini S&P500 contract page

If you scroll down you'll see the list of historical contracts going back to 1997:


E-Mini S&P500 historical contracts

We can now click on one of the individual contracts. I've picked ESZ2014, which refers to the contract for December 2014 'delivery'. You'll see a handy chart of the data:


Chart of ESZ2014 (December 2014 delivery)

By clicking on the "Download" button the data can be obtained in multiple formats: HTML, CSV, JSON or XML. In addition we can download the data directly into a pandas DataFrame using the Python bindings. While the latter is useful for quick "prototyping" and exploration of the data, we're more interested in building a longer term data store. We're especially interested in creating continuous time series of futures contracts (which will be the subject of the next article). Go ahead and click the download button, select "CSV" and then copy & paste the API call:


Download modal for ESZ2014 CSV file

The API call will look something like this:

http://www.quandl.com/api/v1/datasets/OFDP/FUTURE_ESZ2014.csv?&auth_token={{MY_AUTH_TOKEN}}&trim_start=2013-09-18&trim_end=2013-12-04&sort_order=desc

Where I've redacted my authorisation token and replaced it with {{MY_AUTH_TOKEN}}. You will need to copy the alphanumeric string between "auth_token=" and "&trim_start" for later usage in the Python script below. Do not share it with anyone as it is your unique authorisation token for Quandl downloads and is used to determine your download rate for the day.

This API call will form the basis of an automated script which we will write below to download a subset of the entire historical futures contract.

Downloading Quandl Futures into Python

Because we are interested in using the futures data long-term as part of a wider securities master database strategy we want to store the futures data to disk. Thus we need to create a directory to hold our E-Mini contract CSV files. In Mac/Linux (within the terminal/console) this is achieved by the following command:

cd ~
mkdir -p quandl/futures/ES

Note: You can obviously choose a different directory structure for your data needs, but I've gone with a simple approach of putting it underneath the Linux/Mac "home" directory.

This creates a subdirectory of your home directory called quandl, which contains two further subdirectories for futures and for the ES contracts in particular. This will help us to organise our downloads in an ongoing fashion.

In order to carry out the download using Python we will need to import some libraries. In particular we will need urllib2 for the download and pandas for plotting/manipulation. If you haven't installed pandas before, then please follow my tutorial on installing an algorithmic trading research environment in Linux first. Let's begin our Python script by importing the two libraries:

import pandas as pd
import urllib2

The first function within the code will generate the list of futures symbols we wish to download. I've added keyword parameters for the start and end years, setting them to reasonable values of 2010 and 2014. You can, of course, choose to use other timeframes:

def construct_futures_symbols(symbol, start_year=2010, end_year=2014):
    """Constructs a list of futures contract codes for a 
    particular symbol and timeframe."""
    futures = []
    months = 'HMUZ'  # March, June, September and December delivery codes
    for y in range(start_year, end_year+1):
        for m in months:
            futures.append("%s%s%s" % (symbol, m, y))
    return futures

Now we need to loop through each symbol, obtain the CSV file from Quandl for that particular contract and subsequently write it to disk so we can access it later:

def download_contract_from_quandl(contract, auth_token, dl_dir):
    """Download an individual futures contract from Quandl and then
    store it to disk in the 'dl_dir' directory. An auth_token is 
    required, which is obtained from the Quandl upon sign-up."""

    # Construct the API call from the contract and auth_token    
    api_call_head = "http://www.quandl.com/api/v1/datasets/OFDP/FUTURE_%s.csv" % contract
    params = "?&auth_token=%s&sort_order=asc" % auth_token
    
    # Download the data from Quandl
    data = urllib2.urlopen("%s%s" % (api_call_head, params)).read()
    
    # Store the data to disk
    fc = open('%s/%s.csv' % (dl_dir, contract), 'w')
    fc.write(data)
    fc.close()

Now we tie the above two functions together to download all of the desired contracts:

def download_historical_contracts(symbol, auth_token, dl_dir, start_year=2010, end_year=2014):
    """Downloads all futures contracts for a specified symbol
    between a start_year and an end_year."""
    contracts = construct_futures_symbols(symbol, start_year, end_year)
    for c in contracts:
        download_contract_from_quandl(c, auth_token, dl_dir)

Finally, we can add one of the futures prices to a pandas dataframe using the main function:

if __name__ == "__main__":
    symbol = 'ES'
    dl_dir = 'quandl/futures/ES'  # Make sure you've created this relative directory beforehand
    auth_token = 'YOUR_AUTH_TOKEN'  # Replace this with your authorisation token
    start_year = 2010
    end_year = 2014

    # Download the contracts into the directory
    download_historical_contracts(symbol, auth_token, dl_dir, start_year, end_year)

    # Open up a single contract via read_csv and plot the closing price
    es = pd.io.parsers.read_csv("%s/ESZ2014.csv" % dl_dir)

If we were running this script in IPython or Canopy then we could simply plot the settle (close price) data inline:

In [1]: plot(es['Settle'])
Out [1]:
...

Here is the output from Canopy/IPython:


ESZ2014 Settle/Close price chart from Canopy/IPython

You can use this code to collect any combination of futures contracts from Quandl as you see fit. Remember though that unless you request a higher API rate limit, you'll be stuck with 500 requests per day (or 50 if you don't sign up).

In the next article we will go into an in-depth discussion of creating a continuous futures contract from a set of individual contracts for separate delivery dates. This will be necessary to carry out statistical analysis and backtesting research over long-time frames.

Edit: Andy E. pointed out that I had a mistake in the download_contract_from_quandl(c, auth_token, dl_dir) line above. I've corrected it now.

comments powered by Disqus

Just Getting Started with Quantitative Trading?

3 Reasons to Subscribe to the QuantStart Email List:

No Thanks, I'll Pass For Now

1. Quant Trading Lessons

You'll get instant access to a free 10-part email course packed with hints and tips to help you get started in quantitative trading!

2. All The Latest Content

Every week I'll send you a wrap of all activity on QuantStart so you'll never miss a post again.

3. No Spam

Real, actionable quant trading tips with no nonsense.