An Introduction to Stooq Pricing Data

In the previous article we learnt how to setup a prototyping environment for algorithmic trading using Jupyter Notebooks. We used Yahoo data with Pandas DataReader. In this article we will be looking at another free market data provider Stooq.

If you would like to follow along with the tutorial and do not have the protoyping environment set up you will need:

Jupyter v1.0
Pandas v1.4
Pandas-DataReader v0.10
Historic daily data from the U.S region provided by Stooq

Stooq is a Polish website that has been partly translated into English. At the time of writing Stooq offer OHLCV data on 21,332 Global securities and ETFs, 1980 currency pairs and 132 crytpocurrencies. They also offer data on global indicies, commodites and bonds. Much like Yahoo, all the data on offer is accessed through a web interface and the OHLCV data can be download in CSV format. There is no API for Stooq. For stocks listed on US exchanges it is also possible to obtain some fundamentals such as price earnings and market value, although there are no historic downloads available for these. You can search for a ticker and download all current data relating to it via the symbol page or you can access current and historic data here.

A Note on Ticker Nomenclature

You can find or confirm a stock ticker by using the search bar provided at the top of the page. Here are a few of the most common changes to nomenclature:

Indices are prefixed by ^ e.g. ^DJI (Dow Jones Industrial), ^UK100 (FTSE100)
US stocks are suffixed by .US e.g. AAPL.US, MSFT.US, TSLA.US
Cryptocurrencies are suffixed by .V e.g. BTC.V (BitCoin)
UK stocks are suffixed by .UK e.g. AV.UK (Aviva)
Price/Earnings are suffixed by _PE.US e.g. AAPL_PE.US

Stooq Historic Data

Stooq offer historic daily, hourly, and minutely data on indicies, ETFs, stocks, bonds, forex and cryptocurrencies. In order to obtain the data you need to download by region and then by frequency. ETFs and Stocks are downloaded by specific region, whilst cyrptocurrencies, forex, indicies and bonds data are available within the region named world. Once you have selected your preferred data set by clicking on the links in the ASCII column you need to complete the CAPTCHA and confirm your download. You will receive a zip file containing the information for your selected time frequency. For this tutorial we will be using the daily data from the U.S region.

Historic data download page from Stooq.com — Stooq historic download page

Once unzipped the file size will vary depending on your region and frequency. Daily historic data for some US stocks can go back more than 30 years and the unzipped file takes up more the 1.4Gb of space so be sure to have enough storage space available. Hourly data provides up to the last 1400 data points available which equates to roughly 9 months depending on the security while the 5 minutely data provides upto the last 2000 data points for a security and equates to approximately 1 month.

Stooq sub-divide the data from each region to the exchange level then they split stocks and ETFs. For exchanges with a high volume of stocks the files are further sub-divided to allow a maximum of 2000 stocks per directory. This makes accessing the information more challenging, however the Stooq website provides information on the preffered directory structure for the download as shown in the figure below.

Preffered directory structure from Stooq.com — Stooq directory structure

Using the prototyping environment we set up with Jupyter notebooks in the previous article we will look at how to obtain data from a single stock and also for a list of stocks. To follow along with this tutorial you will need Jupyter notebooks, running Python 3.6 or greater, we will be using Pandas and Pandas DataReader.

Accessing a single stock

Once you have downloaded and unzipped the file from Stooq you will need to open Jupyter notebook. As discussed in the last article you will need to do this from your base anaconda environment. We previously installed and configured Ipykernel so you will be able to access the correct kernel for you virtual envronment directly through Jupyter. Create a new notebook and select the required kernel, if you are following along with the article series this will be called py3.8.

New kernel in Jupyter Notebook — Creating a new Jupyter Notebook with previously created kernel

We will begin by adding our imports, to begin with all we need to import is Pandas

import pandas as pd

It is a good idea at this point to familiarise yourself with the location of your downloaded data. You will need to input the path to the directory into Pandas. The Pandas read_csv() is an incredibly versitile function. It allows you to read .csv and .txt files directly into a DataFrame for further analysis. You can define a lot of the structure of your DataFrame within this function. You can select a row for column headers and a column to use as the index. You can skiprows or footers, limit the number of rows to be read, you can also infer datetimes. There is even the option to pass a dictionary of column names with values that are functions to apply to the columns. More information can be found here.

To begin with we will simply read the data into a DataFrame and do our post processing after the fact. In this case we will look at the data provided for AAPL. This should be located under the path .../data/daily/us/nasdaq stocks/1/aapl.us.txt. Once you have located the file all you need to do is pass it to read_csv(). Calling the Pandas method head() will allow you to view the DataFrame.

stooq_aapl = pd.read_csv("path/to/your/download/data/daily/us/nasdaq stocks/1/aapl.us.txt")
stooq_aapl.head()

Stooq AAPL data as DataFrame — Stooq data for single stock

We now have our OHLCV data as a DataFrame but we need to do some work to get it into a useable state. If we call stooq_aapl.dtypes we can see the datatype of each of our columns. You will notice that our date is stored as an integer. In order for us to do any time series analysis this should be converted to a datetime object. To convert it we can use pd.to_datetime(). However, this method takes a string as an input so we first need to cast the integer to a string using the method astype(). We can then use strftime formatting to create the datetime object. Now we can use the date column as our index. We also have a few columns that we don't need and the column labels or headers could be more readable. The following code will make the neccessary changes.

# First we drop the unwanted columns
stooq_aapl = stooq_aapl.drop(['', '', '', ''], axis=1)
# We relabel the column headers
stooq_aapl.columns = ["date","open", "high", "low", "close", "volume"]
# convert the date integer to a datetime object
stooq_aapl["date"] = pd.to_datetime(stooq_aapl['date'].astype(str), format = '%Y%m%d')
# set the date column as the index
stooq_aapl = stooq_aapl.set_index('date')

Our final DataFrame looks as follows and is ready for further analysis.

A Note on Close Price

It should be noted that the close price provided by Stooq is actually the adjusted close price. This can be seen by plotting the close price over a period of time which includes a stock split. In the case of Apple there was a 4-for-1 stock split on August 28th 2020. When this occurs the close price can be seen to drop dramatically overnight causing an almost vertical straight line to appear on the time series chart.

stooq_aapl.plot(y="close")

In the figure below you can see that there are no large drops around this time. This indicates that the close price has indeed been adjusted. One of the issues of relying on free data sources can often be determining whether adjustments have been made to the close price and how they have been carried out.

Plot of AAPL close price using Stooq Data — Stooq AAPL historic close price data.

Creating a Top Ten S&P500 Close Price DataFrame

Accessing data on a single stock is great but how do we access data on multiple stocks and analyse them all in one place? The structure of the Stooq download does make this a more complicated task. While you can generally tell which region or exchange a stock is located in you can't necessarily guarantee which subfolder it will be in. In the case of AAPL it was located under us/nasdaq stocks/1/ but it is prohibitive to spend time searching for stock tickers in all of the subfolders. Luckily for us Python 3.5+ comes with glob, a Unix style pathname pattern expansion. Glob allows us to recursively search through subdirectories on a given path for a particular file.

To demonstrate this we will create a DataFrame of close prices for the top ten constituents of the S&P500. We will create a dictionary of DataFrames where the keys are the symbols and the values are the DataFrames containing the historic OHLCV data. We will create a MultiIndex for each of the DataFrames with the date and ticker as levels. This will allow us to concatenate them together and unstack them to create a final DataFrame where the rows are indexed by dates and the columns are indexed by the chosen tickers. Let's get started.

We will be using three libraries Glob, OS and Pandas. OS and Glob are part of the Python 3 standard library so we will add the imports above the Pandas import and leave a line between the two. This helps differentiate between standard and third party imports and is best practice.

import glob
import os

import pandas as pd

Now that we have our imports we will define the ticker names of the top ten S&P500 constituents. In practice this could be any grouping of stocks you require as long as they are all contained within the US region in the Stooq download. We now create a tuple containing our stock tickers. The reason for this is that we plan to create a dictionary of DataFrames, a tuple is immutable so it ensures that the order of the tickers will not change. Although it is true that Python 3.7 provides dictionaries that maintain their order this behaviour is undocumented and may not be guaranteed. A dictionary is a key value pair mapping, as such it does not have an order. Using a tuple will make sure that we maintain the order of our tickers within the dictionary.

# Top 10 S&P 500 tickers
sp_10 = ('AAPL.US', 'MSFT.US', 'AMZN.US', 'GOOGL.US', 'GOOG.US', 'TSLA.US', 'NVDA.US', 'BRK-B.US', 'FB.US', 'UNH.US')

Next we create a list comprehension where we loop through each of the tickers in our tuple and call glob.glob() with os.path.join(). This generates a list of strings which represent the directory location of each of the data files. Finally, as glob returns a list we actually end up with a list of lists, with a single string in each sublist. We flatten the list of lists to a single list of strings so that we can create our dictionary of DataFrames.

# Use glob to find the data files for the tickers
path = "/path/to/your/Downloads/data/daily/us/**/"
ohlcv_data = [glob.glob(os.path.join(path, ticker + ".txt"), recursive=True) for ticker in sp_10]
# flatten glob list of lists
flat_ohlcv = [item for sublist in ohlcv_data for item in sublist]

Now that we have obtained all the locations for the files we can pass them to Pandas read_csv() as we did previously. This time we will use the keyword argument index_col to set the date column as our index. We will call the function within a dictionary comprehension and enumerate over the tickers in our top ten S&P constituent tuple to get the keys for the dictionary. The values will be the DataFrames.

# Create a dictionary of dataframes with index as date
stooq_data = {name: pd.read_csv(flat_ohlcv[i], index_col="") for i, name in enumerate(sp_10s)}

Now that we have our DataFrames we can create a MultiIndex with a datetime object and tickers as each of the levels. This will allow us to concatenate all the DataFrames and unstack the data creating a final DataFrame containing the Date and Close price for the top ten constituents of the S&P500. In order to achieve this we will remove any unwanted columns, convert the date integer to a datetime object and set the MultiIndex columns.

for k in stooq_data.keys():
# change date format
stooq_data[k].index = pd.to_datetime(stooq_data[k].index.astype(str), format="%Y-%m-%d")
# set MultiIndex
stooq_data[k].set_index([stooq_data[k].index, ""], inplace=True)
# drop all columns except close
stooq_data[k].drop(["", "", "", "", "", "", ""], axis=1, inplace=True)

Now we can concatenate our DataFrames using the .items() method making sure that we don't sort to preserve our ordering and we that use the join="outer" keyword. Both these behaviours are listed as the default parameters for the keywords but we specify them here to futureproof our code.

result = pd.concat([v for k, v in stooq_data.items()], join="outer", sort=False)

Finally we unstack our MultiIndex Dataframe to generate a column for the close price of all the tickers. As a default behaviour unstack() sorts data alphabetically by their labels, which means that our top ten constituents will no longer be in order of market cap. To reorder the columns we can create a MultiIndex from our original S&P top ten tuple.

final_df = result.unstack()

mindex_tup = [("", ticker) for ticker in sp_10]
mindex_cols = pd.MultiIndex.from_tuples(mindex_tup, names=['Price', 'Ticker'])
stooq_sp10_close = pd.DataFrame(final_df, columns=mindex_cols)

Using the Pandas pd.DataFrame.tail() command our final DataFrame looks as follows

Final DataFrame of S&P top 10 Close Price — Final DataFrame of Close Price for Top 10 S&P500 Constituents

Using a Library to import Stooq Data

The Python library Pandas-DataReader allows you to access Stooq data. Although this method is far simpler than those outlined above, at the time of writing it was only possible to access five years of historic data via the Data-Reader library. Let's look at an example.

import pandas_datareader.data as web
stooq_aapl = web.DataReader('AAPL', 'stooq')
stooq_aapl.iloc[[0,-1]]

Here we import the Data-Reader library as we have in previous tutorials, create a DataFrame for AAPL data and then use the Pandas df.iloc[] command to look at the first and last rows. As you can see the earliest date available is from April 2017, five years from time of writing.

Range of AAPL data using pandas-datareader with Stooq — First and Last rows of stooq_aapl DataFrame

Stooq Current data

Stooq also provide point in time cross sectional snap shots of their database. This takes the form of CSV downloads for daily data containing over 12,000 global securities, indicies and currencies. For hourly and 5 minutely data they offer price information for 66 currency pairs and 56 indicies. To access current data or data for a specific date you select the date and then specify different combinations of daily, hourly or 5 minutely data.

Summary

In this article we have introduced the Stooq historical market data vendor. We have looked at the available data, how to download it and reshape it to suit our requirements. There is an extensive offering of data available for download at different time frequencies. However, there is no API and the data files, once unzipped, take up a large amount of space. The Close price provided has been adjusted and, as far as we are aware, there is no separate unadjusted data. There are also some differences in ticker nomenclature compared to industry norms. If you are simply looking for data to experiment with Stooq offer a large history of diverse data that can be accessed and wrangled to suit most needs. In later articles we will be considering the data vendors AlphaVantage and Tiingo.

A Note on Ticker Nomenclature

Stooq Historic Data

Accessing a single stock

A Note on Close Price

Creating a Top Ten S&P500 Close Price DataFrame

Using a Library to import Stooq Data

Stooq Current data

Summary

Related Articles

The Quantcademy

Successful Algorithmic Trading

Advanced Algorithmic Trading