Updated January 2023. At the time of writing, Pandas DataReader no longer supports Yahoo data due to an API change. This article has been updated to reflect the change.
In the early days of QuantStart we posted an article on setting up an Algorithmic Trading Research Environment with Ubuntu Linux and Python. In 2013 when the article was first written, installing Python was not a trivial task. Problems with GCC compilers, cross dependencies between libraries and operating system intricacies all played a role in making the job of installing Python much harder than it needed to be. These days the problem is largely solved. In fact there are now so many options for installing Python that it is easy to get confused.
There are many different approaches you can take to installing Python, and there are plenty of contradictory opinions on the best approach. With that in mind, it is better to choose the method based on how you intend to use the programming language. If you plan to use Python to explore algorithmic trading then this article will show you how to get an environment up and running in the simplest way. If you are familiar with programming and installing software then you might prefer to install the Official Python Distribution. There is an excellent tutorial on this method here.
Currently we recommend using the Anaconda Python distribution by Continuum Analytics (now Anaconda, Inc.). The main reasons for this are discussed below.
- Anaconda comes with everything you need to get started analysing your data.
To quote their website, Anaconda is a Python and R distribution that aims to provide everything you need (python wise) for data science tasks.
- Anaconda comes with Conda.
Conda is a package manager that allows you to install, upgrade and uninstall all your Python libraries. It can install from pre-built conda packages and it can build from source code. Conda also allows you to create and manage your virtual environments.
- Anaconda works well with Jupyter Notebooks.
By using IPyKernel you can quickly and easily hook up your virtual environments to your notebooks.
When you install Anaconda you get immediate access to over 1500 Python libraries including NumPy, SciPy, Pandas, Beautiful Soup and Requests. As you will see in later tutorials, you can even control the versions of these libraries by creating your own virtual environments. Some of the criticisms of Anaconda have been that it is bloated, that not all of the packages are relevant and that it takes up too much space. If you would prefer a more streamlined version, Continuum Analytics offers Miniconda, which gives you access to Python and the Conda package manager, but you will have to install all the libraries yourself. If you have limited disk space and feel this is a better option for you, there is a good tutorial on installing Miniconda here.
Installing Anaconda
This post is part of a series on how to install the Anaconda Python distribution on different operating systems. In this post we will discuss how to install Anaconda3 version 2021.11 (Python 3.9) on Mac OS X. This will require 530 MB of free space. Please ensure you have that much room available before you begin. Other posts in the series concentrate on installation with:
Installing Anaconda on Mac OS X
Open up your web browser and head to the following address: https://www.anaconda.com/products/individual. The website should determine the correct download for your system.
Click the green download button. Your download should begin immediately. Once the download is complete the installer will open and you will be taken to the Anaconda setup wizard.
You will need to agree to the license agreement to continue the install.
Select how you wish to install Anaconda (we recommend installing just for the current user) and click next. You will then be asked to confirm your download location. We recommend leaving this as the default. The download will take 530 MB of space.
Once completed you should see a screen providing you with a link to download PyCharm, a Python IDE. You can download this at any time if you wish but it is not essential. Clicking next brings you to the final screen where you can click finish to complete the setup.
In order to check Python has been correctly installed we will open up the terminal. If you are unfamiliar with this application go to the search function and type terminal. This should bring up the following options.
Select the Terminal. This will bring up the command line prompt, or shell. Type python into the prompt and press enter. You should see a couple of lines of text telling you which version of Python you are running, followed by three chevrons (>>>). This is the Python prompt, which indicates that you can enter Python commands. You are now in a Python console and can begin coding in Python.
Try typing import pandas as pd into the prompt. After pressing enter you will see that nothing appears to have changed: you are simply presented with a new line containing the Python prompt. If you type pd.__version__ and press enter you will see the version of the Pandas library you have just imported.
To exit Python and return to the terminal you can simply type exit(). You are now back in the command line prompt.
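The same check can be scripted rather than typed line by line. The sketch below is a minimal illustration and not part of the Anaconda installer: the helper name installed_version and the list of packages are our own choices, and it relies on importlib.metadata, which is in the standard library from Python 3.8 onwards.

```python
import sys
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the interpreter version, as the console banner does.
print("Python", sys.version.split()[0])

# Illustrative selection of libraries that ship with Anaconda.
for name in ("numpy", "scipy", "pandas", "requests"):
    print(name, installed_version(name) or "not installed")
```

Saved to a file and run with python, this prints one line per library, which is a quick way to confirm that the interpreter on your path is the Anaconda one.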
Creating your first virtual environment
Once you have been using Python for a while or across multiple different projects you will quickly run into the issue of dependencies. A script you have written or a project you are working on may require you to use features that are available in the latest version of a Python library like Pandas, but you have other projects or scripts that use older versions. How do you manage and maintain your Python environment to allow you to run and work on both scripts or projects? The answer is to use a virtual environment.
A virtual environment is an isolated Python environment that has its own dependencies, or in other words, its own versions of libraries and packages. Virtual environments can be created for each of your projects so that you can use whatever versions of libraries are necessary for each one. With Anaconda you can also specify versions of Python when you create them.
One of the benefits of Anaconda is that it comes with the package manager Conda, which allows you to create virtual environments easily. Anaconda currently allows you to create virtual environments for Python 2.7, 3.5, 3.6, 3.7, 3.8 and 3.9. Most package versions can be found using conda or conda-forge or, as a last resort, you can use the Python package manager pip. If you have used pip to install libraries within your conda environments, they are installed from PyPI rather than from a conda channel and you will not be able to upgrade them using the command conda upgrade.
If you prefer not to use Anaconda as your Python distribution and have installed Python directly from the Official Python Distribution, this same task can be accomplished using pyenv to obtain multiple versions of Python and pipenv or virtualenv to manage virtual environments. A good tutorial on this can be found here.
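To make the idea of an isolated environment concrete without relying on conda, pyenv or pipenv, the sketch below uses the venv module from Python's standard library. This is a different tool from the ones named above, chosen only because it needs no extra installation. The directory name demo-env is a placeholder, and the environment is created in a temporary folder that is deleted afterwards.

```python
import os
import tempfile
import venv


def create_demo_env(parent_dir, name="demo-env"):
    """Create a bare virtual environment (without pip) and return its path."""
    env_dir = os.path.join(parent_dir, name)
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_demo_env(tmp)
    # pyvenv.cfg records which base interpreter the environment was built from.
    with open(os.path.join(env_dir, "pyvenv.cfg")) as cfg:
        print(cfg.read())
```

Each environment created this way has its own site-packages directory, which is what lets two projects hold different versions of the same library.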
In the prompt you will notice that there is a prefix in brackets before the directory information about the user. This appears as (base) and indicates that you are in the base Anaconda environment. Here you have access to all the packages that were downloaded and installed by Anaconda, and if you type python --version into the prompt you will see that you are running the default version of Python, in this case Python 3.9.7.
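The active environment can also be inspected from inside Python. This is a small sketch under the assumption that conda's activation has set the CONDA_DEFAULT_ENV environment variable, which it normally does; the function name describe_environment is our own.

```python
import os
import sys


def describe_environment(environ=None):
    """Summarise which environment and interpreter are currently in use."""
    environ = os.environ if environ is None else environ
    return {
        # Name of the active conda environment, or None outside conda.
        "env_name": environ.get("CONDA_DEFAULT_ENV"),
        # Directory the running interpreter was installed into.
        "prefix": sys.prefix,
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
    }


for key, value in describe_environment().items():
    print(key, "=", value)
```

In the base environment the env_name entry would be expected to read base, matching the prefix shown in the prompt.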
We'll now create a virtual environment with Python 3.8 and install some basic packages to display 5 years of Apple data with only a few lines of code. Let's create the environment first. In the terminal, enter the following line:
conda create -n py3.8 python=3.8
The first part, "conda create -n", uses the package manager conda to create a new environment. The second part, "py3.8", is the name of the environment; this can be anything you want. If you forget the names of your environments you can use conda env list at any time in the terminal to display a list of all the environments you have created. The final part, "python=3.8", specifies that we want Python 3.8 to be our Python version for this environment. The prompt will then provide you with a list of what will be downloaded and installed into your environment and ask whether you are happy to proceed. Once complete, type the following into the terminal to activate the environment:
conda activate py3.8
You will notice that the prefix in brackets has changed to display (py3.8). We can now begin to add our dependencies.
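For scripts that need the same information as conda env list, conda can emit its output as JSON. The sketch below assumes conda is on the PATH and returns an empty list otherwise. The helper names parse_env_list and list_conda_envs are hypothetical.

```python
import json
import shutil
import subprocess


def parse_env_list(json_text):
    """Extract environment paths from the output of `conda env list --json`."""
    return json.loads(json_text).get("envs", [])


def list_conda_envs():
    """Return the paths of all conda environments, or [] if conda is unavailable."""
    conda = shutil.which("conda")
    if conda is None:
        return []
    result = subprocess.run(
        [conda, "env", "list", "--json"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_env_list(result.stdout) if result.returncode == 0 else []


for path in list_conda_envs():
    print(path)
```

After the steps above, the printed paths should include one ending in py3.8 alongside the base installation.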
In order to view our stock data we need to install only three libraries: Pandas to analyse our data, Pandas-datareader to obtain it, and Matplotlib, which allows us to plot it using the Pandas plotting interface. In the prompt type the following:
conda install pandas pandas-datareader matplotlib
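Once the install has finished, it can be worth confirming that the three libraries are importable from the active environment. A minimal sketch, with a helper name of our own choosing, is given below. Note that the conda package pandas-datareader is imported under the module name pandas_datareader.

```python
from importlib import util


def missing_modules(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names if util.find_spec(name) is None]


required = ["pandas", "pandas_datareader", "matplotlib"]
missing = missing_modules(required)
print("Missing:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, the most likely cause is that the install was run while a different environment was active.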
Once the installation has finished, type python into the prompt to open a Python console. We will begin by importing our libraries into our namespace to obtain and analyse our data.
import matplotlib.pyplot as plt
import pandas as pd
import pandas_datareader.data as web
This takes care of the libraries we need to import. Now we can begin to obtain our data. We can use Pandas DataReader to obtain 5 years of stock data and place it directly into a DataFrame object. The following command will get OHLCV Apple data from Stooq.com. Pandas-Datareader allows you to download data from multiple sources including Quandl, AlphaVantage and IEX. A full list of data sources can be found here.
aapl = web.DataReader("AAPL", "stooq")
We now have five years of Apple data stored as a DataFrame. We can display the first few rows using the Pandas command aapl.head().
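The five-year span is the library's default when no dates are given. To request an explicit window, DataReader also accepts start and end arguments. The sketch below is hedged: the helper names five_year_window and fetch_daily_prices are ours, it assumes the Stooq reader in your pandas-datareader release honours those dates, and the download itself needs a network connection, so only the date arithmetic runs when the block is executed.

```python
import datetime as dt


def five_year_window(end=None):
    """Return (start, end) dates spanning 5 * 365 days up to `end`."""
    end = dt.date.today() if end is None else end
    start = end - dt.timedelta(days=5 * 365)
    return start, end


def fetch_daily_prices(symbol, start, end):
    """Download daily OHLCV data from Stooq (requires network access)."""
    # Imported here so the date helper works without pandas-datareader installed.
    import pandas_datareader.data as web

    return web.DataReader(symbol, "stooq", start=start, end=end)


start, end = five_year_window()
print("Requesting data from", start, "to", end)
```

Calling fetch_daily_prices("AAPL", start, end) from the py3.8 environment should then return a DataFrame similar to aapl.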
Plotting our data is simple using Pandas. The Stooq data contains Open, High, Low, Close and Volume columns, so we plot the Close column. Just type the following lines:
aapl.plot(y="Close")
plt.show()
Notice that the last line of code uses plt.show(). This command makes use of the matplotlib.pyplot library that we imported at the start. It allows us to display the graph directly. The graph of the Apple close price will open in a new window.
And that's it! Using Pandas and Pandas-Datareader you can import multiple stocks from different data providers. You can perform tasks ranging from plotting the close price to building complex strategies, all using just three open source Python libraries. The only issue with this approach is that once you exit the Python console you will lose all your work. You can exit the Python console by typing exit() and then deactivate your virtual environment by typing conda deactivate.
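One way to keep results between sessions, which goes beyond the steps covered so far, is to write the DataFrame to CSV and read it back later. The sketch below uses a tiny hand-made table with placeholder numbers (not real Apple prices) in place of aapl, and an in-memory buffer in place of a file so that nothing is written to disk. With real data you would pass a file name of your choosing to to_csv and read_csv instead.

```python
import io

import pandas as pd

# Placeholder values shaped like the Stooq columns; these are not real prices.
frame = pd.DataFrame(
    {
        "Open": [100.0, 101.0, 102.0],
        "High": [101.0, 102.0, 103.0],
        "Low": [99.0, 100.0, 101.0],
        "Close": [100.5, 101.5, 102.5],
        "Volume": [1000, 1100, 1200],
    },
    index=pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-05"]),
)
frame.index.name = "Date"

# Write the table out as CSV text, then rewind the buffer to read it back.
buffer = io.StringIO()
frame.to_csv(buffer)
buffer.seek(0)

# Restore the dates as the index so the frame can be plotted as before.
restored = pd.read_csv(buffer, index_col="Date", parse_dates=True)
print(restored.head())
```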
In the next article we will be looking at how to use Jupyter Notebooks to build candlestick plots and moving averages. There is a great conda cheat sheet available here. It's a really useful reference in case you need to quickly check a command.