Installing TensorFlow on Ubuntu 16.04 with an Nvidia GPU

Any serious quant trading research with machine learning models necessitates the use of a framework that abstracts away the model implementation from the model specification.

This is particularly crucial for deep learning techniques as production-grade models require training on GPUs to make them computationally tractable. However, direct programming of GPUs requires knowledge of proprietary languages like Nvidia CUDA or abstraction layers such as OpenCL. Either way, experience with C, C++ or Fortran is a must.

Hence a framework that removes the low-level implementation details of execution, while providing a high-level API for straightforward model specification—without sacrificing execution accuracy or the ability to scale computation—is very attractive to quant researchers. TensorFlow is such a framework.

However it has a reputation for being difficult to install. Up until recently this reputation was warranted. Indeed it can still be challenging to get working on certain systems.

There are many ways to install TensorFlow, such as making use of a ready-made machine image for a cloud server. An example is Amazon's Deep Learning AMI, which comes preinstalled with all necessary dependencies and deep learning software. It can be accessed remotely at a competitive hourly rate.

However, this article describes the installation procedure for TensorFlow on a modern Linux desktop system with an affordable, up-to-date consumer-grade GPU, such as those found within Nvidia's GeForce series.

We will begin by outlining the advantages of the TensorFlow library along with a few words of caution on the potential difficulty of its installation. We will then consider an optimal choice of operating system and install the necessary Python research environment. The discussion will then turn to installing TensorFlow against both a CPU and a GPU. We will also take a look at the common problems that can occur and how to troubleshoot them.

Recently I discussed the advantages and disadvantages of using a desktop deep learning research system versus renting one in the cloud.

Why TensorFlow?

The focus of this article is not on why framework X is superior to framework Y. The intent is simply to describe the installation of TensorFlow, which is emerging as one of the strongest contenders for deep learning model implementation.

It has been chosen for all subsequent deep learning articles on QuantStart for the following pragmatic reasons:

  • Popularity - With popularity comes a large community and thus more likelihood of solving errors when they crop up, as well as a larger base of tutorials and textbooks from which to learn.
  • Python - TensorFlow is a Python library and so it can easily talk to all of the other quantitative finance libraries discussed on QuantStart such as NumPy, Pandas and Scikit-Learn.
  • Google - TensorFlow is a Google product, albeit an open-source one. It is used in their production systems and for leading AI research, as carried out by some of their sub-teams including DeepMind. Hence it has a strong pedigree. It also means that they will be strongly motivated to continually improve the software as they are "eating their own dog food".
  • Ease of Use - Despite the initial learning curve TensorFlow is actually quite straightforward to use, particularly with the newer releases. Hence more time can be spent developing quant models rather than fighting with a framework.

It will become clear in subsequent articles why TensorFlow is such a useful library for quant trading research so please bear with me!

A Few Words Of Caution

Deep learning is a rapidly moving field on the cusp of the research frontier. It has significant potential for quantitative trading models, much of which we will be exploring in subsequent articles.

However it also exists on the bleeding edge. Much of the research carried out is heuristic and experimental in nature with limited theoretical guarantees. It takes advantage of the latest computational technology and open-source frameworks to produce state-of-the-art results.

Accordingly, this can make it extremely challenging to know which is the "best" community, framework, programming language or operating system to use. It also involves many implementation details that can eat significantly into research time.

Hence I would like to issue a warning that while deep learning research is very exciting it can also be extremely frustrating. As quant traders we wish to spend as much time as possible researching new strategies, risk layers or portfolio construction methodologies. We do not wish to be fighting with graphics card drivers or package dependency issues. However, this is the reality of deep learning. Be warned that sophisticated techniques like deep learning require us to "get our hands dirty" with these implementation details.

I would also like to add that due to the speed of iteration in the field much of the advice I give below is likely to be out of date in six months' time! Of course I will try my best to keep these articles up to date, but please be aware that as the field consolidates new best practices will emerge and supersede the techniques mentioned here.

Operating System

Recently I made the case that if you want to carry out serious deep learning work it will be necessary to use Linux (and, more specifically, Ubuntu Linux) as your research environment operating system.

While Windows and Mac OS X are perfectly acceptable systems for carrying out TensorFlow work on a CPU, they fall down significantly when it comes to using a GPU.

Unfortunately as of the date of writing this article the deep learning research ecosystem is insufficiently mature to recommend heavy TensorFlow development and deep learning training on Windows (and to some extent Mac OS X).

Hence the rest of this article will assume that you have the latest stable release of Ubuntu Linux installed—namely 16.04—and want to install TensorFlow for an Nvidia CUDA-compatible GPU.

Note: With virtualisation and containerisation technologies like VirtualBox, Vagrant and Docker, as well as the prevalence of vendor-specific cloud-based deep learning instances, this is less of a concern than it used to be.

Python Prerequisites

Before installing TensorFlow—CPU or GPU—you will need to have a functioning Python virtual environment in which to run TensorFlow. My consistent recommendation for newcomers is to download the latest Anaconda distribution, which as of the writing date of this article is for Python 3.6.

Once you've downloaded and installed Anaconda it will be necessary to create a separate virtual environment to isolate your TensorFlow install, which in this instance I have named tensorflow. To do this simply type:

$ conda create -n tensorflow python=3.6

Of course you can also set up a dedicated non-Anaconda Python virtual environment, although if you're considering this then I am going to assume you know your way around virtualenv and thus won't go into any more details here.
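
For reference, a minimal non-Anaconda setup might look something like the following sketch. The environment location used here, ~/venvs/tensorflow, is purely illustrative:

$ sudo apt-get install python3-dev python-virtualenv
$ virtualenv --python=python3 ~/venvs/tensorflow
$ source ~/venvs/tensorflow/bin/activate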

Installing TensorFlow for CPU Use

Prior to outlining the details of the GPU-specific installation it is worth noting that it is possible to install TensorFlow to work solely against the CPU. While training will be far slower than on a GPU, it will still be possible. Such an installation is useful for self-teaching and for trying out simpler models on smaller datasets.

To begin the installation of TensorFlow simply activate your Anaconda virtual environment:

$ source activate tensorflow

Your terminal prompt will now change to the following:

(tensorflow)$

Assuming that you created the Anaconda environment for Python 3.6 as above and are running a 64-bit operating system (of which Ubuntu 16.04 is an example), the following pip command will install the correct CPU-only TensorFlow 1.4 binary package into your virtual environment:

(tensorflow)$ pip install --ignore-installed --upgrade \
 https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.4.0-cp36-cp36m-linux_x86_64.whl

This will install TensorFlow and the necessary dependencies. Once this has finished you can skip ahead to the TensorFlow Installation Validation section below.
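
If you would like a very quick sanity check before the full validation described below, the following one-liner simply prints the installed version (it should report 1.4.0):

(tensorflow)$ python -c "import tensorflow as tf; print(tf.__version__)"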

Installing TensorFlow for GPU Use

Installing TensorFlow against an Nvidia GPU on Linux can be challenging. I have personally carried out this procedure multiple times with many different Nvidia GPUs, ranging from a pair of older GeForce GTX 780 Ti cards through to a modern GeForce GTX 1080 Ti.

In each instance obscure problems occurred and had to be dealt with. I'll mention a few of these in the Troubleshooting section below.

The outline of the process is as follows:

  1. Install necessary operating system dependency packages to obtain a functional Python virtual environment (or use Anaconda)
  2. Install the appropriate Nvidia driver for your particular GPU
  3. Install the latest Nvidia CUDA library supported by your card and planned TensorFlow version
  4. Install the cuDNN library appropriate for your version of CUDA and planned TensorFlow version
  5. Install the correct binary GPU TensorFlow package that is compiled against your specific CUDA version and cuDNN version

Secure UEFI Boot

A consistent issue that I've come across when helping people install the Nvidia drivers involves a motherboard setting known as Secure UEFI Boot. This needs to be disabled on most motherboards in order to allow the Nvidia drivers to be installed without problems.

Unfortunately the process is very motherboard-specific, but broadly it will be necessary to enter the BIOS on boot-up of your machine, usually via the Del or F10 key, and then navigate to the section that determines boot settings. Sometimes this involves backing up and removing certain Secure Boot keys, while in other instances it is simply a boolean setting that is easily modified.

If you don't disable this feature then you are likely to run into trouble at the point when you log in to Ubuntu, as the Nvidia graphics drivers will probably load incorrectly.
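
If you are unsure whether Secure Boot is currently enabled, it can be queried from within Ubuntu before you reboot into the BIOS. This is a minimal check assuming the mokutil utility is available (it can be installed with sudo apt-get install mokutil if not):

$ mokutil --sb-state

This will report either "SecureBoot enabled" or "SecureBoot disabled".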

The Procedure

Before proceeding with the installation make sure to disable Secure UEFI Boot as described above. Please be aware that this is an advanced setting on your motherboard and if you're not sure what you're doing, please ask an individual with more experience. Also, make sure to back up any files on your system before carrying out this procedure!

The first step is to add the package repository for the Nvidia graphics drivers that we're going to install, so that they can be picked up by Ubuntu. Open up a Terminal window and type the following:

$ sudo add-apt-repository ppa:graphics-drivers/ppa

The next step is to install some essential compilation tools such as the GNU Compiler Collection (GCC), which allows us to compile other necessary libraries from source:

$ sudo apt-get update
$ sudo apt-get install build-essential

If you are not intending to make use of the Anaconda distribution you will need to install the Python 3 packages as well:

$ sudo apt-get install python-dev python3-dev python-virtualenv

At this stage it is necessary to download and install the appropriate Nvidia driver. The procedure is largely the same for all GeForce cards up to the 1080 Ti, which is one of the most popular consumer-grade video cards for deep learning.

My recommendation, as of the writing date of this article, is to use the 387.34 version of the drivers, which is provided by the nvidia-387 package. To install this package along with a few other necessary libraries please type the following:

$ sudo apt-get install nvidia-387
$ sudo apt-get install mesa-common-dev
$ sudo apt-get install freeglut3-dev

Note: If you wish to read about the various other drivers for older, or non-consumer-grade, cards then take a look at this link.
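
If you are curious which driver Ubuntu itself recommends for your particular card (for example if you are using older or newer hardware than the cards discussed here), the following command can help. This is purely informational, as the instructions here continue to use the nvidia-387 package, and it assumes the ubuntu-drivers-common package is installed:

$ ubuntu-drivers devices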

Once this is complete you need to reboot your machine and log back in to your Ubuntu instance. Assuming all went well you should see your Unity desktop as before.
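
Before moving on it can be reassuring to confirm that the new driver is actually the one in use. A quick check, assuming the driver installed and loaded correctly, is to query the kernel module directly:

$ cat /proc/driver/nvidia/version

This should report the 387 series driver installed above.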

The next step is to install Nvidia CUDA. TensorFlow 1.4 is built against CUDA 8.0. In particular, although the latest release of CUDA is actually 9.0, TensorFlow currently only supports 8.0! To obtain older versions visit the CUDA Toolkit Archive (you may be asked to sign up for an Nvidia CUDA account) and download the CUDA 8.0 GA2 (Feb 2017) runfile. The correct link for the necessary runfile is found here (be warned, it is around 1.4GB in size!):

Once the file is downloaded, change to the directory where it is located and execute it with the following command:

$ cd ~/Downloads
$ sudo sh cuda_8.0.61_375.26_linux-run

You will be asked a series of y/n questions. It is crucial that you do NOT install the CUDA driver when asked, otherwise it will overwrite the 387 driver we installed above. So select n when this question arises:

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 ...?
(y)es/(n)o/(q)uit: n

The remainder of the questions ask whether you want to install the CUDA 8.0 Toolkit (yes), the Toolkit location (keep the default), a symbolic link (yes) and the Samples (yes, they help with testing; accept the default location when asked).

You will also see a warning similar to the following:

***WARNING: Incomplete installation! This installation did not install the CUDA Driver...

Do not be alarmed: this is simply the installer telling us that it did not install a CUDA driver. We have already installed the necessary Nvidia driver above.

In order for the CUDA commands to be added to the system path it is necessary to modify some environment variables. The simplest way to achieve this is to open up the hidden .bashrc file found in your home directory using your favourite text editor. I prefer emacs when working in the terminal, but you can also use vi/vim, nano, gedit or SublimeText. Edit the file ~/.bashrc and add the following two lines to the end of the file:

export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
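
After adding these lines you will need to re-source the file, or close and re-open the terminal, for the variables to take effect. A quick way to do this and confirm that the CUDA compiler is on the path (assuming the toolkit installed into the default location) is:

$ source ~/.bashrc
$ nvcc --version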

To check that the installation was successful we can also run the nvidia-smi tool, which is installed alongside the driver. As an example, on the system I am writing this article on I have two older GeForce GTX 780 Ti cards installed against the Nvidia 384 driver:

$ nvidia-smi

Thu Nov 30 14:48:03 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.98                 Driver Version: 384.98                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 780 Ti  Off  | 00000000:01:00.0 N/A |                  N/A |
| 17%   36C    P8    N/A /  N/A |    486MiB /  3017MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 780 Ti  Off  | 00000000:02:00.0 N/A |                  N/A |
| 17%   32C    P8    N/A /  N/A |      1MiB /  3020MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
|    1                    Not Supported                                       |
+-----------------------------------------------------------------------------+

If you have a similar output appropriate for your card then you have successfully installed CUDA 8.0 on your system.
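
If you chose to install the CUDA samples you can carry out one further optional check by compiling and running the bundled deviceQuery utility. The path below assumes you accepted the default samples location during the runfile installation; adjust it if you placed them elsewhere:

$ cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery
$ make
$ ./deviceQuery

The output should list your GPU(s) and end with Result = PASS.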

The next stage is to install the CUDA Deep Neural Network library, known as cuDNN. You can download various versions of cuDNN at this location. You will need to select cuDNN v6.0 (April 27, 2017), for CUDA 8.0. The particular tar-gzipped file you need is linked to below:

Once you have downloaded the file you need to extract it, copy the library files to the appropriate CUDA directory and then grant them read permissions. To do this type the following commands (assuming that you've placed the file in the ~/Downloads directory):

$ cd ~/Downloads
$ tar -zxvf cudnn-8.0-linux-x64-v6.0-ga.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h
$ sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
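
To double-check which cuDNN version has been copied into place you can inspect the version macros in the header. This is a minimal sanity check assuming the copy commands above succeeded:

$ grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h

For cuDNN 6.0 this should report a major version of 6 and a minor version of 0.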

CUDA 8.0 and cuDNN 6.0 are now installed. All that remains is to install TensorFlow 1.4.

If you are using Anaconda make sure to activate your virtual environment as described above. Then type the following to install TensorFlow for the appropriate Python version, which should be 3.6 if you've selected the latest version:

(tensorflow)$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.4.0-cp36-cp36m-linux_x86_64.whl

At this stage TensorFlow should be installed! The final step is to validate that the install works as expected.

TensorFlow Installation Validation

Whether you've installed TensorFlow against your CPU or GPU(s) you can follow these steps to check that your installation is functional. Firstly open up an interactive Python console:

(tensorflow)$ python

Then import TensorFlow and try to run a Session (we won't explain what a session is in this article, but we will in the next one):

>>> import tensorflow as tf
>>> hello = tf.constant('Hello, World!')
>>> sess = tf.Session()
>>> print(sess.run(hello))

You should see the following output if all goes well. Note also that it is common for TensorFlow to print a lot of diagnostic/warning information upon import. This is usually to let you know that TensorFlow was not compiled with support for certain CPU instruction sets (such as SSE or AVX) that could speed it up. As long as you receive the following output at the end of these diagnostic messages your installation should be okay:

Hello, World!
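
If you installed the GPU build you can additionally confirm that TensorFlow can see your card(s) from the same interactive session. The exact device names and memory figures will of course depend on your hardware:

>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()

Alongside the CPU entry you should see one or more devices with device_type "GPU". You can also check the diagnostic output printed when the Session was created above, which typically names the GPU being used.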

Of course it is common to experience errors along the way as every user's system is different. In addition you may be using a different GeForce (or other) GPU and possibly an older version of Ubuntu. Hence there is a lot that can go wrong. For that reason I've created a Troubleshooting section below.

Troubleshooting

Here are some common issues:

  • Linux won't boot into the desktop once the graphics drivers are installed - This is almost certainly because Secure UEFI Boot was not disabled in your motherboard settings. You will need to disable this setting, uninstall the Nvidia drivers and then reinstall them. This can be done from the built-in text consoles, which are accessed by pressing Ctrl+Alt+F1 at the login screen. Uninstall the Nvidia drivers, fall back to the stock drivers that ship with Ubuntu, then repeat the installation with Secure Boot disabled; a rough sketch of the relevant commands is given after this list.
  • nvidia-smi cannot be found - It is likely that you didn't add the environment variables to your .bashrc file, or that you didn't re-source the file or re-open the terminal afterwards.
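
As a rough sketch of the recovery commands referred to in the first bullet above, run the following from the text console (Ctrl+Alt+F1). This assumes you are happy to fall back to the open-source nouveau driver until you retry the installation:

$ sudo apt-get remove --purge '^nvidia-.*'
$ sudo reboot

Once rebooted, and with Secure UEFI Boot disabled, the driver installation steps above can be repeated.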

Other errors can occur if you have downloaded an incorrect version of the Nvidia drivers (make sure to use 387 or 384), CUDA (make sure to use 8.0), cuDNN (make sure to use 6.0) or the TensorFlow GPU package (make sure to use the TensorFlow 1.4 binary built against Python 3.6, CUDA 8.0 and cuDNN 6.0).

If you have any other issues please add them to the Disqus comments below. Another way to solve any issues that come up is to search for a Stack Overflow post related to your problem, as it is likely many others have had the issue before! Simply type your error message into Google and you will likely find many instances of the same problem (and hopefully a solution!).

Next Steps

This article has only briefly touched on why we want to use TensorFlow and what it actually is. In the next article I will describe how TensorFlow works and provide a tutorial on how to begin using it.

In subsequent articles we will actually begin developing some deep neural network architectures with various training mechanisms utilising the high-level TensorFlow API.

Useful Links