Scalars, Vectors, Matrices and Tensors - Linear Algebra for Deep Learning (Part 1)

Back in March we ran a content survey and found that many of you were interested in a refresher course for the key mathematical topics needed to understand deep learning and quant finance in general.

Since deep learning is going to be a big part of this year's content we thought it would be worthwhile to write some beginner tutorials on the key mathematical topics—linear algebra, calculus and probability—that are necessary to *really* understand deep learning for quant trading.

This article is the first in the series of posts on the topic of **Linear Algebra for Deep Learning**. It is intended to get you up to scratch in some of the basic ideas and notation that will be found in the more advanced deep learning textbooks and research papers. Reading these papers is *absolutely crucial* to find the **best quantitative trading methods** and as such it helps to speak the language!

Linear algebra is a fundamental topic in the subject of mathematics and is extremely pervasive in the physical sciences. It also forms the backbone of many machine learning algorithms. Hence it is crucial for the deep learning practitioner to understand the core ideas.

Linear algebra is a branch of continuous, rather than discrete, mathematics. The mathematician, physicist, engineer and quant will likely be familiar with continuous mathematics through the study of differential equations, which are used to model many physical and financial phenomena.

The computer scientist, software developer or retail discretionary trader, however, may only have gained exposure to mathematics through subjects such as graph theory or combinatorics, which are topics within discrete mathematics. Hence the *set* and *function* notation presented here may be initially unfamiliar.

For this reason the discussion presented in this article series will omit the usual "theorem and proof" approach of an undergraduate mathematics textbook. Instead the focus will be on selected topics that are relevant to deep learning practitioners from diverse backgrounds.

*Please note that the outline of linear algebra presented in this article series closely follows the notation and excellent treatments of Goodfellow et al. (2016)^{[3]}, Blyth and Robertson (2002)^{[1]} and Strang (2016)^{[2]}.*

Linear algebra, probability and calculus are the 'languages' in which machine learning is written. Learning these topics will provide a deeper understanding of the underlying algorithmic mechanics and allow development of new algorithms, which can ultimately be deployed as more sophisticated quantitative trading strategies.

Many supervised machine learning and deep learning algorithms largely entail optimising a loss function by adjusting model parameters. To carry this out requires some notion of how the loss function changes as the parameters of the model are varied.

This immediately motivates calculus, the elementary topic in mathematics that describes how one quantity changes with respect to another. In particular it requires the concept of a partial derivative, which specifies how the loss function changes in response to a change in each individual parameter.

These partial derivatives are often grouped together—in matrices—to allow more straightforward calculation. Even the most elementary machine learning models such as linear regression are optimised with these linear algebra techniques.
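As a rough illustration of the point above, the sketch below fits a linear regression by gradient descent, with all of the partial derivatives collected into a single gradient vector and computed by one matrix expression. The data, the learning rate of `0.1` and the names `loss` and `gradient` are assumptions made purely for this example.

```python
import numpy as np

# Hypothetical data: 50 samples, 3 features, noiseless linear targets
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

def loss(w):
    """Mean squared error between predictions X @ w and targets y."""
    residual = X @ w - y
    return residual @ residual / len(y)

def gradient(w):
    """All partial derivatives dL/dw_i at once: (2/n) * X^T (X w - y)."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

# Plain gradient descent: step against the gradient repeatedly
w = np.zeros(3)
for _ in range(500):
    w -= 0.1 * gradient(w)

print(w)        # should be close to true_w
print(loss(w))  # should be close to zero
```

Each entry of `gradient(w)` is the partial derivative of the loss with respect to one parameter. Grouping them lets a single matrix product replace a separate calculation per parameter.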

A key topic in linear algebra is that of vector and matrix *notation*. Being able to 'read the language' of linear algebra will open up the ability to understand textbooks, web posts and research papers that contain more complex model descriptions. This will not only allow reproduction and verification of existing models, but will allow extensions and new developments that can subsequently be deployed in trading strategies.

Linear algebra provides the first steps into *vectorisation*, presenting a deeper way of thinking about parallelisation of certain operations. Algorithms written in standard 'for-loop' notation can be reformulated as matrix equations, providing significant gains in computational efficiency.
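To make the idea of vectorisation concrete, here is a minimal sketch comparing a matrix-vector product written with explicit loops against the same calculation written as a single matrix expression in NumPy. The function name `matvec_loop` and the array sizes are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.normal(size=(200, 300))
x = rng.normal(size=300)

def matvec_loop(A, x):
    """Matrix-vector product written in explicit 'for-loop' form."""
    m, n = A.shape
    y = np.zeros(m)
    for i in range(m):
        for j in range(n):
            y[i] += A[i, j] * x[j]
    return y

y_loop = matvec_loop(A, x)  # element-by-element in Python
y_vec = A @ x               # the same calculation as one matrix expression

print(np.allclose(y_loop, y_vec))
```

Both versions give the same result up to floating-point rounding. The vectorised form hands the whole calculation to optimised, compiled routines, which is typically where the efficiency gain comes from.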

Such methods are used in the major Python libraries such as NumPy, SciPy, Scikit-Learn, Pandas and TensorFlow. GPUs have been designed to carry out optimised linear algebra operations. The explosive growth in deep learning can partially be attributed to the highly parallelised nature of the underlying algorithms on commodity GPU hardware.

Linear algebra is a continuous mathematics subject but ultimately the entities discussed below are implemented in a discrete computational environment. These discrete representations of linear algebra entities can lead to issues of overflow and underflow, which arise from the limits on how extremely large and extremely small numbers can be represented computationally.
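The short sketch below shows what overflow and underflow look like in practice with 64-bit floating-point numbers in NumPy. The particular values multiplied together are arbitrary choices for illustration.

```python
import numpy as np

info = np.finfo(np.float64)
big = info.max     # largest finite 64-bit float
tiny = info.tiny   # smallest positive normal 64-bit float

# Suppress NumPy's floating-point warnings so the results can be inspected
with np.errstate(over="ignore", under="ignore"):
    overflowed = big * 10.0                                  # too large: becomes inf
    underflowed = np.float64(1e-200) * np.float64(1e-200)    # too small: becomes 0.0

print(big, tiny)
print(overflowed, underflowed)
```

An overflowed value is silently replaced by infinity and an underflowed value by zero, which can then propagate through later matrix calculations.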

One mechanism for mitigating the effects of limited numerical representation is to make use of matrix factorisation techniques. Such techniques allow certain matrices to be represented in terms of simpler, structured matrices that have useful computational properties.

Matrix decomposition techniques include Lower Upper (LU) decomposition, QR decomposition and Singular Value Decomposition (SVD). They are an intrinsic component of certain machine learning algorithms including Linear Least Squares and Principal Components Analysis (PCA). Matrix decomposition will be discussed at length later in this series.
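As a brief preview, and only as a sketch on made-up data, the following uses NumPy's built-in QR and SVD routines. It solves a small linear least squares problem via the QR factors and checks the answer against `np.linalg.lstsq`.

```python
import numpy as np

# Hypothetical overdetermined system: 6 equations, 3 unknowns
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))
b = rng.normal(size=6)

# QR decomposition: A = Q R, with orthonormal columns in Q and upper-triangular R
Q, R = np.linalg.qr(A)

# Least squares solution from the QR factors: solve R x = Q^T b
x_qr = np.linalg.solve(R, Q.T @ b)

# Reference solution from NumPy's least squares routine
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)

# Singular Value Decomposition: A = U diag(s) V^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_rebuilt = U @ np.diag(s) @ Vt

print(np.allclose(x_qr, x_ref), np.allclose(A_rebuilt, A))
```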

*It cannot be overemphasised how fundamental linear algebra is to deep learning*. For those who are aiming to deploy the most sophisticated quant models based on deep learning techniques, or are seeking employment at firms that do, it will be necessary to learn linear algebra extremely well.

The material in this article series will cover the bare minimum, but to understand the research frontier it will be necessary to go much further than this. Please see the References at the end of the article for a brief list of where to continue studying linear algebra.

The two primary mathematical entities that are of interest in linear algebra are the **vector** and the **matrix**. They are examples of a more general entity known as a **tensor**. Tensors possess an *order* (or *rank*), which determines the number of dimensions in an array required to represent them.
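In NumPy the order of a tensor corresponds to the `ndim` attribute of an array, that is, the number of indices needed to pick out a single element. The arrays below are arbitrary examples chosen for illustration.

```python
import numpy as np

scalar = np.array(3.0)                        # 0th-order: no indices needed
vector = np.array([1.0, 2.0, 3.0])            # 1st-order: one index
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2nd-order: two indices
tensor3 = np.zeros((2, 3, 4))                 # 3rd-order: three indices

orders = [t.ndim for t in (scalar, vector, matrix, tensor3)]
print(orders)  # [0, 1, 2, 3]
```

Note that this use of "rank" is unrelated to the rank of a matrix, which counts its linearly independent rows or columns.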

**Scalars** are single numbers and are an example of a 0th-order tensor. In mathematics it is necessary to describe the set of values to which a scalar belongs. The notation $x \in \mathbb{R}$ states that the (lowercase) scalar value $x$ is an element of (or member of) the set of real-valued numbers, $\mathbb{R}$.

There are various sets of numbers of interest within machine learning. $\mathbb{N}$ represents the set of positive integers ($1, 2, 3,\ldots$). $\mathbb{Z}$ represents the integers, which include positive, negative and zero values. $\mathbb{Q}$ represents the set of *rational* numbers that may be expressed as a fraction of two integers.

**Vectors** are ordered arrays of single numbers and are an example of a 1st-order tensor. Vectors are members of objects known as **vector spaces**. A vector space can be thought of as the entire collection of *all* possible vectors of a particular length (or dimension). The three-dimensional real-valued vector space, denoted by $\mathbb{R}^3$, is often used to represent our real-world notion of three-dimensional space mathematically.

More formally, a vector space such as $\mathbb{R}^n$ is built from the $n$-fold Cartesian product of a set with itself, together with proper definitions of how to add vectors and multiply them by scalar values. If all of the scalars in a vector are real-valued then the notation $\boldsymbol{x} \in \mathbb{R}^n$ states that the (boldface lowercase) vector value $\boldsymbol{x}$ is a member of the $n$-dimensional vector space of real numbers, $\mathbb{R}^n$.

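Sometimes it is necessary to identify the *components* of a vector explicitly. The $i$th scalar element of a vector is written as $x_i$. Notice that this is non-bold lowercase since the element is a scalar. An $n$-dimensional vector itself can be explicitly written using the following notation:

\begin{equation} \boldsymbol{x}=\begin{bmatrix} \kern4pt x_1 \kern4pt \\ \kern4pt x_2 \kern4pt \\ \kern4pt \vdots \kern4pt \\ \kern4pt x_n \kern4pt \end{bmatrix} \end{equation}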

Given that scalars exist to represent values why are vectors necessary? One of the primary use cases for vectors is to represent physical quantities that have both a *magnitude* and a *direction*. Scalars are only capable of representing magnitudes.

For instance scalars and vectors encode the difference between the *speed* of a car and its *velocity*. The velocity contains not only its speed but also its direction of travel. It is not difficult to imagine many more physical quantities that possess similar characteristics such as gravitational and electromagnetic forces or wind velocity.
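The speed and velocity distinction can be sketched in a few lines of NumPy. The velocity components below (3 m/s east and 4 m/s north) are made-up values for illustration.

```python
import numpy as np

# Hypothetical velocity of a car: 3 m/s east, 4 m/s north
velocity = np.array([3.0, 4.0])

# Speed is the magnitude (Euclidean norm) of the velocity vector: a scalar
speed = np.linalg.norm(velocity)   # 5.0

# Direction is the unit vector pointing the same way as the velocity
direction = velocity / speed       # [0.6, 0.8]

print(speed, direction)
```

The scalar `speed` discards the direction of travel, while the vector `velocity` retains both pieces of information.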

In machine learning vectors often represent *feature vectors*, with their individual components specifying how important a particular feature is. Such features could include relative importance of words in a text document, the intensity of a set of pixels in a two-dimensional image or historical price values for a cross-section of financial instruments.

**Matrices** are rectangular arrays consisting of numbers and are an example of 2nd-order tensors. If $m$ and $n$ are positive integers, that is $m,n \in \mathbb{N}$ then the $m \times n$ matrix contains $mn$ numbers, with $m$ rows and $n$ columns.

Matrices are denoted with uppercase boldface letters. If all of the scalars in a matrix are real-valued then this is written as $\boldsymbol{A} \in \mathbb{R}^{m \times n}$. That is, the matrix lives in an $m \times n$-dimensional real-valued vector space. Hence matrices are really vectors that are just written in a two-dimensional table-like manner.

The components of a matrix are now identified by two indices, $i$ and $j$, where $i$ is the index of the matrix row and $j$ is the index of the matrix column. Each component of $\boldsymbol{A}$ is identified by $a_{ij}$.

The full $m \times n$ matrix can be written as:

\begin{equation} \boldsymbol{A}=\begin{bmatrix} \kern4pt a_{11} & a_{12} & a_{13} & \ldots & a_{1n} \kern4pt \\ \kern4pt a_{21} & a_{22} & a_{23} & \ldots & a_{2n} \kern4pt \\ \kern4pt a_{31} & a_{32} & a_{33} & \ldots & a_{3n} \kern4pt \\ \kern4pt \vdots & \vdots & \vdots & \ddots & \vdots \kern4pt \\ \kern4pt a_{m1} & a_{m2} & a_{m3} & \ldots & a_{mn} \kern4pt \\ \end{bmatrix} \end{equation}

It is often useful to abbreviate the full matrix component display into the following expression:

\begin{equation} \boldsymbol{A} = [a_{ij}]_{m \times n} \end{equation}

Here $a_{ij}$ is referred to as the $(i,j)$-element of the matrix $\boldsymbol{A}$. The subscript of $m \times n$ can be dropped if the dimension of the matrix is clear from the context.

Note that a *column vector* is a size $m \times 1$ matrix, since it has $m$ rows and 1 column. Unless otherwise specified all vectors will be considered to be column vectors.
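The matrix notation above maps onto NumPy arrays fairly directly, with one caveat: the mathematical indices start at 1 whereas NumPy indices start at 0. In this sketch the entries of `A` are chosen so that each value spells out its own $(i,j)$ position.

```python
import numpy as np

# A 2 x 3 matrix whose entry a_ij is written as the two-digit number "ij"
A = np.array([[11, 12, 13],
              [21, 22, 23]])

m, n = A.shape    # m = 2 rows, n = 3 columns

# The mathematical element a_23 (row 2, column 3) is A[1, 2] in NumPy
a_23 = A[1, 2]    # 23

# A column vector stored explicitly as an m x 1 matrix
col = np.array([1.0, 2.0, 3.0]).reshape(-1, 1)

print(m, n, a_23, col.shape)  # 2 3 23 (3, 1)
```

A plain one-dimensional NumPy array has shape `(3,)` and is neither a row nor a column, so the explicit `reshape` is needed when the distinction matters.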

Matrices represent a type of function known as a linear map. Based on rules that will be outlined in subsequent articles, it is possible to define multiplication operations between matrices or between matrices and vectors. Such operations are immensely important across the physical sciences, quantitative finance, computer science and machine learning.

Matrices can encode geometric operations such as rotation, reflection and other linear transformations. Thus if a collection of vectors represents the vertices of a three-dimensional geometric model in Computer Aided Design software then multiplying these vectors individually by a pre-defined rotation matrix will output new vectors that represent the locations of the rotated vertices. This is the basis of modern 3D computer graphics.
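The rotation of vertices just described can be sketched as follows. The rotation angle, the choice of the $z$-axis and the three vertices are all hypothetical values picked so that the result is easy to check by hand.

```python
import numpy as np

# Rotation by 90 degrees anticlockwise about the z-axis
theta = np.pi / 2
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])

# Hypothetical model vertices, stored one vertex per row
vertices = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [1.0, 1.0, 1.0]])

# Rotate every vertex at once: each row v becomes Rz @ v
rotated = vertices @ Rz.T

# Approximately [[0, 1, 0], [-1, 0, 0], [-1, 1, 1]] up to rounding
print(np.round(rotated, 6))
```

Multiplying by `Rz.T` on the right applies `Rz` to every row in a single operation, rather than looping over the vertices individually.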

In deep learning neural network weights are stored as matrices, while feature inputs are stored as vectors. Formulating the problem in terms of linear algebra allows compact handling of these computations. By casting the problem in terms of tensors and utilising the machinery of linear algebra, rapid training times on modern GPU hardware can be obtained.
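As a minimal sketch of this formulation, a single fully connected layer can be written as a matrix-vector product followed by an elementwise nonlinearity. The layer sizes, the random weights and the choice of the ReLU activation are assumptions made for the example, not something prescribed by the discussion above.

```python
import numpy as np

rng = np.random.default_rng(7)

W = rng.normal(size=(4, 3))      # weight matrix: 4 outputs, 3 inputs
b = np.zeros(4)                  # bias vector, one entry per output
x = np.array([0.5, -1.0, 2.0])   # feature vector with 3 components

z = W @ x + b                    # linear part of the layer
a = np.maximum(z, 0.0)           # ReLU activation, applied elementwise

print(a.shape)  # (4,)
```

Stacking several feature vectors as the columns of a matrix `X` would turn `W @ x` into the matrix-matrix product `W @ X`, which processes a whole batch of inputs in one operation.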

The more general entity of a tensor encapsulates the scalar, vector and the matrix. It is sometimes necessary—both in the physical sciences and machine learning—to make use of tensors with order that exceeds two.

In theoretical physics, and general relativity in particular, the Riemann curvature tensor is a 4th-order tensor that describes the local curvature of spacetime. In machine learning, and deep learning in particular, a 3rd-order tensor can be used to describe the intensity values of multiple channels (red, green and blue) from a two-dimensional image.
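The image example can be sketched with a NumPy array of shape (height, width, channels). The tiny 4 by 6 image and the channels-last layout are illustrative assumptions, and some libraries order the axes differently.

```python
import numpy as np

height, width = 4, 6

# 3rd-order tensor: rows x columns x colour channels (red, green, blue)
image = np.zeros((height, width, 3), dtype=np.uint8)

# Set the red channel to full intensity at every pixel
image[..., 0] = 255

red_channel = image[:, :, 0]   # a 2nd-order tensor (matrix) of shape (4, 6)
pixel = image[2, 3]            # a 1st-order tensor (vector): [255, 0, 0]

print(image.ndim, red_channel.shape, pixel)
```

Fixing one index of a 3rd-order tensor leaves a matrix, and fixing two leaves a vector, mirroring the element notation $a_{ijk}$.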

Tensors will be identified in this series of posts via the boldface sans-serif notation, $\boldsymbol{\mathsf{A}}$. For a 3rd-order tensor elements will be given by $a_{ijk}$, whereas for a 4th-order tensor elements will be given by $a_{ijkl}$.

In the next article the basic operations of matrix-vector and matrix-matrix multiplication will be outlined. This topic is collectively known as **matrix algebra**.

- Scalars, Vectors, Matrices and Tensors - Linear Algebra for Deep Learning (Part 1)
- Matrix Algebra - Linear Algebra for Deep Learning (Part 2)

- [1] Blyth, T.S. and Robertson, E.F. (2002) *Basic Linear Algebra, 2nd Ed.*, Springer
- [2] Strang, G. (2016) *Introduction to Linear Algebra, 5th Ed.*, Wellesley-Cambridge Press
- [3] Goodfellow, I.J., Bengio, Y., Courville, A. (2016) *Deep Learning*, MIT Press