Some years ago we wrote a range of articles on random number generation (RNG) using C++. These techniques are primarily used for Monte Carlo simulations that underpin modern derivatives pricing methods. The articles included one that implemented a particular algorithm known as a Linear Congruential Generator (LCG). The LCG is an algorithm for generating random looking numbers, despite being completely deterministic in nature. It belongs to a wider class of algorithms known as pseudo-random number generators (PRNG).
Since we wrote those articles, Python has come to dominate much of the quant finance data analysis landscape, due to its ease of programming and availability of numerical libraries. Hence we have decided to re-introduce the Linear Congruential Generator but implement the algorithm in Python. Furthermore, we can use Python's third-party plotting and data analysis tools to plot some interesting visualisations and determine some statistical properties of the generated random sequences.
What is a Linear Congruential Generator (LCG)?
As mentioned above, LCGs are one of the most widely used methods for generating pseudo-random sequences of numbers. This is because they are simple to understand and implement as well as fast to execute. This makes them popular in a variety of computational fields, including quantitative finance and derivatives pricing.
The LCG works by using a mathematical formula to produce a sequence of numbers that appear random. However, these numbers are determined algorithmically and are therefore not truly random; they are known as "pseudo-random".
In quantitative finance, random number generation is essential for risk modeling and derivatives pricing. For instance, Monte Carlo simulations, a common technique in finance for estimating the value of complex derivatives, rely on large sequences of random numbers to simulate the behavior of financial markets.
The quality and speed of these simulations depends heavily on the underlying random number generator. While LCGs are often considered basic compared to more advanced pseudo-random number generators, their ease of use and computational efficiency make them a viable option for many quantitative finance applications, particularly when computational resources are limited.
Despite their widespread usage, it is important to recognize that LCGs have limitations. They are sensitive to the choice of parameters and if not configured properly can produce sequences with poor statistical properties such as short periods (how quickly they repeat themselves) or predictable patterns. Below, we will explore these limitations in more detail and in future articles we will discuss alternative PRNGs that can offer improved performance in financial applications.
Mathematical Overview of Linear Congruential Generators
The fundamental equation that drives the Linear Congruential Generator (LCG) is a recurrence relation used to generate a sequence of pseudo-random numbers. The equation is as follows:
\begin{eqnarray} X_{n+1} = (aX_n + c) \mod m \end{eqnarray}Where:
- \(X_{n+1}\): The next number in the sequence.
- \(X_n\): The current number in the sequence.
- \(a\): The multiplier, a constant that determines the rate at which the sequence progresses.
- \(c\): The increment, a constant added to adjust the sequence.
- \(m\): The modulus, a positive integer that specifies the range of values for the sequence.
The LCG formula takes the current value \(X_n\), applies the multiplier \(a\), adds the increment \(c\), and then applies the modulus \(m\) operation to produce the next value in the sequence, \(X_{n+1}\). The choice of these parameters is crucial as it affects the statistical properties, period, and behavior of the sequence.
The sequence begins with an initial value known as the seed, denoted as \(X_0\). The seed acts as the starting point for the LCG, and if the same seed is used across multiple runs, the LCG will produce the same sequence, which is essential for reproducibility in simulations. However, it also highlights that LCGs are pseudo-random rather than truly random, as the sequence is entirely deterministic based on the chosen parameters.
Key Parameters: Multiplier, Increment, Modulus, and Seed
Each parameter in the LCG equation plays a specific role in shaping the sequence of numbers. Understanding these components is key to configuring an effective LCG:
- Multiplier (\(a\)): The multiplier must be chosen carefully to ensure that the sequence has a long period and good randomness properties. Commonly used values are based on research into the behavior of various multipliers, such as \(a = 1664525\) used in older LCGs like the one in the C programming language.
- Increment (\(c\)): The increment is added to ensure that the sequence does not produce a series of numbers that are too predictable. Setting \(c = 0\) converts the LCG into a multiplicative congruential generator, which has certain restrictions and may limit the range and quality of the generated sequence. For general-purpose LCGs, \(c\) is usually set to a non-zero value.
- Modulus (\(m\)): The modulus determines the range of the generated sequence. In most cases, it is chosen as a large prime number or a power of two (e.g., \(2^{32}\)) for efficiency in computation. The choice of \(m\) greatly affects the periodicity of the LCG; larger values of \(m\) allow for a longer sequence before the numbers start repeating.
- Seed (\(X_0\)): The initial value, or seed, used to initialise the sequence. Different seeds produce different sequences, making it a convenient tool for creating various random number streams. It’s also important for reproducibility: by fixing the seed value, you can recreate the exact same sequence, which is useful in testing and debugging quantitative models.
The parameters \(a\), \(c\), and \(m\) must be chosen according to specific guidelines to ensure the LCG has a full period (meaning it cycles through all possible values before repeating). For a full period, three conditions, known as the Hull-Dobell Theorem, must be satisfied:
- \(c\) and \(m\) are relatively prime (i.e., their greatest common divisor is 1).
- \((a - 1)\) is divisible by all prime factors of \(m\).
- \((a - 1)\) is divisible by 4 if \(m\) is a multiple of 4.
These conditions ensure that the LCG can produce a maximum-length sequence of numbers before repeating.
Periodicity and Statistical Properties of LCGs
Periodicity refers to the length of the sequence that an LCG can generate before it begins to repeat itself. For an LCG with parameters carefully selected to satisfy the conditions mentioned earlier, the maximum period is \(m\). This means that, in the best-case scenario, the LCG will cycle through \(m\) different numbers before repeating the sequence.
However, it is important to note that even with a maximum period, the quality of the sequence is not guaranteed. An LCG's statistical properties, such as uniformity and independence, are crucial for its effectiveness in applications like Monte Carlo simulations or risk modeling. Poor parameter choices can lead to issues such as:
- Short Period: If the parameters do not meet the necessary conditions, the sequence may have a period much shorter than \(m\), limiting its usefulness in long simulations.
- Correlated Outputs: A poorly designed LCG can produce sequences where numbers are correlated, reducing their suitability for modeling real-world stochastic processes.
- Non-Uniform Distribution: If the modulus or multiplier is not chosen carefully, the LCG may generate sequences that cluster around certain values rather than being uniformly distributed across the entire range.
To assess the quality of an LCG, statistical tests like the Kolmogorov-Smirnov test and autocorrelation tests are commonly employed. These tests help evaluate whether the generated numbers exhibit the desired properties of a random sequence. To highlight these issues we will implement a basic LCG in Python below and visualise its output to assess its uniformity and independence.
Implementing Linear Congruential Generators in Python
For the purposes of this article we will assume that you have a ready-to-go Python data science virtual environment or similar that contains the NumPy, Pandas and Matplotlib libraries installed. A typical way to achieve this is to install the Conda distribution.
Recalling the LCG recurrence relation from above, with the multiplier (\(a\)), increment (\(c\)), modulus (\(m\)), and seed (\(X_0\)) parameters, we can implement the LCG straightforwardly as follows:
import numpy as np
def lcg(seed, a, c, m, size):
"""
Linear Congruential Generator (LCG) function.
Parameters
----------
seed : `int`
The initial value (X0) for the LCG.
a : `int`
The multiplier.
c : `int`
The increment.
m : `int`
The modulus.
size : `int`
The number of random numbers to generate.
Returns
-------
`np.ndarray[float]`
A NumPy array containing the generated sequence.
"""
# Initialize the sequence array with zeros
sequence = np.zeros(size)
# Set the initial value
sequence[0] = seed
# Generate the sequence
for i in range(1, size):
sequence[i] = (a * sequence[i - 1] + c) % m
return sequence
In this function:
- seed: The initial value that starts the sequence.
- a: The multiplier value used in the LCG formula.
- c: The increment value added to each iteration.
- m: The modulus value, which determines the range of the sequence.
- size: The number of random numbers to generate in the sequence.
The function initializes a NumPy array, sequence
, to store the generated values. It sets the first value of the array as the seed and then iterates through the range of the desired sequence size to calculate each subsequent value using the LCG formula.
Statistical Analysis and Visualisation of the Sequence
After implementing the LCG function, it is essential to analyse and visualise the generated numbers to evaluate their statistical properties. We will use the third-party Matplotlib library to plot the sequence and determine whether the generated numbers appear uniformly distributed. Additionally, we will create a scatter plot to check for independence between successive values.
First, let’s generate a sequence using the LCG function using a collection of specified parameters:
# Parameters for the LCG
seed = 42
a = 1664525
c = 1013904223
m = 2**32
size = 1000
# Generate the sequence
random_sequence = lcg(seed, a, c, m, size)
Now, let’s plot a histogram of the generated sequence to observe the distribution:
import matplotlib.pyplot as plt
# Plotting the histogram
plt.hist(random_sequence, bins=50, edgecolor='black', alpha=0.7)
plt.title("Histogram of LCG Generated Numbers")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
The output is given in the following figure:
The histogram provides a visual representation of how the generated values are distributed. A well-configured LCG should show a fairly uniform distribution across the range of values defined by the modulus \(m\). This uniformity is crucial for applications such as Monte Carlo simulations, where the quality of the pseudo-random numbers can impact the accuracy of the results.
To further evaluate the independence of the generated values, we can create a scatter plot that plots each value against its successor:
# Scatter plot of successive values
plt.scatter(random_sequence[:-1], random_sequence[1:], alpha=0.5)
plt.title("Scatter Plot of Successive LCG Values")
plt.xlabel("X_n")
plt.ylabel("X_n+1")
plt.show()
The output is given in the following figure:
In the scatter plot, each point represents a pair of successive values (\(X_n\), \(X_{n+1}\)). Ideally, the points should be spread evenly without forming any discernible pattern. If the points appear to form clusters or lines, it may indicate a correlation between successive values, suggesting that the LCG parameters need adjustment for better randomness.
It is also possible to formally assess the autocorrelation of the sequence using autocorrelation plots, which we have covered previously in QuantStart in the following article on Serial Correlation. In that article and the associated series on Time Series Analysis we utilised the R statistical language which has autocorrelation plots built-in. For Python, we can either use the Statsmodels library or use a method from Pandas, namely the pandas.plotting.autocorrelation_plot method.
Below is an example of how to perform a basic autocorrelation test using Pandas and Matplotlib:
import pandas as pd
# Convert the sequence to a Pandas Series
series = pd.Series(random_sequence)
# Plot the autocorrelation of the sequence
pd.plotting.autocorrelation_plot(series)
plt.title("Autocorrelation of LCG Generated Sequence")
plt.show()
The output is given in the following figure:
In this plot, if the values remain close to zero for all lags (except lag 0), it indicates minimal autocorrelation and suggests that the LCG is producing independent pseudo-random numbers.
In practice, if the LCG fails any of these tests or shows signs of periodicity or correlation, adjusting the parameters (\(a\), \(c\), \(m\), and seed) is necessary. For financial simulations where accuracy is paramount, alternative PRNGs (pseudo-random number generators) like the Mersenne Twister or Xorshift may also be considered.
Summary and Next Steps
In this article we have re-introduced the Linear Congruential Generator and carried out an implementation in Python. We have also discussed various tests to ensure the (deterministic) sequence of generated numbers is sufficiently random for the purposes of quantitative finance simulation.
In future articles we are going to consider more sophisticated PRNGs such as the Mersenne Twister and Xorshift. The former actually forms the basis of Python's random
number generation, which will allow us to compare our implementation to the one provided in the Python standard library.
Full Code
# lcg.py
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
def lcg(seed, a, c, m, size):
"""
Linear Congruential Generator (LCG) function.
Parameters
----------
seed : `int`
The initial value (X0) for the LCG.
a : `int`
The multiplier.
c : `int`
The increment.
m : `int`
The modulus.
size : `int`
The number of random numbers to generate.
Returns
-------
`np.ndarray[float]`
A NumPy array containing the generated sequence.
"""
# Initialize the sequence array with zeros
sequence = np.zeros(size)
# Set the initial value
sequence[0] = seed
# Generate the sequence
for i in range(1, size):
sequence[i] = (a * sequence[i - 1] + c) % m
return sequence
if __name__ == "__main__":
# Parameters for the LCG
seed = 42
a = 1664525
c = 1013904223
m = 2**32
size = 1000
# Generate the sequence
random_sequence = lcg(seed, a, c, m, size)
# Plotting the histogram
plt.hist(random_sequence, bins=50, edgecolor='black', alpha=0.7)
plt.title("Histogram of LCG Generated Numbers")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
# Scatter plot of successive values
plt.scatter(random_sequence[:-1], random_sequence[1:], alpha=0.5)
plt.title("Scatter Plot of Successive LCG Values")
plt.xlabel("X_n")
plt.ylabel("X_n+1")
plt.show()
# Convert the sequence to a Pandas Series
series = pd.Series(random_sequence)
# Plot the autocorrelation of the sequence
pd.plotting.autocorrelation_plot(series)
plt.title("Autocorrelation of LCG Generated Sequence")
plt.show()