Recently on QuantStart we have begun looking at generating synthetic asset price paths using Stochastic Differential Equation models such as the Brownian Motion, Geometric Brownian Motion (GBM), Ornstein-Uhlenbeck and Vasicek Models. Historically, we have also considered more sophisticated models such as the Heston Stochastic Volatility Model. What we have not considered to any great extent in previous posts is how to generate multiple price paths simultaneously that are all
In real market settings, such as in equities markets, a complex correlation structure exists across assets, somewhat reflective of the classification of stocks into sectors and subsectors. This post will provide a complete example of how to generate a random, structured correlation matrix, which will then be used to generate a collection of correlated GBM paths. Such synthetic paths have a number of uses within quantitative finance, which we will now elaborate on.
Introduction
Synthetic equity data generation has become an indispensable tool in quantitative finance, serving multiple critical purposes across research, development, and production environments. The ability to generate realistic, correlated price paths allows quant finance practitioners to stress-test portfolios, validate backtesting frameworks, and train machine learning models without relying solely on limited historical data.
One of the primary challenges in financial modeling is the scarcity of extreme market events in historical data. While we may have decades of daily returns, true market crises are rare, making it difficult to assess how strategies perform under extreme conditions. Synthetic data generation enables us to create thousands of plausible market scenarios, including rare "black swan" events, providing a more comprehensive view of potential risks.
Furthermore, debugging quantitative trading systems requires controlled test cases with known properties. When a backtesting framework produces unexpected results, it's invaluable to input synthetic data with precisely defined statistical properties to isolate whether issues stem from the strategy logic, the data pipeline, or the execution simulator.
For machine learning applications, synthetic data serves as an excellent pre-training dataset. Models can learn general market dynamics from unlimited synthetic samples before fine-tuning on limited real market data.
In the following sections we are going to use open-source Python data science libraries, such as NumPy, SciPy, Matplotlib and Seaborn to generate these random matrices, the correlated price paths and associated visualisation plots. Firstly, however we will discuss the mathematical theory of the generation.
Mathematical Theory
The mathematical theory of the approach will be broken down into two subsections. The first discusses the mathematics around generating structured, random correlated matrices of the appropriate type. The second discusses how to generate correlated GBM paths using these correlation matrices.
Generating Realistic Correlation Matrices
Real equity markets exhibit complex correlation structures that cannot be captured by simple random matrices. Stocks tend to cluster into sectors (technology, financials, utilities, etc.), with higher correlations within sectors and lower—sometimes negative—correlations between certain sector pairs. Our approach models this structure explicitly.
We construct the correlation matrix \(\Sigma\) by first assigning each of \(N\) assets to one of \(K\) sectors. For any pair of assets \(i\) and \(j\), the correlation \(\rho_{ij}\) is determined by:
\[ \rho_{ij} = \begin{cases} \mathcal{U}(\rho_{\text{min_intra}}, \rho_{\text{max_intra}}) + \epsilon_{ij} & \text{if } s_i = s_j \\ \mathcal{U}(\rho_{\text{min_neg}}, \rho_{\text{max_neg}}) + \epsilon_{ij} & \text{if } (s_i, s_j) \in \mathcal{N} \\ \mathcal{U}(\rho_{\text{min_inter}}, \rho_{\text{max_inter}}) + \epsilon_{ij} & \text{otherwise} \end{cases} \]
Where:
- \(s_i\) denotes the sector assignment of asset \(i\)
- \(\mathcal{U}(a,b)\) represents a uniform distribution on \([a,b]\)
- \(\epsilon_{ij} \sim \mathcal{N}(0, \sigma_{\text{noise}}^2)\) adds realistic noise
- \(\mathcal{N}\) is the set of sector pairs with negative correlation
To ensure the resulting matrix is positive semi-definite (a requirement for valid correlation matrices), we perform eigenvalue decomposition and adjust any negative eigenvalues:
\[ \Sigma = V \Lambda V^T \quad \rightarrow \quad \tilde{\Sigma} = V \max(\Lambda, \epsilon I) V^T \]
Finally, we normalize to ensure unit diagonal elements: \(\Sigma_{ij} = \tilde{\Sigma}_{ij} / \sqrt{\tilde{\Sigma}_{ii} \tilde{\Sigma}_{jj}}\)
Simulating Correlated Geometric Brownian Motion
Asset prices are modeled using Geometric Brownian Motion (GBM), where the price \(S_i(t)\) of asset \(i\) follows:
\[ dS_i(t) = \mu_i S_i(t) dt + \sigma_i S_i(t) dW_i(t) \]
Where \(\mu_i\) is the drift, \(\sigma_i\) is the volatility, and \(dW_i(t)\) are correlated Brownian increments with \(\text{Cov}(dW_i, dW_j) = \rho_{ij} dt\).
Using the Euler-Maruyama discretisation with time step \(\Delta t\):
\[ S_i(t + \Delta t) = S_i(t) \exp\left[\left(\mu_i - \frac{\sigma_i^2}{2}\right)\Delta t + \sigma_i \sqrt{\Delta t} Z_i(t)\right] \]
The key challenge is generating correlated normal random variables \(Z_i(t)\) that respect our correlation matrix \(\Sigma\). This is achieved through Cholesky decomposition: \(\Sigma = LL^T\), where \(L\) is lower triangular.
Given independent standard normal variables \(\tilde{Z} \sim \mathcal{N}(0, I)\), we obtain correlated variables via:
\[ Z = L\tilde{Z} \quad \Rightarrow \quad \text{Cov}(Z) = L \text{Cov}(\tilde{Z}) L^T = LL^T = \Sigma \]
Python Implementation
In this section we will describe the Python implementation for the correlation matrix generation, the correlated GBM path generation and the visualisation functions. This will be concluded by describing the if __name__ == "__main__":
entrypoint.
The first task is to import the relevant libraries and classes. This includes Matplotlib, NumPy, Scipy and Seaborn:
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import numpy as np
from scipy.linalg import block_diag
import seaborn as sns
Once the libraries are imported the next step is to create a function to generate realistic block structures with controllable inter- and intra-sector correlations. The generate_realistic_correlation_matrix
function takes in the size of the matrix n_sectors
, the inter- and intra-sector correlation ranges, the noise level of the matrix and optionally which sector pairs should have negative correlations.
The function begins by assigning assets to sectors, calculating the sector sizes and creating the sector assignment list. It then generates an identity matrix as the initial basis for the correlation matrix. The bulk of the function iterates over each row and column of the upper triangular portion of the matrix, sampling from uniform distributions for each element, with ranges for each distribution determined by whether assets are in the same sector or not. Noise is then added to the matrix values, which are all clipped to approximately \([-1, 1]\). Finally, the eigenvalues of the correlation matrix are adjusted to ensure positive semi-definiteness and normalised to ensure diagonals are unity.
def generate_realistic_correlation_matrix(
n,
n_sectors=3,
intra_sector_corr_range=(0.5, 0.8),
inter_sector_corr_range=(-0.3, 0.4),
noise_level=0.1,
negative_corr_pairs=None
):
"""
Generate a realistic correlation matrix with block structure representing sectors.
"""
# Assign assets to sectors
assets_per_sector = n // n_sectors
remainder = n % n_sectors
sector_sizes = [assets_per_sector + (1 if i < remainder else 0) for i in range(n_sectors)]
# Create sector assignment list
sector_assignments = []
for sector_id, size in enumerate(sector_sizes):
sector_assignments.extend([sector_id] * size)
# Initialize correlation matrix
corr_matrix = np.eye(n)
# Define which sector pairs should have negative correlation
if negative_corr_pairs is None:
negative_corr_pairs = [(0, 2)] # Growth vs Defensive
# Fill in correlations
for i in range(n):
for j in range(i+1, n):
if sector_assignments[i] == sector_assignments[j]:
# Same sector - higher correlation
base_corr = np.random.uniform(*intra_sector_corr_range)
else:
# Different sectors
sector_pair = tuple(sorted([sector_assignments[i], sector_assignments[j]]))
if sector_pair in negative_corr_pairs:
# Negative correlation between these sectors
base_corr = np.random.uniform(-0.4, -0.1)
else:
# Normal inter-sector correlation
base_corr = np.random.uniform(*inter_sector_corr_range)
# Add noise
noise = np.random.normal(0, noise_level)
corr = np.clip(base_corr + noise, -0.99, 0.99)
corr_matrix[i, j] = corr
corr_matrix[j, i] = corr
# Ensure positive semi-definite by eigenvalue adjustment
eigenvalues, eigenvectors = np.linalg.eigh(corr_matrix)
eigenvalues = np.maximum(eigenvalues, 1e-8)
corr_matrix = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T
# Normalize to ensure diagonal is 1
d = np.sqrt(np.diag(corr_matrix))
corr_matrix = corr_matrix / np.outer(d, d)
return corr_matrix, sector_assignments
Now that the correlation matrix function has been defined it can utilised to generate the correlated GBM asset paths. This is achieved with the simulate_correlated_gbm
function. It takes in the various parameters typically required by a GBM simulation (see Geometric Brownian Motion Simulation with Python for more details), along with the correlation matrix (corr_matrix
) that is used to generated the correlated random variables.
Firstly, the correlation matrix is subjected to a Cholesky Decomposition. Then the stock price path array is initialised with the appropriate size and starting values. To generate the correlated random variables matrix \(Z\) a NumPy array Z
of shape (n_assets, n_paths)
is generated with standard normal entries. The Cholesky Decomposition matrix \(L\) is then right-multiplied by \(Z\) to generate the NumPy array of correlated random variables. Then the drift, diffusion and subsequent path entry components of the GBM are created using vectorised NumPy operations for efficiency.
def simulate_correlated_gbm(
S0,
mu,
sigma,
corr_matrix,
T,
dt,
n_paths
):
"""
Simulate correlated Geometric Brownian Motion paths.
"""
n_assets = len(S0)
n_steps = int(T / dt)
# Cholesky decomposition for correlation
L = np.linalg.cholesky(corr_matrix)
# Initialize paths
paths = np.zeros((n_steps + 1, n_assets, n_paths))
paths[0, :, :] = S0[:, np.newaxis]
# Generate correlated random shocks
for t in range(1, n_steps + 1):
# Independent standard normal random variables
Z = np.random.standard_normal((n_assets, n_paths))
# Correlate the random variables
Z_corr = L @ Z
# GBM formula
drift = (mu[:, np.newaxis] - 0.5 * sigma[:, np.newaxis]**2) * dt
diffusion = sigma[:, np.newaxis] * np.sqrt(dt) * Z_corr
paths[t] = paths[t-1] * np.exp(drift + diffusion)
return paths
The implementation leverages NumPy's broadcasting capabilities to simulate multiple paths simultaneously, which will be substantially more computationally efficient than using a (naive) Python for-loop.
Now that we have both the random correlation matrix generation and the correlated path generation functions we are able to visualise them using Matplotlib. The first visualisation is of the generated correlation matrix, using the Matplotlib imshow
method. Firstly, we create a custom diverging colour map using a LinearSegmentedColormap
instance with a list of colour names built in to Matplotlib. Then we plot the heatmap with imshow
and add all correlation values as text. Then we draw sector boundary lines between each of the sectors. This is possible as the assets are ordered appropriately.
def plot_correlation_matrix(
corr_matrix,
sector_assignments
):
"""
Plot correlation matrix with annotations and sector boundaries.
"""
fig, ax = plt.subplots(figsize=(10, 8))
# Create custom colormap (red-white-blue)
colors = ['darkred', 'red', 'lightcoral', 'white', 'lightblue', 'blue', 'darkblue']
n_bins = 100
cmap = LinearSegmentedColormap.from_list('correlation', colors, N=n_bins)
# Plot heatmap
im = ax.imshow(corr_matrix, cmap=cmap, vmin=-1, vmax=1, aspect='auto')
# Add text annotations
for i in range(len(corr_matrix)):
for j in range(len(corr_matrix)):
text = ax.text(j, i, f'{corr_matrix[i, j]:.2f}',
ha='center', va='center',
color='black' if abs(corr_matrix[i, j]) < 0.5 else 'white',
fontsize=8)
# Add sector boundaries
unique_sectors = sorted(set(sector_assignments))
sector_boundaries = []
current_pos = -0.5
for sector in unique_sectors:
sector_size = sector_assignments.count(sector)
sector_boundaries.append(current_pos + sector_size)
current_pos += sector_size
# Draw sector boundary lines
for boundary in sector_boundaries[:-1]:
ax.axhline(y=boundary, color='black', linewidth=2)
ax.axvline(x=boundary, color='black', linewidth=2)
return fig
The second visualisation shows some realisations of the correlated GBM paths, coloured via sector, with various random starting values, drifts and volatilities. It is clear that the paths within each sector (coloured similarly) are correlated, while those of differing sectors are less (or negatively) correlated with other paths.
def plot_price_paths(paths, sector_assignments, n_sample_paths=20):
"""
Plot simulated correlated price paths.
"""
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
n_steps, n_assets, n_paths = paths.shape
time_grid = np.arange(n_steps)
# Define colors for different sectors
sector_colors = plt.cm.tab10(np.linspace(0, 1, len(set(sector_assignments))))
# Single realisation showing all assets and correlation
for asset_idx in range(n_assets):
sector = sector_assignments[asset_idx]
sample_path = paths[:, asset_idx, 0] # First simulation path
ax.plot(time_grid, sample_path,
color=sector_colors[sector],
linewidth=2,
alpha=0.8,
label=f'Asset {asset_idx+1} (Sector {sector+1})')
ax.set_xlabel('Time Steps', fontsize=12)
ax.set_ylabel('Price', fontsize=12)
ax.set_title('Assets and Correlation Structure', fontsize=14)
ax.grid(True, alpha=0.3)
# Add legend with sector grouping
handles, labels = ax.get_legend_handles_labels()
sorted_handles_labels = sorted(zip(handles, labels), key=lambda x: int(x[1].split('Sector ')[1][0]))
handles, labels = zip(*sorted_handles_labels)
ax.legend(handles, labels, bbox_to_anchor=(1.05, 1), loc='upper left', ncol=1)
plt.tight_layout()
return fig
The following entrypoint ties everything together with realistic parameter choices and a reproducible random seed specified:
if __name__ == "__main__":
# Set random seed for reproducibility
np.random.seed(42)
# Parameters
N = 15 # Number of assets
n_sectors = 3 # Number of sectors
# Generate correlation matrix
corr_matrix, sector_assignments = generate_realistic_correlation_matrix(
N,
n_sectors=n_sectors,
intra_sector_corr_range=(0.6, 0.85),
inter_sector_corr_range=(-0.2, 0.35),
noise_level=0.05,
negative_corr_pairs=[(0, 2)] # Growth vs Defensive
)
# GBM parameters
S0 = np.random.uniform(50, 150, N) # Initial prices
mu = np.random.uniform(0.05, 0.15, N) # Annual drift (5-15%)
sigma = np.random.uniform(0.15, 0.35, N) # Annual volatility (15-35%)
T = 1.0 # 1 year
dt = 1/252 # Daily steps (252 trading days)
n_paths = 1000 # Number of simulation paths
# Add sector-based adjustments
sector_names = ["Growth/Tech", "Neutral", "Defensive/Utilities"]
for i in range(N):
sector = sector_assignments[i]
if sector == 0: # Growth/Tech sector
mu[i] *= 1.3
sigma[i] *= 1.2
elif sector == 2: # Defensive/Utilities sector
mu[i] *= 0.7
sigma[i] *= 0.6
# Simulate paths
paths = simulate_correlated_gbm(S0, mu, sigma, corr_matrix, T, dt, n_paths)
# Generate visualizations
fig_corr = plot_correlation_matrix(corr_matrix, sector_assignments)
fig_paths = plot_price_paths_with_uncertainty(paths, sector_assignments)
This completes the implementation of all Python functionality. The full code can be found at the end of this article.
Results
Running the script in an appropriate virtual environment with all libraries installed will produce the following two visualisations.
The correlation matrix visualization clearly demonstrates the intended structure. The dark blue diagonal blocks represent high correlations within sectors (0.6-0.85), while the red/pink regions between sectors 0 and 2 show the negative correlations (-0.4 to -0.1) between growth and defensive stocks. This structure mimics real market behavior where certain sectors move inversely during risk-on/risk-off market regimes.

The price path visualisation reveals several important features. Firstly, the volatility structure is evident as growth sector assets (blue) exhibit higher volatility, consistent with their higher \(\sigma\) parameters, while defensive assets (green) show more stable paths. Secondly, assets within the same sector tend to move together, while growth and defensive sectors often move in opposite directions. Finally, the paths exhibit the characteristic exponential growth with random fluctuations expected from GBM, with no artificial smoothing or unrealistic jumps.
Next Steps
This implementation provides a useful foundation for generating synthetic equity data with realistic correlation structures. The approach successfully captures key market characteristics including sector clustering, negative correlations between defensive and growth assets, and appropriate volatility scaling. However, there are many potential improvements that can be made to significantly enhance the realism.
Firstly, it is possible to use more realistic time series models such as Stochastic Volatility or Jump-Diffusion processes. In addition it is possible to consider Regime-Switching models where the correlation matrix is itself time-varying to represent different regimes (such as bull/bear markets).
Secondly, more realistic models of the correlation matrix structure can be implemented. Modern Machine Learning (ML) techniques can be utilised here, such as Generative Adversarial Networks (GAN) or Convolutional Variational Autoencoders (CVAE) to generate far more realistic matrix structures.
In subsequent articles we will explore adding these techniques, eventually culminating in a useful tool that you can utilise in your own quantitative finance and systematic trading projects to generate large corpora of synthetic, but realistic, financial time series data.
Full Code
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import numpy as np
from scipy.linalg import block_diag
import seaborn as sns
def generate_realistic_correlation_matrix(
n,
n_sectors=3,
intra_sector_corr_range=(0.5, 0.8),
inter_sector_corr_range=(-0.3, 0.4),
noise_level=0.1,
negative_corr_pairs=None
):
"""
Generate a realistic correlation matrix with block structure representing sectors.
Parameters:
-----------
n : int
Size of the correlation matrix (number of assets)
n_sectors : int
Number of sectors/blocks
intra_sector_corr_range : tuple
Range for correlations within sectors (higher correlation)
inter_sector_corr_range : tuple
Range for correlations between sectors (can be negative)
noise_level : float
Amount of noise to add for realism
negative_corr_pairs : list of tuples
List of (sector_i, sector_j) pairs that should have negative correlation
Returns:
--------
corr_matrix : ndarray
NxN correlation matrix
sector_assignments : list
List indicating which sector each asset belongs to
"""
# Assign assets to sectors
assets_per_sector = n // n_sectors
remainder = n % n_sectors
sector_sizes = [assets_per_sector + (1 if i < remainder else 0) for i in range(n_sectors)]
# Create sector assignment list
sector_assignments = []
for sector_id, size in enumerate(sector_sizes):
sector_assignments.extend([sector_id] * size)
# Initialize correlation matrix
corr_matrix = np.eye(n)
# Define which sector pairs should have negative correlation
if negative_corr_pairs is None:
# Default: sector 0 (e.g., "tech/growth") negatively correlated with sector 2 (e.g., "utilities/defensive")
negative_corr_pairs = [(0, 2)]
# Fill in correlations
for i in range(n):
for j in range(i+1, n):
if sector_assignments[i] == sector_assignments[j]:
# Same sector - higher correlation
base_corr = np.random.uniform(*intra_sector_corr_range)
else:
# Different sectors
sector_pair = tuple(sorted([sector_assignments[i], sector_assignments[j]]))
if sector_pair in negative_corr_pairs:
# Negative correlation between these sectors
base_corr = np.random.uniform(-0.4, -0.1)
else:
# Normal inter-sector correlation
base_corr = np.random.uniform(*inter_sector_corr_range)
# Add noise
noise = np.random.normal(0, noise_level)
corr = np.clip(base_corr + noise, -0.99, 0.99)
corr_matrix[i, j] = corr
corr_matrix[j, i] = corr
# Ensure positive semi-definite by eigenvalue adjustment
eigenvalues, eigenvectors = np.linalg.eigh(corr_matrix)
eigenvalues = np.maximum(eigenvalues, 1e-8)
corr_matrix = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T
# Normalize to ensure diagonal is 1
d = np.sqrt(np.diag(corr_matrix))
corr_matrix = corr_matrix / np.outer(d, d)
return corr_matrix, sector_assignments
def simulate_correlated_gbm(
S0,
mu,
sigma,
corr_matrix,
T,
dt,
n_paths
):
"""
Simulate correlated Geometric Brownian Motion paths.
Parameters:
-----------
S0 : array-like
Initial prices for each asset
mu : array-like
Drift parameters for each asset
sigma : array-like
Volatility parameters for each asset
corr_matrix : ndarray
Correlation matrix
T : float
Time horizon
dt : float
Time step
n_paths : int
Number of simulation paths
Returns:
--------
paths : ndarray
Array of shape (n_steps, n_assets, n_paths) containing price paths
"""
n_assets = len(S0)
n_steps = int(T / dt)
# Cholesky decomposition for correlation
L = np.linalg.cholesky(corr_matrix)
# Initialize paths
paths = np.zeros((n_steps + 1, n_assets, n_paths))
paths[0, :, :] = S0[:, np.newaxis]
# Generate correlated random shocks
for t in range(1, n_steps + 1):
# Independent standard normal random variables
Z = np.random.standard_normal((n_assets, n_paths))
# Correlate the random variables
Z_corr = L @ Z
# GBM formula
drift = (mu[:, np.newaxis] - 0.5 * sigma[:, np.newaxis]**2) * dt
diffusion = sigma[:, np.newaxis] * np.sqrt(dt) * Z_corr
paths[t] = paths[t-1] * np.exp(drift + diffusion)
return paths
def plot_correlation_matrix(
corr_matrix,
sector_assignments
):
"""
Plot correlation matrix with annotations and sector boundaries.
"""
fig, ax = plt.subplots(figsize=(10, 8))
# Create custom colormap (red-white-blue)
colors = ['darkred', 'red', 'lightcoral', 'white', 'lightblue', 'blue', 'darkblue']
n_bins = 100
cmap = LinearSegmentedColormap.from_list('correlation', colors, N=n_bins)
# Plot heatmap
im = ax.imshow(corr_matrix, cmap=cmap, vmin=-1, vmax=1, aspect='auto')
# Add colorbar
cbar = plt.colorbar(im, ax=ax)
cbar.set_label('Correlation', rotation=270, labelpad=20)
# Add text annotations
for i in range(len(corr_matrix)):
for j in range(len(corr_matrix)):
text = ax.text(j, i, f'{corr_matrix[i, j]:.2f}',
ha='center', va='center',
color='black' if abs(corr_matrix[i, j]) < 0.5 else 'white',
fontsize=8)
# Add sector boundaries
unique_sectors = sorted(set(sector_assignments))
sector_boundaries = []
current_pos = -0.5
for sector in unique_sectors:
sector_size = sector_assignments.count(sector)
sector_boundaries.append(current_pos + sector_size)
current_pos += sector_size
# Draw sector boundary lines
for boundary in sector_boundaries[:-1]:
ax.axhline(y=boundary, color='black', linewidth=2)
ax.axvline(x=boundary, color='black', linewidth=2)
# Labels
ax.set_xticks(range(len(corr_matrix)))
ax.set_yticks(range(len(corr_matrix)))
ax.set_xticklabels([f'Asset {i+1}' for i in range(len(corr_matrix))], rotation=45)
ax.set_yticklabels([f'Asset {i+1}' for i in range(len(corr_matrix))])
ax.set_title('Equity Correlation Matrix with Sector Structure', fontsize=14, pad=20)
plt.tight_layout()
return fig
def plot_price_paths(paths, sector_assignments, n_sample_paths=20):
"""
Plot simulated correlated price paths.
"""
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
n_steps, n_assets, n_paths = paths.shape
time_grid = np.arange(n_steps)
# Define colors for different sectors
sector_colors = plt.cm.tab10(np.linspace(0, 1, len(set(sector_assignments))))
# Single realisation showing all assets and correlation
for asset_idx in range(n_assets):
sector = sector_assignments[asset_idx]
sample_path = paths[:, asset_idx, 0] # First simulation path
ax.plot(time_grid, sample_path,
color=sector_colors[sector],
linewidth=2,
alpha=0.8,
label=f'Asset {asset_idx+1} (Sector {sector+1})')
ax.set_xlabel('Time Steps', fontsize=12)
ax.set_ylabel('Price', fontsize=12)
ax.set_title('Assets and Correlation Structure', fontsize=14)
ax.grid(True, alpha=0.3)
# Add legend with sector grouping
handles, labels = ax.get_legend_handles_labels()
sorted_handles_labels = sorted(zip(handles, labels), key=lambda x: int(x[1].split('Sector ')[1][0]))
handles, labels = zip(*sorted_handles_labels)
ax.legend(handles, labels, bbox_to_anchor=(1.05, 1), loc='upper left', ncol=1)
plt.tight_layout()
return fig
if __name__ == "__main__":
# Set random seed for reproducibility
np.random.seed(42)
# Parameters
N = 15 # Number of assets
n_sectors = 3 # Number of sectors
# Generate correlation matrix
print("Generating realistic correlation matrix...")
corr_matrix, sector_assignments = generate_realistic_correlation_matrix(
N,
n_sectors=n_sectors,
intra_sector_corr_range=(0.6, 0.85),
inter_sector_corr_range=(-0.2, 0.35), # Allow negative correlations
noise_level=0.05,
negative_corr_pairs=[(0, 2)] # Sectors 0 and 2 are negatively correlated
)
# GBM parameters
S0 = np.random.uniform(50, 150, N) # Initial prices
mu = np.random.uniform(0.05, 0.15, N) # Annual drift (5-15%)
sigma = np.random.uniform(0.15, 0.35, N) # Annual volatility (15-35%)
T = 1.0 # 1 year
dt = 1/252 # Daily steps (252 trading days)
n_paths = 1000 # Number of simulation paths
# Add sector-based adjustments to parameters
sector_names = ["Growth/Tech", "Neutral", "Defensive/Utilities"]
for i in range(N):
sector = sector_assignments[i]
# Adjust parameters by sector
if sector == 0: # Growth/Tech sector - higher growth, higher volatility
mu[i] *= 1.3
sigma[i] *= 1.2
elif sector == 2: # Defensive/Utilities sector - lower growth, lower volatility
mu[i] *= 0.7
sigma[i] *= 0.6
# Simulate paths
print("Simulating correlated GBM paths...")
paths = simulate_correlated_gbm(S0, mu, sigma, corr_matrix, T, dt, n_paths)
# Plot correlation matrix
print("Plotting correlation matrix...")
fig_corr = plot_correlation_matrix(corr_matrix, sector_assignments)
plt.show()
# Plot price paths
print("Plotting price paths...")
fig_paths = plot_price_paths(paths, sector_assignments, n_sample_paths=30)
plt.show()
# Print summary statistics
print("\n=== Summary Statistics ===")
print(f"Number of assets: {N}")
print(f"Number of sectors: {n_sectors}")
print(f"Sector names: Growth/Tech (0), Neutral (1), Defensive/Utilities (2)")
print(f"Assets per sector: {[sector_assignments.count(i) for i in range(n_sectors)]}")
print(f"\nCorrelation matrix properties:")
# Get upper triangle correlations (excluding diagonal)
upper_triangle_corr = corr_matrix[np.triu_indices(N, k=1)]
print(f" Min correlation: {upper_triangle_corr.min():.3f}")
print(f" Max correlation: {upper_triangle_corr.max():.3f}")
print(f" Number of negative correlations: {(upper_triangle_corr < 0).sum()}")
print(f" Mean intra-sector correlation: ", end="")
# Calculate mean intra-sector correlation
intra_corr = []
for i in range(N):
for j in range(i+1, N):
if sector_assignments[i] == sector_assignments[j]:
intra_corr.append(corr_matrix[i, j])
print(f"{np.mean(intra_corr):.3f}")
print(f" Mean inter-sector correlation: ", end="")
# Calculate mean inter-sector correlation
inter_corr = []
for i in range(N):
for j in range(i+1, N):
if sector_assignments[i] != sector_assignments[j]:
inter_corr.append(corr_matrix[i, j])
print(f"{np.mean(inter_corr):.3f}")
# Check positive semi-definite
eigenvalues = np.linalg.eigvals(corr_matrix)
print(f"\nSmallest eigenvalue: {eigenvalues.min():.6f} (should be > 0)")
# Final prices statistics
final_prices = paths[-1, :, :]
print(f"\nFinal price statistics:")
print(f" Mean returns: {(final_prices.mean(axis=1) / S0 - 1).mean():.2%}")
print(f" Volatility of returns: {(final_prices / S0[:, np.newaxis] - 1).std(axis=1).mean():.2%}")