Generating Synthetic Equity Data with Realistic Correlation Structure

This article describes how to generate synthetic structured correlation matrices for the purposes of generating synthetic correlated equities data, which has many uses within backtesting validation and machine learning training.

Recently on QuantStart we have begun looking at generating synthetic asset price paths using Stochastic Differential Equation models such as the Brownian Motion, Geometric Brownian Motion (GBM), Ornstein-Uhlenbeck and Vasicek Models. Historically, we have also considered more sophisticated models such as the Heston Stochastic Volatility Model. What we have not considered to any great extent in previous posts is how to generate multiple price paths simultaneously that are all correlated.

In real market settings, such as in equities markets, a complex correlation structure exists across assets, somewhat reflective of the classification of stocks into sectors and subsectors. This post will provide a complete example of how to generate a random, structured correlation matrix, which will then be used to generate a collection of correlated GBM paths. Such synthetic paths have a number of uses within quantitative finance, which we will now elaborate on.

Introduction

Synthetic equity data generation has become an indispensable tool in quantitative finance, serving multiple critical purposes across research, development, and production environments. The ability to generate realistic, correlated price paths allows quant finance practitioners to stress-test portfolios, validate backtesting frameworks, and train machine learning models without relying solely on limited historical data.

One of the primary challenges in financial modeling is the scarcity of extreme market events in historical data. While we may have decades of daily returns, true market crises are rare, making it difficult to assess how strategies perform under extreme conditions. Synthetic data generation enables us to create thousands of plausible market scenarios, including rare "black swan" events, providing a more comprehensive view of potential risks.

Furthermore, debugging quantitative trading systems requires controlled test cases with known properties. When a backtesting framework produces unexpected results, it's invaluable to input synthetic data with precisely defined statistical properties to isolate whether issues stem from the strategy logic, the data pipeline, or the execution simulator.

For machine learning applications, synthetic data serves as an excellent pre-training dataset. Models can learn general market dynamics from unlimited synthetic samples before fine-tuning on limited real market data.

In the following sections we are going to use open-source Python data science libraries, such as NumPy, SciPy, Matplotlib and Seaborn to generate these random matrices, the correlated price paths and associated visualisation plots. Firstly, however we will discuss the mathematical theory of the generation.

Mathematical Theory

The mathematical theory of the approach will be broken down into two subsections. The first discusses the mathematics around generating structured, random correlated matrices of the appropriate type. The second discusses how to generate correlated GBM paths using these correlation matrices.

Generating Realistic Correlation Matrices

Real equity markets exhibit complex correlation structures that cannot be captured by simple random matrices. Stocks tend to cluster into sectors (technology, financials, utilities, etc.), with higher correlations within sectors and lower—sometimes negative—correlations between certain sector pairs. Our approach models this structure explicitly.

We construct the correlation matrix \(\Sigma\) by first assigning each of \(N\) assets to one of \(K\) sectors. For any pair of assets \(i\) and \(j\), the correlation \(\rho_{ij}\) is determined by:

\[ \rho_{ij} = \begin{cases} \mathcal{U}(\rho_{\text{min_intra}}, \rho_{\text{max_intra}}) + \epsilon_{ij} & \text{if } s_i = s_j \\ \mathcal{U}(\rho_{\text{min_neg}}, \rho_{\text{max_neg}}) + \epsilon_{ij} & \text{if } (s_i, s_j) \in \mathcal{N} \\ \mathcal{U}(\rho_{\text{min_inter}}, \rho_{\text{max_inter}}) + \epsilon_{ij} & \text{otherwise} \end{cases} \]

Where:

  • \(s_i\) denotes the sector assignment of asset \(i\)
  • \(\mathcal{U}(a,b)\) represents a uniform distribution on \([a,b]\)
  • \(\epsilon_{ij} \sim \mathcal{N}(0, \sigma_{\text{noise}}^2)\) adds realistic noise
  • \(\mathcal{N}\) is the set of sector pairs with negative correlation

To ensure the resulting matrix is positive semi-definite (a requirement for valid correlation matrices), we perform eigenvalue decomposition and adjust any negative eigenvalues:

\[ \Sigma = V \Lambda V^T \quad \rightarrow \quad \tilde{\Sigma} = V \max(\Lambda, \epsilon I) V^T \]

Finally, we normalize to ensure unit diagonal elements: \(\Sigma_{ij} = \tilde{\Sigma}_{ij} / \sqrt{\tilde{\Sigma}_{ii} \tilde{\Sigma}_{jj}}\)

Simulating Correlated Geometric Brownian Motion

Asset prices are modeled using Geometric Brownian Motion (GBM), where the price \(S_i(t)\) of asset \(i\) follows:

\[ dS_i(t) = \mu_i S_i(t) dt + \sigma_i S_i(t) dW_i(t) \]

Where \(\mu_i\) is the drift, \(\sigma_i\) is the volatility, and \(dW_i(t)\) are correlated Brownian increments with \(\text{Cov}(dW_i, dW_j) = \rho_{ij} dt\).

Using the Euler-Maruyama discretisation with time step \(\Delta t\):

\[ S_i(t + \Delta t) = S_i(t) \exp\left[\left(\mu_i - \frac{\sigma_i^2}{2}\right)\Delta t + \sigma_i \sqrt{\Delta t} Z_i(t)\right] \]

The key challenge is generating correlated normal random variables \(Z_i(t)\) that respect our correlation matrix \(\Sigma\). This is achieved through Cholesky decomposition: \(\Sigma = LL^T\), where \(L\) is lower triangular.

Given independent standard normal variables \(\tilde{Z} \sim \mathcal{N}(0, I)\), we obtain correlated variables via:

\[ Z = L\tilde{Z} \quad \Rightarrow \quad \text{Cov}(Z) = L \text{Cov}(\tilde{Z}) L^T = LL^T = \Sigma \]

Python Implementation

In this section we will describe the Python implementation for the correlation matrix generation, the correlated GBM path generation and the visualisation functions. This will be concluded by describing the if __name__ == "__main__": entrypoint.

The first task is to import the relevant libraries and classes. This includes Matplotlib, NumPy, Scipy and Seaborn:

import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import numpy as np
from scipy.linalg import block_diag
import seaborn as sns

Once the libraries are imported the next step is to create a function to generate realistic block structures with controllable inter- and intra-sector correlations. The generate_realistic_correlation_matrix function takes in the size of the matrix n, the number of sectors n_sectors, the inter- and intra-sector correlation ranges, the noise level of the matrix and optionally which sector pairs should have negative correlations.

The function begins by assigning assets to sectors, calculating the sector sizes and creating the sector assignment list. It then generates an identity matrix as the initial basis for the correlation matrix. The bulk of the function iterates over each row and column of the upper triangular portion of the matrix, sampling from uniform distributions for each element, with ranges for each distribution determined by whether assets are in the same sector or not. Noise is then added to the matrix values, which are all clipped to approximately \([-1, 1]\). Finally, the eigenvalues of the correlation matrix are adjusted to ensure positive semi-definiteness and normalised to ensure diagonals are unity.

def generate_realistic_correlation_matrix(
    n,
    n_sectors=3,
    intra_sector_corr_range=(0.5, 0.8), 
    inter_sector_corr_range=(-0.3, 0.4),
    noise_level=0.1,
    negative_corr_pairs=None
):
    """
    Generate a realistic correlation matrix with block structure representing sectors.
    """
    # Assign assets to sectors
    assets_per_sector = n // n_sectors
    remainder = n % n_sectors
    sector_sizes = [assets_per_sector + (1 if i < remainder else 0) for i in range(n_sectors)]

    # Create sector assignment list
    sector_assignments = []
    for sector_id, size in enumerate(sector_sizes):
        sector_assignments.extend([sector_id] * size)

    # Initialize correlation matrix
    corr_matrix = np.eye(n)

    # Define which sector pairs should have negative correlation
    if negative_corr_pairs is None:
        negative_corr_pairs = [(0, 2)]  # Growth vs Defensive

    # Fill in correlations
    for i in range(n):
        for j in range(i+1, n):
            if sector_assignments[i] == sector_assignments[j]:
                # Same sector - higher correlation
                base_corr = np.random.uniform(*intra_sector_corr_range)
            else:
                # Different sectors
                sector_pair = tuple(sorted([sector_assignments[i], sector_assignments[j]]))
                
                if sector_pair in negative_corr_pairs:
                    # Negative correlation between these sectors
                    base_corr = np.random.uniform(-0.4, -0.1)
                else:
                    # Normal inter-sector correlation
                    base_corr = np.random.uniform(*inter_sector_corr_range)
            
            # Add noise
            noise = np.random.normal(0, noise_level)
            corr = np.clip(base_corr + noise, -0.99, 0.99)
            
            corr_matrix[i, j] = corr
            corr_matrix[j, i] = corr

    # Ensure positive semi-definite by eigenvalue adjustment
    eigenvalues, eigenvectors = np.linalg.eigh(corr_matrix)
    eigenvalues = np.maximum(eigenvalues, 1e-8)
    corr_matrix = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T

    # Normalize to ensure diagonal is 1
    d = np.sqrt(np.diag(corr_matrix))
    corr_matrix = corr_matrix / np.outer(d, d)

    return corr_matrix, sector_assignments

Now that the correlation matrix function has been defined it can utilised to generate the correlated GBM asset paths. This is achieved with the simulate_correlated_gbm function. It takes in the various parameters typically required by a GBM simulation (see Geometric Brownian Motion Simulation with Python for more details), along with the correlation matrix (corr_matrix) that is used to generated the correlated random variables.

Firstly, the correlation matrix is subjected to a Cholesky Decomposition. Then the stock price path array is initialised with the appropriate size and starting values. To generate the correlated random variables matrix \(Z\) a NumPy array Z of shape (n_assets, n_paths) is generated with standard normal entries. The Cholesky Decomposition matrix \(L\) is then right-multiplied by \(Z\) to generate the NumPy array of correlated random variables. Then the drift, diffusion and subsequent path entry components of the GBM are created using vectorised NumPy operations for efficiency.

def simulate_correlated_gbm(
    S0,
    mu,
    sigma,
    corr_matrix,
    T,
    dt,
    n_paths
):
    """
    Simulate correlated Geometric Brownian Motion paths.
    """
    n_assets = len(S0)
    n_steps = int(T / dt)

    # Cholesky decomposition for correlation
    L = np.linalg.cholesky(corr_matrix)

    # Initialize paths
    paths = np.zeros((n_steps + 1, n_assets, n_paths))
    paths[0, :, :] = S0[:, np.newaxis]

    # Generate correlated random shocks
    for t in range(1, n_steps + 1):
        # Independent standard normal random variables
        Z = np.random.standard_normal((n_assets, n_paths))
        
        # Correlate the random variables
        Z_corr = L @ Z
        
        # GBM formula
        drift = (mu[:, np.newaxis] - 0.5 * sigma[:, np.newaxis]**2) * dt
        diffusion = sigma[:, np.newaxis] * np.sqrt(dt) * Z_corr
        
        paths[t] = paths[t-1] * np.exp(drift + diffusion)

    return paths

The implementation leverages NumPy's broadcasting capabilities to simulate multiple paths simultaneously, which will be substantially more computationally efficient than using a (naive) Python for-loop.

Now that we have both the random correlation matrix generation and the correlated path generation functions we are able to visualise them using Matplotlib. The first visualisation is of the generated correlation matrix, using the Matplotlib imshow method. Firstly, we create a custom diverging colour map using a LinearSegmentedColormap instance with a list of colour names built in to Matplotlib. Then we plot the heatmap with imshow and add all correlation values as text. Then we draw sector boundary lines between each of the sectors. This is possible as the assets are ordered appropriately.

def plot_correlation_matrix(
    corr_matrix,
    sector_assignments
):
    """
    Plot correlation matrix with annotations and sector boundaries.
    """
    fig, ax = plt.subplots(figsize=(10, 8))

    # Create custom colormap (red-white-blue)
    colors = ['darkred', 'red', 'lightcoral', 'white', 'lightblue', 'blue', 'darkblue']
    n_bins = 100
    cmap = LinearSegmentedColormap.from_list('correlation', colors, N=n_bins)

    # Plot heatmap
    im = ax.imshow(corr_matrix, cmap=cmap, vmin=-1, vmax=1, aspect='auto')

    # Add text annotations
    for i in range(len(corr_matrix)):
        for j in range(len(corr_matrix)):
            text = ax.text(j, i, f'{corr_matrix[i, j]:.2f}',
                          ha='center', va='center',
                          color='black' if abs(corr_matrix[i, j]) < 0.5 else 'white',
                          fontsize=8)

    # Add sector boundaries
    unique_sectors = sorted(set(sector_assignments))
    sector_boundaries = []
    current_pos = -0.5

    for sector in unique_sectors:
        sector_size = sector_assignments.count(sector)
        sector_boundaries.append(current_pos + sector_size)
        current_pos += sector_size

    # Draw sector boundary lines
    for boundary in sector_boundaries[:-1]:
        ax.axhline(y=boundary, color='black', linewidth=2)
        ax.axvline(x=boundary, color='black', linewidth=2)

    return fig

The second visualisation shows some realisations of the correlated GBM paths, coloured via sector, with various random starting values, drifts and volatilities. It is clear that the paths within each sector (coloured similarly) are correlated, while those of differing sectors are less (or negatively) correlated with other paths.

def plot_price_paths(paths, sector_assignments, n_sample_paths=20):
    """
    Plot simulated correlated price paths.
    """
    fig, ax = plt.subplots(1, 1, figsize=(12, 8))
    
    n_steps, n_assets, n_paths = paths.shape
    time_grid = np.arange(n_steps)
    
    # Define colors for different sectors
    sector_colors = plt.cm.tab10(np.linspace(0, 1, len(set(sector_assignments))))
    
    # Single realisation showing all assets and correlation
    for asset_idx in range(n_assets):
        sector = sector_assignments[asset_idx]
        sample_path = paths[:, asset_idx, 0]  # First simulation path
        
        ax.plot(time_grid, sample_path, 
                color=sector_colors[sector], 
                linewidth=2, 
                alpha=0.8,
                label=f'Asset {asset_idx+1} (Sector {sector+1})')
    
    ax.set_xlabel('Time Steps', fontsize=12)
    ax.set_ylabel('Price', fontsize=12)
    ax.set_title('Assets and Correlation Structure', fontsize=14)
    ax.grid(True, alpha=0.3)
    
    # Add legend with sector grouping
    handles, labels = ax.get_legend_handles_labels()
    sorted_handles_labels = sorted(zip(handles, labels), key=lambda x: int(x[1].split('Sector ')[1][0]))
    handles, labels = zip(*sorted_handles_labels)
    ax.legend(handles, labels, bbox_to_anchor=(1.05, 1), loc='upper left', ncol=1)
    
    plt.tight_layout()
    return fig

The following entrypoint ties everything together with realistic parameter choices and a reproducible random seed specified:

if __name__ == "__main__":
    # Set random seed for reproducibility
    np.random.seed(42)

    # Parameters
    N = 15  # Number of assets
    n_sectors = 3  # Number of sectors

    # Generate correlation matrix
    corr_matrix, sector_assignments = generate_realistic_correlation_matrix(
        N, 
        n_sectors=n_sectors,
        intra_sector_corr_range=(0.6, 0.85),
        inter_sector_corr_range=(-0.2, 0.35),
        noise_level=0.05,
        negative_corr_pairs=[(0, 2)]  # Growth vs Defensive
    )

    # GBM parameters
    S0 = np.random.uniform(50, 150, N)  # Initial prices
    mu = np.random.uniform(0.05, 0.15, N)  # Annual drift (5-15%)
    sigma = np.random.uniform(0.15, 0.35, N)  # Annual volatility (15-35%)
    T = 1.0  # 1 year
    dt = 1/252  # Daily steps (252 trading days)
    n_paths = 1000  # Number of simulation paths

    # Add sector-based adjustments
    sector_names = ["Growth/Tech", "Neutral", "Defensive/Utilities"]
    for i in range(N):
        sector = sector_assignments[i]
        if sector == 0:  # Growth/Tech sector
            mu[i] *= 1.3
            sigma[i] *= 1.2
        elif sector == 2:  # Defensive/Utilities sector
            mu[i] *= 0.7
            sigma[i] *= 0.6

    # Simulate paths
    paths = simulate_correlated_gbm(S0, mu, sigma, corr_matrix, T, dt, n_paths)

    # Generate visualizations
    fig_corr = plot_correlation_matrix(corr_matrix, sector_assignments)
    fig_paths = plot_price_paths_with_uncertainty(paths, sector_assignments)

This completes the implementation of all Python functionality. The full code can be found at the end of this article.

Results

Running the script in an appropriate virtual environment with all libraries installed will produce the following two visualisations.

Correlation Matrix Visualisation
Figure 1: Block-structured correlation matrix showing high intra-sector correlations (blue blocks) and negative correlations between growth and defensive sectors (red areas).

The correlation matrix visualization clearly demonstrates the intended structure. The dark blue diagonal blocks represent high correlations within sectors (0.6-0.85), while the red/pink regions between sectors 0 and 2 show the negative correlations (-0.4 to -0.1) between growth and defensive stocks. This structure mimics real market behavior where certain sectors move inversely during risk-on/risk-off market regimes.

Simulated Price Paths
Figure 2: Simulated correlated price paths showing correlated behaviour.

The price path visualisation reveals several important features. Firstly, the volatility structure is evident as growth sector assets (blue) exhibit higher volatility, consistent with their higher \(\sigma\) parameters, while defensive assets (green) show more stable paths. Secondly, assets within the same sector tend to move together, while growth and defensive sectors often move in opposite directions. Finally, the paths exhibit the characteristic exponential growth with random fluctuations expected from GBM, with no artificial smoothing or unrealistic jumps.

Next Steps

This implementation provides a useful foundation for generating synthetic equity data with realistic correlation structures. The approach successfully captures key market characteristics including sector clustering, negative correlations between defensive and growth assets, and appropriate volatility scaling. However, there are many potential improvements that can be made to significantly enhance the realism.

Firstly, it is possible to use more realistic time series models such as Stochastic Volatility or Jump-Diffusion processes. In addition it is possible to consider Regime-Switching models where the correlation matrix is itself time-varying to represent different regimes (such as bull/bear markets).

Secondly, more realistic models of the correlation matrix structure can be implemented. Modern Machine Learning (ML) techniques can be utilised here, such as Generative Adversarial Networks (GAN) or Convolutional Variational Autoencoders (CVAE) to generate far more realistic matrix structures.

In subsequent articles we will explore adding these techniques, eventually culminating in a useful tool that you can utilise in your own quantitative finance and systematic trading projects to generate large corpora of synthetic, but realistic, financial time series data.

Full Code

import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import numpy as np
from scipy.linalg import block_diag
import seaborn as sns


def generate_realistic_correlation_matrix(
    n,
    n_sectors=3,
    intra_sector_corr_range=(0.5, 0.8), 
    inter_sector_corr_range=(-0.3, 0.4),
    noise_level=0.1,
    negative_corr_pairs=None
):
    """
    Generate a realistic correlation matrix with block structure representing sectors.
    
    Parameters:
    -----------
    n : int
        Size of the correlation matrix (number of assets)
    n_sectors : int
        Number of sectors/blocks
    intra_sector_corr_range : tuple
        Range for correlations within sectors (higher correlation)
    inter_sector_corr_range : tuple
        Range for correlations between sectors (can be negative)
    noise_level : float
        Amount of noise to add for realism
    negative_corr_pairs : list of tuples
        List of (sector_i, sector_j) pairs that should have negative correlation
    
    Returns:
    --------
    corr_matrix : ndarray
        NxN correlation matrix
    sector_assignments : list
        List indicating which sector each asset belongs to
    """
    # Assign assets to sectors
    assets_per_sector = n // n_sectors
    remainder = n % n_sectors
    sector_sizes = [assets_per_sector + (1 if i < remainder else 0) for i in range(n_sectors)]
    
    # Create sector assignment list
    sector_assignments = []
    for sector_id, size in enumerate(sector_sizes):
        sector_assignments.extend([sector_id] * size)
    
    # Initialize correlation matrix
    corr_matrix = np.eye(n)
    
    # Define which sector pairs should have negative correlation
    if negative_corr_pairs is None:
        # Default: sector 0 (e.g., "tech/growth") negatively correlated with sector 2 (e.g., "utilities/defensive")
        negative_corr_pairs = [(0, 2)]
    
    # Fill in correlations
    for i in range(n):
        for j in range(i+1, n):
            if sector_assignments[i] == sector_assignments[j]:
                # Same sector - higher correlation
                base_corr = np.random.uniform(*intra_sector_corr_range)
            else:
                # Different sectors
                sector_pair = tuple(sorted([sector_assignments[i], sector_assignments[j]]))
                
                if sector_pair in negative_corr_pairs:
                    # Negative correlation between these sectors
                    base_corr = np.random.uniform(-0.4, -0.1)
                else:
                    # Normal inter-sector correlation
                    base_corr = np.random.uniform(*inter_sector_corr_range)
            
            # Add noise
            noise = np.random.normal(0, noise_level)
            corr = np.clip(base_corr + noise, -0.99, 0.99)
            
            corr_matrix[i, j] = corr
            corr_matrix[j, i] = corr
    
    # Ensure positive semi-definite by eigenvalue adjustment
    eigenvalues, eigenvectors = np.linalg.eigh(corr_matrix)
    eigenvalues = np.maximum(eigenvalues, 1e-8)
    corr_matrix = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T
    
    # Normalize to ensure diagonal is 1
    d = np.sqrt(np.diag(corr_matrix))
    corr_matrix = corr_matrix / np.outer(d, d)
    
    return corr_matrix, sector_assignments


def simulate_correlated_gbm(
    S0,
    mu,
    sigma,
    corr_matrix,
    T,
    dt,
    n_paths
):
    """
    Simulate correlated Geometric Brownian Motion paths.
    
    Parameters:
    -----------
    S0 : array-like
        Initial prices for each asset
    mu : array-like
        Drift parameters for each asset
    sigma : array-like
        Volatility parameters for each asset
    corr_matrix : ndarray
        Correlation matrix
    T : float
        Time horizon
    dt : float
        Time step
    n_paths : int
        Number of simulation paths
    
    Returns:
    --------
    paths : ndarray
        Array of shape (n_steps, n_assets, n_paths) containing price paths
    """
    n_assets = len(S0)
    n_steps = int(T / dt)
    
    # Cholesky decomposition for correlation
    L = np.linalg.cholesky(corr_matrix)
    
    # Initialize paths
    paths = np.zeros((n_steps + 1, n_assets, n_paths))
    paths[0, :, :] = S0[:, np.newaxis]
    
    # Generate correlated random shocks
    for t in range(1, n_steps + 1):
        # Independent standard normal random variables
        Z = np.random.standard_normal((n_assets, n_paths))
        
        # Correlate the random variables
        Z_corr = L @ Z
        
        # GBM formula
        drift = (mu[:, np.newaxis] - 0.5 * sigma[:, np.newaxis]**2) * dt
        diffusion = sigma[:, np.newaxis] * np.sqrt(dt) * Z_corr
        
        paths[t] = paths[t-1] * np.exp(drift + diffusion)
    
    return paths


def plot_correlation_matrix(
    corr_matrix,
    sector_assignments
):
    """
    Plot correlation matrix with annotations and sector boundaries.
    """
    fig, ax = plt.subplots(figsize=(10, 8))
    
    # Create custom colormap (red-white-blue)
    colors = ['darkred', 'red', 'lightcoral', 'white', 'lightblue', 'blue', 'darkblue']
    n_bins = 100
    cmap = LinearSegmentedColormap.from_list('correlation', colors, N=n_bins)
    
    # Plot heatmap
    im = ax.imshow(corr_matrix, cmap=cmap, vmin=-1, vmax=1, aspect='auto')
    
    # Add colorbar
    cbar = plt.colorbar(im, ax=ax)
    cbar.set_label('Correlation', rotation=270, labelpad=20)
    
    # Add text annotations
    for i in range(len(corr_matrix)):
        for j in range(len(corr_matrix)):
            text = ax.text(j, i, f'{corr_matrix[i, j]:.2f}',
                          ha='center', va='center',
                          color='black' if abs(corr_matrix[i, j]) < 0.5 else 'white',
                          fontsize=8)
    
    # Add sector boundaries
    unique_sectors = sorted(set(sector_assignments))
    sector_boundaries = []
    current_pos = -0.5
    
    for sector in unique_sectors:
        sector_size = sector_assignments.count(sector)
        sector_boundaries.append(current_pos + sector_size)
        current_pos += sector_size
    
    # Draw sector boundary lines
    for boundary in sector_boundaries[:-1]:
        ax.axhline(y=boundary, color='black', linewidth=2)
        ax.axvline(x=boundary, color='black', linewidth=2)
    
    # Labels
    ax.set_xticks(range(len(corr_matrix)))
    ax.set_yticks(range(len(corr_matrix)))
    ax.set_xticklabels([f'Asset {i+1}' for i in range(len(corr_matrix))], rotation=45)
    ax.set_yticklabels([f'Asset {i+1}' for i in range(len(corr_matrix))])
    ax.set_title('Equity Correlation Matrix with Sector Structure', fontsize=14, pad=20)
    
    plt.tight_layout()
    return fig


def plot_price_paths(paths, sector_assignments, n_sample_paths=20):
    """
    Plot simulated correlated price paths.
    """
    fig, ax = plt.subplots(1, 1, figsize=(12, 8))
    
    n_steps, n_assets, n_paths = paths.shape
    time_grid = np.arange(n_steps)
    
    # Define colors for different sectors
    sector_colors = plt.cm.tab10(np.linspace(0, 1, len(set(sector_assignments))))
    
    # Single realisation showing all assets and correlation
    for asset_idx in range(n_assets):
        sector = sector_assignments[asset_idx]
        sample_path = paths[:, asset_idx, 0]  # First simulation path
        
        ax.plot(time_grid, sample_path, 
                color=sector_colors[sector], 
                linewidth=2, 
                alpha=0.8,
                label=f'Asset {asset_idx+1} (Sector {sector+1})')
    
    ax.set_xlabel('Time Steps', fontsize=12)
    ax.set_ylabel('Price', fontsize=12)
    ax.set_title('Assets and Correlation Structure', fontsize=14)
    ax.grid(True, alpha=0.3)
    
    # Add legend with sector grouping
    handles, labels = ax.get_legend_handles_labels()
    sorted_handles_labels = sorted(zip(handles, labels), key=lambda x: int(x[1].split('Sector ')[1][0]))
    handles, labels = zip(*sorted_handles_labels)
    ax.legend(handles, labels, bbox_to_anchor=(1.05, 1), loc='upper left', ncol=1)
    
    plt.tight_layout()
    return fig


if __name__ == "__main__":
    # Set random seed for reproducibility
    np.random.seed(42)
    
    # Parameters
    N = 15  # Number of assets
    n_sectors = 3  # Number of sectors
    
    # Generate correlation matrix
    print("Generating realistic correlation matrix...")
    corr_matrix, sector_assignments = generate_realistic_correlation_matrix(
        N, 
        n_sectors=n_sectors,
        intra_sector_corr_range=(0.6, 0.85),
        inter_sector_corr_range=(-0.2, 0.35),  # Allow negative correlations
        noise_level=0.05,
        negative_corr_pairs=[(0, 2)]  # Sectors 0 and 2 are negatively correlated
    )
    
    # GBM parameters
    S0 = np.random.uniform(50, 150, N)  # Initial prices
    mu = np.random.uniform(0.05, 0.15, N)  # Annual drift (5-15%)
    sigma = np.random.uniform(0.15, 0.35, N)  # Annual volatility (15-35%)
    T = 1.0  # 1 year
    dt = 1/252  # Daily steps (252 trading days)
    n_paths = 1000  # Number of simulation paths
    
    # Add sector-based adjustments to parameters
    sector_names = ["Growth/Tech", "Neutral", "Defensive/Utilities"]
    for i in range(N):
        sector = sector_assignments[i]
        # Adjust parameters by sector
        if sector == 0:  # Growth/Tech sector - higher growth, higher volatility
            mu[i] *= 1.3
            sigma[i] *= 1.2
        elif sector == 2:  # Defensive/Utilities sector - lower growth, lower volatility
            mu[i] *= 0.7
            sigma[i] *= 0.6
    
    # Simulate paths
    print("Simulating correlated GBM paths...")
    paths = simulate_correlated_gbm(S0, mu, sigma, corr_matrix, T, dt, n_paths)
    
    # Plot correlation matrix
    print("Plotting correlation matrix...")
    fig_corr = plot_correlation_matrix(corr_matrix, sector_assignments)
    plt.show()
    
    # Plot price paths
    print("Plotting price paths...")
    fig_paths = plot_price_paths(paths, sector_assignments, n_sample_paths=30)
    plt.show()
    
    # Print summary statistics
    print("\n=== Summary Statistics ===")
    print(f"Number of assets: {N}")
    print(f"Number of sectors: {n_sectors}")
    print(f"Sector names: Growth/Tech (0), Neutral (1), Defensive/Utilities (2)")
    print(f"Assets per sector: {[sector_assignments.count(i) for i in range(n_sectors)]}")
    print(f"\nCorrelation matrix properties:")
    
    # Get upper triangle correlations (excluding diagonal)
    upper_triangle_corr = corr_matrix[np.triu_indices(N, k=1)]
    print(f"  Min correlation: {upper_triangle_corr.min():.3f}")
    print(f"  Max correlation: {upper_triangle_corr.max():.3f}")
    print(f"  Number of negative correlations: {(upper_triangle_corr < 0).sum()}")
    
    print(f"  Mean intra-sector correlation: ", end="")
    
    # Calculate mean intra-sector correlation
    intra_corr = []
    for i in range(N):
        for j in range(i+1, N):
            if sector_assignments[i] == sector_assignments[j]:
                intra_corr.append(corr_matrix[i, j])
    print(f"{np.mean(intra_corr):.3f}")
    
    print(f"  Mean inter-sector correlation: ", end="")
    # Calculate mean inter-sector correlation
    inter_corr = []
    for i in range(N):
        for j in range(i+1, N):
            if sector_assignments[i] != sector_assignments[j]:
                inter_corr.append(corr_matrix[i, j])
    print(f"{np.mean(inter_corr):.3f}")
    
    # Check positive semi-definite
    eigenvalues = np.linalg.eigvals(corr_matrix)
    print(f"\nSmallest eigenvalue: {eigenvalues.min():.6f} (should be > 0)")
    
    # Final prices statistics
    final_prices = paths[-1, :, :]
    print(f"\nFinal price statistics:")
    print(f"  Mean returns: {(final_prices.mean(axis=1) / S0 - 1).mean():.2%}")
    print(f"  Volatility of returns: {(final_prices / S0[:, np.newaxis] - 1).std(axis=1).mean():.2%}")