Autoregressive Moving Average ARMA(p, q) Models for Time Series Analysis - Part 2

In Part 1 we considered the Autoregressive model of order p, also known as the AR(p) model. We introduced it as an extension of the random walk model in an attempt to explain additional serial correlation in financial time series.

Ultimately we realised that it was not sufficiently flexible to truly capture all of the autocorrelation in the closing prices of Amazon Inc. (AMZN) and the S&P500 US Equity Index. The primary reason for this is that both of these assets are *conditionally heteroskedastic*, which means that they are non-stationary and have periods of "varying variance" or volatility clustering, which is not taken into account by the AR(p) model.

In future articles we will eventually build up to the Autoregressive Integrated Moving Average (ARIMA) models, as well as the conditionally heteroskedastic models of the ARCH and GARCH families. These models will provide us with our first realistic attempts at forecasting asset prices.

In this article, however, we are going to introduce the **Moving Average of order q** model, known as **MA(q)**. This is a component of the more general ARMA model and as such we need to understand it before moving further.

*I highly recommend you read the previous articles in the Time Series Analysis collection if you have not done so. They can all be found here.*

A Moving Average model is *similar* to an Autoregressive model, except that instead of being a linear combination of past time series values, it is a linear combination of the past white noise terms.

Intuitively, this means that the MA model sees such random white noise "shocks" directly at each current value of the model. This is in contrast to an AR(p) model, where the white noise "shocks" are only seen *indirectly*, via regression onto previous terms of the series.

A key difference is that the MA model will only ever see the last $q$ shocks for any particular MA(q) model, whereas the AR(p) model will take all prior shocks into account, albeit in a decreasingly weak manner.
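To make this contrast concrete, recall that a stationary AR(1) model $x_t = \alpha x_{t-1} + w_t$, with $|\alpha| < 1$, can be expanded by repeatedly substituting for the lagged term:

\begin{eqnarray} x_t = w_t + \alpha w_{t-1} + \alpha^2 w_{t-2} + \ldots = \sum^{\infty}_{i=0} \alpha^i w_{t-i} \end{eqnarray}

That is, an AR(1) process is equivalent to an infinite-order moving average with geometrically decaying weights, which is precisely the "decreasingly weak" dependence on all prior shocks mentioned above.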

Mathematically, the MA(q) is a linear regression model and is similarly structured to AR(p):

A time series model, $\{ x_t \}$, is a *moving average model of order $q$*, MA(q), if:

\begin{eqnarray} x_t = w_t + \beta_1 w_{t-1} + \ldots + \beta_q w_{t-q} \end{eqnarray}

Where $\{ w_t \}$ is *white noise* with $E(w_t) = 0$ and variance $\sigma^2_w$.

If we consider the *Backward Shift Operator*, ${\bf B}$ (see a previous article), then we can rewrite the above as a function $\phi$ of ${\bf B}$:

\begin{eqnarray} x_t = (1 + \beta_1 {\bf B} + \beta_2 {\bf B}^2 + \ldots + \beta_q {\bf B}^q) w_t = \phi_q ({\bf B}) w_t \end{eqnarray}

We will make use of the $\phi$ function in later articles.

As with AR(p) the mean of a MA(q) process is zero. This is easy to see as the mean is simply a sum of means of white noise terms, which are all themselves zero.

\begin{eqnarray} \text{Mean:} \enspace \mu_x = E(x_t) = \sum^{q}_{i=0} E(w_i) = 0 \end{eqnarray}

The variance is the sum of the variances of the white noise terms, each scaled by the square of its coefficient:

\begin{eqnarray} \text{Var:} \enspace \sigma^2_x = \sigma^2_w (1 + \beta^2_1 + \ldots + \beta^2_q) \end{eqnarray}

The autocorrelation function is given by:

$$\text{ACF:} \enspace \rho_k = \left\{\begin{aligned} &1 && \text{if} \enspace k = 0 \\ &\sum^{q-k}_{i=0} \beta_i \beta_{i+k} \bigg/ \sum^q_{i=0} \beta^2_i && \text{if} \enspace k = 1, \ldots, q \\ &0 && \text{if} \enspace k > q \end{aligned} \right.$$

Where $\beta_0 = 1$.

We're now going to generate some simulated data and use it to create correlograms. This will make the above formula for $\rho_k$ somewhat more concrete.

Let's start with a MA(1) process. If we set $\beta_1 = 0.6$ we obtain the following model:

\begin{eqnarray} x_t = w_t + 0.6 w_{t-1} \end{eqnarray}

As with the AR(p) models in the previous article, we can use R to simulate such a series and then plot the correlogram. Since we've had plenty of practice carrying out plots in this Time Series Analysis series, I will write the R code in full, rather than splitting it up:

```r
> set.seed(1)
> x <- w <- rnorm(100)
> for (t in 2:100) x[t] <- w[t] + 0.6*w[t-1]
> layout(1:2)
> plot(x, type="l")
> acf(x)
```

The output is as follows:

**Realisation of MA(1) Model, with $\beta_1 = 0.6$ and Associated Correlogram**

As we saw above in the formula for $\rho_k$, for $k > q$, all autocorrelations should be zero. Since $q = 1$, we should see a significant peak at $k=1$ and then insignificant peaks subsequent to that. However, due to sampling variation, we should expect roughly 5% of the peaks on a *sample* autocorrelation plot to appear (marginally) significant even where the true autocorrelation is zero.
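We can also compute what the height of that peak at $k=1$ should be. Substituting $q=1$ and $\beta_1 = 0.6$ into the ACF formula above gives:

\begin{eqnarray} \rho_1 = \frac{\beta_0 \beta_1}{\beta^2_0 + \beta^2_1} = \frac{0.6}{1 + 0.36} \approx 0.441 \end{eqnarray}

So the sample autocorrelation at lag 1 should sit somewhere in the vicinity of 0.44.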

This is precisely what the correlogram shows us in this case. We have a significant peak at $k=1$ and then insignificant peaks for $k > 1$, except at $k=4$ where we have a marginally significant peak.

In fact, this is a useful way of seeing whether an MA(q) model is appropriate. By taking a look at the correlogram of a particular series we can see how many sequential non-zero lags exist. If $q$ such lags exist then we can legitimately attempt to fit a MA(q) model to a particular series.
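As a rough sketch of this identification procedure, we can extract the sample autocorrelations from `acf` directly and count which lags fall outside the approximate 95% significance bounds of $\pm 1.96/\sqrt{n}$ (the variable names here are my own, not part of any standard workflow):

```r
# Simulate the same MA(1) series as above
set.seed(1)
n <- 100
x <- w <- rnorm(n)
for (t in 2:n) x[t] <- w[t] + 0.6*w[t-1]

# Sample ACF without plotting; a$acf[1] is lag 0, so drop it
a <- acf(x, plot=FALSE)
sig.lags <- which(abs(a$acf[-1]) > 1.96/sqrt(n))
sig.lags
```

Any run of significant lags from $k=1$ up to some cut-off $k=q$, with nothing systematic beyond it, is the MA(q) signature we are looking for.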

Since we have evidence from our simulated data of a MA(1) process, we're now going to try and fit a MA(1) model to our simulated data. Unfortunately, there isn't an `ma` command in R equivalent to the autoregressive model's `ar` command. Instead, we must use the more general `arima` command and set the autoregressive and integrated components to zero. We do this by creating a 3-vector and setting the first two components (the autoregressive and integrated orders, respectively) to zero:

```r
> x.ma <- arima(x, order=c(0, 0, 1))
> x.ma

Call:
arima(x = x, order = c(0, 0, 1))

Coefficients:
         ma1  intercept
      0.6023     0.1681
s.e.  0.0827     0.1424

sigma^2 estimated as 0.7958:  log likelihood = -130.7,  aic = 267.39
```

We receive some useful output from the `arima` command. Firstly, we can see that the parameter has been estimated as $\hat{\beta_1} = 0.602$, which is very close to the true value of $\beta_1 = 0.6$. Secondly, the standard errors are already calculated for us, making it straightforward to calculate confidence intervals. Thirdly, we receive an estimated variance, log-likelihood and Akaike Information Criterion (necessary for model comparison).

The major difference between `arima` and `ar` is that `arima` estimates an intercept term because it does not subtract the mean value of the series. Hence we need to be careful when carrying out predictions using the `arima` command. We'll return to this point later.

As a quick check we're going to calculate confidence intervals for $\hat{\beta_1}$:

```r
> 0.6023 + c(-1.96, 1.96)*0.0827
[1] 0.440208 0.764392
```

We can see that the 95% confidence interval contains the true parameter value of $\beta_1 = 0.6$ and so we can judge the model a good fit. Obviously this should be expected since we simulated the data in the first place!
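Rather than typing the coefficient and standard error in by hand, we can pull them straight out of the fitted object. The following is a minimal sketch (the `se` and `ci` variable names are mine); `arima` stores the estimated coefficient covariance matrix in the `var.coef` component:

```r
# Refit the MA(1) model to the simulated series from above
set.seed(1)
x <- w <- rnorm(100)
for (t in 2:100) x[t] <- w[t] + 0.6*w[t-1]
x.ma <- arima(x, order=c(0, 0, 1))

# Standard errors are the square roots of the diagonal of var.coef
se <- sqrt(diag(x.ma$var.coef))
ci <- coef(x.ma)["ma1"] + c(-1.96, 1.96) * se["ma1"]
ci
```

This gives the same interval as the manual calculation, without transcription errors.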

How do things change if we modify the sign of $\beta_1$ to -0.6? Let's perform the same analysis:

```r
> set.seed(1)
> x <- w <- rnorm(100)
> for (t in 2:100) x[t] <- w[t] - 0.6*w[t-1]
> layout(1:2)
> plot(x, type="l")
> acf(x)
```

The output is as follows:

**Realisation of MA(1) Model, with $\beta_1 = -0.6$ and Associated Correlogram**

We can see that at $k=1$ we have a significant peak in the correlogram, except that it shows negative correlation, as we'd expect from a MA(1) model with negative first coefficient. Once again all peaks beyond $k=1$ are insignificant. Let's fit a MA(1) model and estimate the parameter:

```r
> x.ma <- arima(x, order=c(0, 0, 1))
> x.ma

Call:
arima(x = x, order = c(0, 0, 1))

Coefficients:
          ma1  intercept
      -0.7298     0.0486
s.e.   0.1008     0.0246

sigma^2 estimated as 0.7841:  log likelihood = -130.11,  aic = 266.23
```

$\hat{\beta_1} = -0.730$, which somewhat overshoots the true value of $\beta_1 = -0.6$ in magnitude. Finally, let's calculate the confidence interval:

```r
> -0.730 + c(-1.96, 1.96)*0.1008
[1] -0.927568 -0.532432
```

We can see that the true parameter value of $\beta_1=-0.6$ is contained within the 95% confidence interval, providing us with evidence of a good model fit.

Let's run through the same procedure for a MA(3) process. This time we should expect significant peaks at $k \in \{1,2,3 \}$, and insignificant peaks for $k > 3$.

We are going to use the following coefficients: $\beta_1 = 0.6$, $\beta_2 = 0.4$ and $\beta_3 = 0.3$. Let's simulate a MA(3) process from this model. I've increased the number of random samples to 1000 in this simulation, which makes it easier to see the true autocorrelation structure, at the expense of making the original series harder to interpret:

```r
> set.seed(3)
> x <- w <- rnorm(1000)
> for (t in 4:1000) x[t] <- w[t] + 0.6*w[t-1] + 0.4*w[t-2] + 0.3*w[t-3]
> layout(1:2)
> plot(x, type="l")
> acf(x)
```

The output is as follows:

**Realisation of MA(3) Model and Associated Correlogram**

As expected, the first three peaks are significant. However, so is the fourth. We can legitimately attribute this to sampling variation, since we expect roughly 5% of the peaks beyond $k=q$ to appear significant by chance.
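We can also check the heights of those first three peaks against the theoretical ACF. Plugging the coefficients into the formula by hand gives $\rho_1 \approx 0.596$, $\rho_2 \approx 0.360$ and $\rho_3 \approx 0.186$, and R's `ARMAacf` function (in the `stats` package) computes the same quantities directly:

```r
# Theoretical autocorrelations of the MA(3) model with
# beta_1 = 0.6, beta_2 = 0.4, beta_3 = 0.3
rho <- ARMAacf(ma=c(0.6, 0.4, 0.3), lag.max=5)
round(rho, 3)
# Lags 1-3 are non-zero; lags beyond 3 are exactly zero
```

These values should match the heights of the significant peaks in the correlogram, up to sampling variation.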

Let's now fit a MA(3) model to the data to try and estimate parameters:

```r
> x.ma <- arima(x, order=c(0, 0, 3))
> x.ma

Call:
arima(x = x, order = c(0, 0, 3))

Coefficients:
         ma1     ma2     ma3  intercept
      0.5439  0.3450  0.2975    -0.0948
s.e.  0.0309  0.0349  0.0311     0.0704

sigma^2 estimated as 1.039:  log likelihood = -1438.47,  aic = 2886.95
```

The estimates $\hat{\beta_1} = 0.544$, $\hat{\beta_2}=0.345$ and $\hat{\beta_3}=0.298$ are close to the true values of $\beta_1=0.6$, $\beta_2=0.4$ and $\beta_3=0.3$, respectively. We can also produce confidence intervals using the respective standard errors:

```r
> 0.544 + c(-1.96, 1.96)*0.0309
[1] 0.483436 0.604564
> 0.345 + c(-1.96, 1.96)*0.0349
[1] 0.276596 0.413404
> 0.298 + c(-1.96, 1.96)*0.0311
[1] 0.237044 0.358956
```

In each case the 95% confidence intervals do contain the true parameter value and we can conclude that we have a good fit with our MA(3) model, as should be expected.
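As an aside, since `arima` reports the AIC we can also let it adjudicate between candidate orders. The following sketch refits the simulated MA(3) series with $q$ running from 1 to 5 and picks the order with the smallest AIC; for a correctly specified series we would expect $q=3$ to win, although with any finite sample this is not guaranteed:

```r
# Simulate the MA(3) series as above
set.seed(3)
x <- w <- rnorm(1000)
for (t in 4:1000) x[t] <- w[t] + 0.6*w[t-1] + 0.4*w[t-2] + 0.3*w[t-3]

# Fit MA(q) for q = 1..5 and collect the AIC of each fit
aics <- sapply(1:5, function(q) AIC(arima(x, order=c(0, 0, q))))
which.min(aics)
```

We will lean on this kind of information-criterion comparison more heavily once we reach the full ARMA(p,q) model.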

In Part 1 we considered Amazon Inc. (AMZN) and the S&P500 US Equity Index. We fitted the AR(p) model to both and found that the model was unable to effectively capture the complexity of the serial correlation, especially in the case of the S&P500, where long-memory effects seem to be present.

I won't plot the charts again for the prices and autocorrelation, instead I'll refer you to the previous post.

Let's begin by trying to fit a selection of MA(q) models to AMZN, namely with $q \in \{ 1,2,3 \}$. As in Part 1, we'll use quantmod to download the daily prices for AMZN and then convert them into a log returns stream of closing prices:

```r
> require(quantmod)
> getSymbols("AMZN")
> amznrt = diff(log(Cl(AMZN)))
```

Now that we have the log returns stream we can use the `arima` command to fit MA(1), MA(2) and MA(3) models and then estimate the parameters of each. For MA(1) we have:

```r
> amznrt.ma <- arima(amznrt, order=c(0, 0, 1))
> amznrt.ma

Call:
arima(x = amznrt, order = c(0, 0, 1))

Coefficients:
         ma1  intercept
      -0.030     0.0012
s.e.   0.023     0.0006

sigma^2 estimated as 0.0007044:  log likelihood = 4796.01,  aic = -9586.02
```

We can plot the residuals of the daily log returns and the fitted model:

```r
> acf(amznrt.ma$res[-1])
```

**Residuals of MA(1) Model Fitted to AMZN Daily Log Prices**

Notice that we have a few significant peaks at lags $k=2$, $k=11$, $k=16$ and $k=18$, indicating that the MA(1) model is unlikely to be a good fit for the behaviour of the AMZN log returns, since this does not look like a realisation of white noise.
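Eyeballing the correlogram for stray peaks can be formalised with a portmanteau test such as Ljung-Box, available in R as `Box.test`, which jointly tests the first several residual autocorrelations; a small p-value indicates structure left in the residuals. As a reproducible sketch I apply it to a correctly specified MA(1) fit on simulated data, but the same call works on `amznrt.ma$residuals`:

```r
# Fit an MA(1) model to a simulated MA(1) series
set.seed(1)
x <- w <- rnorm(100)
for (t in 2:100) x[t] <- w[t] + 0.6*w[t-1]
x.ma <- arima(x, order=c(0, 0, 1))

# Ljung-Box test on the residuals; fitdf accounts for the one
# estimated MA parameter
bt <- Box.test(resid(x.ma), lag=20, type="Ljung-Box", fitdf=1)
bt
```

Here a large p-value is what we want to see, whereas for the AMZN MA(1) fit above we would expect the test to lean towards rejection.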

Let's try a MA(2) model:

```r
> amznrt.ma <- arima(amznrt, order=c(0, 0, 2))
> amznrt.ma

Call:
arima(x = amznrt, order = c(0, 0, 2))

Coefficients:
          ma1      ma2  intercept
      -0.0254  -0.0689     0.0012
s.e.   0.0215   0.0217     0.0005

sigma^2 estimated as 0.0007011:  log likelihood = 4801.02,  aic = -9594.05
```

Both of the estimates for the $\beta$ coefficients are negative. Let's plot the residuals once again:

```r
> acf(amznrt.ma$res[-1])
```

**Residuals of MA(2) Model Fitted to AMZN Daily Log Prices**

We can see that there is almost zero autocorrelation in the first few lags. However, we have five marginally significant peaks at lags $k=12$, $k=16$, $k=19$, $k=25$ and $k=27$. This is suggestive that the MA(2) model is capturing a lot of the autocorrelation, but not all of the long-memory effects. How about a MA(3) model?

```r
> amznrt.ma <- arima(amznrt, order=c(0, 0, 3))
> amznrt.ma

Call:
arima(x = amznrt, order = c(0, 0, 3))

Coefficients:
          ma1      ma2     ma3  intercept
      -0.0262  -0.0690  0.0177     0.0012
s.e.   0.0214   0.0217  0.0212     0.0005

sigma^2 estimated as 0.0007009:  log likelihood = 4801.37,  aic = -9592.75
```

Once again, we can plot the residuals:

```r
> acf(amznrt.ma$res[-1])
```

**Residuals of MA(3) Model Fitted to AMZN Daily Log Prices**

The MA(3) residuals plot looks almost identical to that of the MA(2) model. This is not surprising, as we're adding a new parameter to a model that has seemingly explained away much of the correlations at shorter lags, but that won't have much of an effect on the longer term lags.

All of this evidence is suggestive of the fact that an MA(q) model is unlikely to be useful in explaining all of the serial correlation *in isolation*, at least for AMZN.

If you recall, in Part 1 we saw that the first order differenced daily log returns structure of the S&P500 possessed many significant peaks at various lags, both short and long. This provided evidence of both *conditional heteroskedasticity* (i.e. volatility clustering) and *long-memory effects*. It led us to conclude that the AR(p) model was insufficient to capture all of the autocorrelation present.

As we've seen above, the MA(q) model was insufficient to capture all of the serial correlation in the first order differenced daily log price series for AMZN, leaving significant structure in the residuals of the fitted models. We will now attempt to fit the MA(q) model to the S&P500.

One might ask why we are doing this if we know that it is unlikely to be a good fit. This is a good question. The answer is that we need to see exactly *how* it isn't a good fit, because this is the ultimate process we will be following when we come across much more sophisticated models that are potentially harder to interpret.

Let's begin by obtaining the data and converting it to a first order differenced series of logarithmically transformed daily closing prices as in the previous article:

```r
> getSymbols("^GSPC")
> gspcrt = diff(log(Cl(GSPC)))
```

We are now going to fit a MA(1), MA(2) and MA(3) model to the series, as we did above for AMZN. Let's start with MA(1):

```r
> gspcrt.ma <- arima(gspcrt, order=c(0, 0, 1))
> gspcrt.ma

Call:
arima(x = gspcrt, order = c(0, 0, 1))

Coefficients:
          ma1  intercept
      -0.1284      2e-04
s.e.   0.0223      3e-04

sigma^2 estimated as 0.0001844:  log likelihood = 6250.23,  aic = -12494.46
```

Let's make a plot of the residuals of this fitted model:

```r
> acf(gspcrt.ma$res[-1])
```

**Residuals of MA(1) Model Fitted to S&P500 Daily Log Prices**

The first significant peak occurs at $k=2$, but there are many more at $k \in \{ 5, 10, 14, 15, 16, 18, 20, 21 \}$. This is clearly not a realisation of white noise and so we must reject the MA(1) model as a potential good fit for the S&P500.

Does the situation improve with MA(2)?

```r
> gspcrt.ma <- arima(gspcrt, order=c(0, 0, 2))
> gspcrt.ma

Call:
arima(x = gspcrt, order = c(0, 0, 2))

Coefficients:
          ma1      ma2  intercept
      -0.1189  -0.0524      2e-04
s.e.   0.0216   0.0223      2e-04

sigma^2 estimated as 0.0001839:  log likelihood = 6252.96,  aic = -12497.92
```

Once again, let's make a plot of the residuals of this fitted MA(2) model:

```r
> acf(gspcrt.ma$res[-1])
```

**Residuals of MA(2) Model Fitted to S&P500 Daily Log Prices**

While the peak at $k=2$ has disappeared (as we'd expect), we are still left with the significant peaks at many longer lags in the residuals. Once again, we find the MA(2) model is not a good fit.

We should expect, for the MA(3) model, to see less serial correlation at $k=3$ than for the MA(2), but once again we should also expect no reduction in further lags.

```r
> gspcrt.ma <- arima(gspcrt, order=c(0, 0, 3))
> gspcrt.ma

Call:
arima(x = gspcrt, order = c(0, 0, 3))

Coefficients:
          ma1      ma2     ma3  intercept
      -0.1189  -0.0529  0.0289      2e-04
s.e.   0.0214   0.0222  0.0211      3e-04

sigma^2 estimated as 0.0001838:  log likelihood = 6253.9,  aic = -12497.81
```

Finally, let's make a plot of the residuals of this fitted MA(3) model:

```r
> acf(gspcrt.ma$res[-1])
```

**Residuals of MA(3) Model Fitted to S&P500 Daily Log Prices**

This is precisely what we see in the correlogram of the residuals. Hence the MA(3), as with the other models above, is not a good fit for the S&P500.

We've now examined two major time series models in detail, namely the Autoregressive model of order $p$, AR(p), and the Moving Average model of order $q$, MA(q). We've seen that they're both capable of explaining away some of the autocorrelation in the residuals of first order differenced daily log prices of equities and indices, but volatility clustering and long-memory effects persist.

It is finally time to turn our attention to the combination of these two models, namely the Autoregressive Moving Average model of order $p,q$, ARMA(p,q), to see if it will improve the situation any further.

However, we will have to wait until the next article for a full discussion!
