# Serial Correlation


One of the assumptions underpinning multiple regression is that regression errors are homoscedastic. In other words, the **variance** of the error terms is **equal for all observations**:


$$E\left(\epsilon_{i}^{2}\right)=\sigma_{\epsilon}^{2},\quad i=1,2,\ldots,n$$

In reality, the variance of errors differs across observations. This is known as **heteroskedasticity.**

The following figure illustrates homoscedasticity and heteroskedasticity.

## Types of Heteroskedasticity

### Unconditional Heteroskedasticity

Unconditional heteroskedasticity occurs when the heteroskedasticity is uncorrelated with the values of the independent variables. Although this is a violation of the homoscedasticity assumption, it does not present major problems to statistical inference.

### Conditional Heteroskedasticity

Conditional heteroskedasticity occurs when the error variance is related/conditional on the values of the independent variables. It poses significant problems for statistical inference. Fortunately, many statistical software packages can diagnose and correct this error.
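As a minimal illustration of conditional heteroskedasticity, the following sketch simulates hypothetical data in which the error standard deviation grows with the regressor; all numbers are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
x = rng.uniform(1.0, 10.0, n)

# Conditional heteroskedasticity: the error standard deviation
# grows with x, so Var(eps | x) depends on the regressor.
eps = rng.normal(0.0, 0.5 * x)
y = 2.0 + 3.0 * x + eps

# Compare the residual spread for small vs. large x values.
low_var = eps[x < 3].var()
high_var = eps[x > 8].var()
print(low_var < high_var)  # prints True: error variance rises with x
```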

## Effects of Heteroskedasticity

i. It does not affect the consistency of the regression parameter estimators.

ii. Heteroskedastic errors make the F-test for the overall significance of the regression unreliable.

iii. Heteroskedasticity introduces bias into estimators of the standard error of regression coefficients, making the t-tests for the significance of individual regression coefficients unreliable.

iv. More specifically, it typically results in underestimated standard errors and, therefore, inflated t-statistics.

## Testing Heteroskedasticity

### Breusch-Pagan chi-square test

The Breusch-Pagan chi-square test regresses the squared residuals from the estimated regression equation on the independent variables. If conditional heteroskedasticity is present in the original regression, the independent variables will explain a substantial portion of the variation in the squared residuals.

The test statistic is given by:

$$\text{BP chi-square test statistic}=n\times R^{2}$$

Where:

- (n) = number of observations.
- (R^{2}) = the (R^{2}) in the regression of the squared residuals.

This test statistic is a chi-square random variable with *k* degrees of freedom, where *k* is the number of independent variables.

The null hypothesis is that there is no conditional heteroskedasticity, i.e., the squared error term is uncorrelated with the independent variables. The Breusch-Pagan test is a one-tailed test because conditional heteroskedasticity is a concern only when the test statistic is large.
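As a sketch of the mechanics, the BP statistic can be computed on simulated (hypothetical) data with NumPy alone; 5.991 is the chi-square upper 5% critical value for 2 degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data with conditional heteroskedasticity:
# the error standard deviation grows with x1.
x1 = rng.uniform(1, 5, n)
x2 = rng.normal(size=n)
eps = rng.normal(0, x1)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + eps

# Step 1: estimate the original regression by OLS.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Step 2: regress the squared residuals on the independent
# variables and compute the auxiliary regression's R-squared.
u2 = resid ** 2
gamma, *_ = np.linalg.lstsq(X, u2, rcond=None)
fitted = X @ gamma
r2 = 1 - ((u2 - fitted) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()

# Step 3: BP statistic = n * R^2, chi-square with k = 2 df.
bp = n * r2
critical_5pct = 5.991  # chi-square(2), upper 5% tail
print(bp > critical_5pct)  # prints True: reject H0 of no conditional heteroskedasticity
```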

#### Example: Breusch-Pagan chi-square test

Consider the multiple regression of the price of the USDX on the inflation rates and the real interest rates, estimated using 10 observations. The investor regresses the squared residuals from the original regression on the independent variables. The new (R^{2}) is 0.1874. Test for the presence of heteroskedasticity at the 5% significance level.

**Solution**

The test statistic is:

$$\text{BP chi-square test statistic}=n\times R^{2}$$

$$\text{Test statistic}=10\times0.1874=1.874$$

The one-tailed critical value for a chi-square distribution with two degrees of freedom at the 5% significance level is 5.991.

Therefore, we cannot reject the null hypothesis of no conditional heteroskedasticity. As a result, we conclude that the error term is NOT conditionally heteroskedastic.

## Correcting Heteroskedasticity

In the investment world, it is crucial to correct heteroskedasticity as it may change inferences about a particular hypothesis test, thus impacting an investment decision. There are two methods that can be applied to correct heteroskedasticity:

- **Calculating robust standard errors:** This approach corrects the standard errors of the model's estimated coefficients to account for the conditional heteroskedasticity. These are also known as White-corrected standard errors. These standard errors are then used to recalculate the t-statistics using the original regression coefficients.
- **Generalized least squares:** The original regression equation is modified to eliminate heteroskedasticity. The modified equation is then estimated, assuming that heteroskedasticity is no longer a problem.
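A minimal sketch of the robust-standard-error calculation, assuming simulated data and the HC0 (White) form of the estimator:

```python
import numpy as np

# Hypothetical simulated data with heteroskedastic errors
# (the error standard deviation grows with x).
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(1, 5, n)
y = 1.0 + 2.0 * x + rng.normal(0, x)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta

# Conventional OLS standard errors (assume a constant error variance).
s2 = (e @ e) / (n - X.shape[1])
se_ols = np.sqrt(np.diag(s2 * XtX_inv))

# White (HC0) robust standard errors: the middle "meat" matrix
# uses each observation's own squared residual.
meat = X.T @ (X * (e ** 2)[:, None])
se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

# The t-statistics reuse the original coefficients but divide by
# the corrected standard errors.
t_white = beta / se_white
```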

## Serial Correlation (Autocorrelation)

Autocorrelation occurs when the assumption that regression errors are uncorrelated across all observations is violated. In other words, autocorrelation is evident when errors in one period are correlated with errors in other periods. This is common with **time-series data** (which we will see in the next reading).

### Types of Serial Correlation

#### Positive serial correlation

This is serial correlation in which a positive regression error for one observation increases the likelihood of observing a positive regression error for another observation.

#### Negative serial correlation

This is serial correlation in which a positive regression error for one observation increases the likelihood of observing a negative regression error for another observation.

## Effects of Serial Correlation

Autocorrelation does not cause bias in the coefficient estimates of the regression. However, positive serial correlation inflates the F-statistic for testing the overall significance of the regression, because the mean squared error (MSE) tends to underestimate the population error variance. This increases Type I errors (rejecting the null hypothesis when it is actually true).

Positive serial correlation also causes the ordinary least squares standard errors of the regression coefficients to understate the true standard errors. These small standard errors, in turn, make the estimated t-statistics appear more statistically significant than they really are.

On the other hand, negative serial correlation overestimates standard errors and understates the F-statistic. This increases Type II errors (the acceptance of the null hypothesis when it is actually false).

## Testing for Serial Correlation

The first step in testing for serial correlation is to plot the residuals against time. The most common formal test is the Durbin-Watson test.

## Durbin-Watson Test

The Durbin-Watson test has the null hypothesis of **no serial correlation** against the alternative hypothesis of **positive or negative serial correlation**.

The Durbin-Watson Statistic (DW) is approximated by:

$$DW\approx 2(1-r)$$

Where:

- (r) = Sample correlation between regression residuals from one period and the previous period.

The Durbin-Watson statistic can take on values from 0 to 4, i.e., (0≤DW≤4).

i. If there is no autocorrelation, the regression errors will be uncorrelated, and thus DW = 2.

$$DW=2(1-r)=2(1-0)=2$$

ii. For positive serial autocorrelation, (DW<2).

For example, if the serial correlation of the regression residuals is (r=1), then (DW=2(1-1)=0).

iii. For negative autocorrelation, (DW>2).

For example, if the serial correlation of the regression residuals is (r=-1), then (DW=2(1-(-1))=4).
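These cases can be checked numerically. A short simulation (hypothetical AR(1) residuals with coefficient 0.7) compares the exact Durbin-Watson statistic with its 2(1-r) approximation:

```python
import numpy as np

# Hypothetical residuals with positive serial correlation,
# generated as an AR(1) process.
rng = np.random.default_rng(7)
rho_true, n = 0.7, 2000
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho_true * e[t - 1] + rng.normal()

# Exact Durbin-Watson statistic ...
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# ... and the 2(1 - r) approximation, where r is the sample
# correlation between e_t and e_{t-1}.
r = np.corrcoef(e[1:], e[:-1])[0, 1]
approx = 2 * (1 - r)

print(dw < 2)  # prints True: positive autocorrelation pushes DW below 2
```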

The null hypothesis of no positive autocorrelation is rejected if the Durbin-Watson statistic falls below a critical value, (d^{*}). The exact value of (d^{*}) is not known, but it lies between a lower value (d_{l}) and an upper value (d_{u}) given in the Durbin-Watson tables.

This is illustrated below.

**Key Guidelines**

- If (d<d_{l}), reject (H_{0}: ρ=0) (and so accept (H_{1}: ρ>0)).
- If (d>d_{u}), do not reject (H_{0}: ρ=0).
- If (d_{l}<d<d_{u}), the test is inconclusive.
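These guidelines translate directly into code. A small helper function, taking the table's critical values as inputs (the values shown are the table lookups used in the worked examples):

```python
def dw_decision(d, dl, du):
    """Apply the Durbin-Watson decision rule for positive
    autocorrelation, given the table's lower (dl) and upper (du)
    critical values."""
    if d < dl:
        return "reject H0: positive serial correlation"
    if d > du:
        return "do not reject H0"
    return "inconclusive"

print(dw_decision(0.654, 0.95, 1.54))  # prints "reject H0: positive serial correlation"
```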

### Example: The Durbin-Watson Test for Serial Correlation

Consider a regression output that includes two independent variables that generate a DW statistic of 0.654. Assume that the sample size is 15. Test for serial correlation of the error terms at the 5% significance level.

#### Solution

From the Durbin-Watson table with (n=15) and (k=2), we see that (d_{l}=0.95) and (d_{u}=1.54). Since (d=0.654<0.95=d_{l}), we reject the null hypothesis and conclude that there is significant positive autocorrelation.

## Correcting Autocorrelation

We can correct serial correlation by:

i. Adjusting the coefficient standard errors of the regression estimates to account for serial correlation. This is done using the Hansen method, which can also correct for conditional heteroskedasticity. The resulting **Hansen-White standard errors** are then used for hypothesis testing of the regression coefficients.

ii. Modifying the regression equation to eliminate the serial correlation.

Consider a regression model with 80 observations and two independent variables. Suppose that the correlation between the error term and its first lagged value is 0.15. The *most appropriate* decision is:

A. Reject the null hypothesis of no positive serial correlation.

B. Fail to reject the null hypothesis of no positive serial correlation.

C. Declare that the test results are inconclusive.

### Solution

**The correct answer is B.**

The test statistic is:

$$DW\approx 2(1-r)=2(1-0.15)=1.70$$

The critical values from the Durbin-Watson table with (n=80) and (k=2) are (d_{l}=1.59) and (d_{u}=1.69).

Because (1.70>1.69=d_{u}), we fail to reject the null hypothesis of no positive serial correlation.

*Reading 5:*

*LOS 5k: Explain the types of heteroskedasticity and how heteroskedasticity and serial correlation affect statistical inference;*


In this article, we will follow Drukker's (2003) procedure to derive the first-order serial correlation test proposed by Jeff Wooldridge (2002) for panel data. This test is considered robust, since it works with fewer assumptions on the behavior of the heterogeneous individual effects.

We start with the linear model:

$$y_{it}=X_{it}\beta_{1}+Z_{i}\beta_{2}+\mu_{i}+\epsilon_{it}\qquad (1)$$

where (y) represents the dependent variable, (X) is a (1×K) vector of exogenous variables, and (Z) is a vector of time-invariant covariates, with *µ* as the individual effect for each individual. Special importance attaches to the correlation between (X) and *µ*: if this correlation is zero, the random-effects model is the better choice; if (X) and *µ* are correlated, it is better to stick with fixed effects.

The fixed-effects and random-effects estimators rely on the absence of serial correlation. Wooldridge therefore uses the residuals from the regression of (1) estimated in first differences, which takes the form:

$$\Delta y_{it}=\Delta X_{it}\beta_{1}+\Delta\epsilon_{it}$$

Notice that this differencing procedure eliminates the individual effects contained in *µ*: since the individual effects are assumed constant over time (time-invariant), they show no variation between periods and therefore drop out when we take first differences.

Once we have the regression in first differences (and assuming the individual-level effects have been eliminated), we take the predicted residuals of the first-difference regression. We then check the correlation between these residuals and their first lag: if there is no serial correlation in the original errors, this correlation should equal -0.5.

Therefore, if the correlation equals -0.5, the original model in (1) has no serial correlation. However, if it differs significantly from -0.5, we have a first-order serial correlation problem in the original model in (1).
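The -0.5 value follows in one step. Assuming the level errors are serially uncorrelated with common variance (\sigma_{\epsilon}^{2}), the differenced errors (\Delta\epsilon_{it}=\epsilon_{it}-\epsilon_{i,t-1}) satisfy:

$$Var(\Delta\epsilon_{it})=2\sigma_{\epsilon}^{2},\qquad Cov(\Delta\epsilon_{it},\Delta\epsilon_{i,t-1})=-\sigma_{\epsilon}^{2}$$

so that

$$Corr(\Delta\epsilon_{it},\Delta\epsilon_{i,t-1})=\frac{-\sigma_{\epsilon}^{2}}{2\sigma_{\epsilon}^{2}}=-0.5$$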


For all of the regressions, we account for within-panel correlation, so every procedure uses cluster-robust regression; we also omit the constant term in the difference equation. In sum, we:

1. Specify our model (whether with fixed or random effects, these should be time-invariant).
2. Create the difference model (taking first differences of all the variables, so that the difference model has no individual effects). We run this regression clustering by individual and omitting the constant term.
3. Predict the residuals of the difference model.
4. Regress the predicted residuals on their first lag, again clustering and omitting the constant.
5. Test the hypothesis that the coefficient on the lagged residual equals -0.5.
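These steps can also be sketched outside Stata. The following is a minimal NumPy illustration on simulated panel data; all names and numbers are hypothetical, and the clustered standard error is the through-the-origin analogue of the cluster option described above:

```python
import numpy as np

# Hypothetical panel: N individuals, T periods, one regressor,
# time-invariant individual effects, iid level errors.
rng = np.random.default_rng(3)
N, T = 200, 6
alpha = rng.normal(size=N)                 # individual effects
x = rng.normal(size=(N, T))
eps = rng.normal(size=(N, T))              # no serial correlation
y = 1.0 + 2.0 * x + alpha[:, None] + eps

# Steps 1-2: first-difference everything; the time-invariant
# individual effects drop out, and we regress with no constant.
dy, dx = np.diff(y, axis=1), np.diff(x, axis=1)
b = (dx * dy).sum() / (dx * dx).sum()      # OLS slope, no constant
de = dy - b * dx                           # Step 3: residuals

# Step 4: regress the residual on its own first lag (no constant),
# with standard errors clustered by individual.
u, ul = de[:, 1:], de[:, :-1]
rho = (ul * u).sum() / (ul * ul).sum()
v = u - rho * ul                           # lag-regression residuals
score = (ul * v).sum(axis=1)               # per-cluster score
var_rho = (score ** 2).sum() / (ul * ul).sum() ** 2
se_rho = np.sqrt(var_rho)

# Step 5: t-test of H0: rho = -0.5 (no serial correlation in the
# level equation); a large |t| rejects the null.
t_stat = (rho - (-0.5)) / se_rho
```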

Let's do a quick example of these steps using the same example as Drukker.

We start by loading the database, declaring it to Stata as panel data, and generating some quadratic variables.


We then run the regression of our model. It doesn't matter whether it uses fixed or random effects, as long as we assume the individual effects are time-invariant (so that they are eliminated in the first-difference model).

Now let's estimate the test manually. To do this, we run a pooled regression of the difference model without the constant, clustering the regression on the panel variable.


The noconst option eliminates the constant term in the difference model, the cluster option applies a clustered structure to the regression, and idcode is the panel variable that identifies the individuals in the panel.

The next step is to predict the residuals of the pooled difference regression.

Then we regress the predicted residual (u) against the first lag of (u), again clustering the regression and omitting the constant as before.

Finally, we test the hypothesis of whether the coefficient on the first lag in the pooled difference equation equals -0.5.

According to the results, we strongly reject the null hypothesis of no serial correlation at the 5% level of significance. Therefore, the model has a serial correlation problem.

We can also perform the test with Drukker's compiled Stata package (the xtserial command), which can be somewhat faster and yields the same results. The advantage of the manual procedure, however, is that it can be applied to any kind of model or regression.

**Bibliography**

Drukker, D. (2003) Testing for serial correlation in linear panel-data models, The Stata Journal, 3(2), pp. 168–177. Taken from: https://journals.sagepub.com/doi/pdf/10.1177/1536867X0300300206

Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.