Frances,
What difference does it really make whether the error variance in an OLS model is counted as a parameter when computing AIC? Suppose that you have models 1, 2, and 3, each with a different (non-nested) set of fixed-effect parameters, all fitted to the same n observations. These models have error sums of squares SSE{1}, SSE{2}, and SSE{3}, and numbers of regression parameters p{1}, p{2}, and p{3} (where the regression parameters include all beta_hat estimates).
Now, if we use the AIC values presented by PROC REG, we have
AIC{1a} = n * ln( SSE{1} /n ) + 2p{1}
AIC{2a} = n * ln( SSE{2} /n ) + 2p{2}
AIC{3a} = n * ln( SSE{3} /n ) + 2p{3}
Differences between AIC values for these models are:
AIC{1a} - AIC{2a} = n * ( ln( SSE{1}/n ) - ln( SSE{2}/n ) ) + 2(p{1} - p{2})
AIC{1a} - AIC{3a} = n * ( ln( SSE{1}/n ) - ln( SSE{3}/n ) ) + 2(p{1} - p{3})
AIC{2a} - AIC{3a} = n * ( ln( SSE{2}/n ) - ln( SSE{3}/n ) ) + 2(p{2} - p{3})
Alternatively, according to Anderson and Burnham, you would count the variance estimate as one more parameter and compute
AIC{1b} = n * ln( SSE{1} /n ) + 2(p{1} + 1)
AIC{2b} = n * ln( SSE{2} /n ) + 2(p{2} + 1)
AIC{3b} = n * ln( SSE{3} /n ) + 2(p{3} + 1)
Differences between AIC values for these models are:
AIC{1b} - AIC{2b} = n * ( ln( SSE{1}/n ) - ln( SSE{2}/n ) ) + 2((p{1} + 1) - (p{2} + 1))
= n * ( ln( SSE{1}/n ) - ln( SSE{2}/n ) ) + 2(p{1} - p{2})
AIC{1b} - AIC{3b} = n * ( ln( SSE{1}/n ) - ln( SSE{3}/n ) ) + 2((p{1} + 1) - (p{3} + 1))
= n * ( ln( SSE{1}/n ) - ln( SSE{3}/n ) ) + 2(p{1} - p{3})
AIC{2b} - AIC{3b} = n * ( ln( SSE{2}/n ) - ln( SSE{3}/n ) ) + 2((p{2} + 1) - (p{3} + 1))
= n * ( ln( SSE{2}/n ) - ln( SSE{3}/n ) ) + 2(p{2} - p{3})
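If it helps, the cancellation can be checked numerically. Here is a minimal Python sketch; the sample size, SSE values, and parameter counts are made up purely for illustration, and the function name aic is my own, not anything from SAS.

```python
import math

def aic(sse, n, k):
    """AIC in the least-squares form n * ln(SSE/n) + 2k."""
    return n * math.log(sse / n) + 2 * k

# Hypothetical values, chosen only to illustrate the algebra.
n = 50
models = {1: (120.0, 3), 2: (110.0, 5), 3: (115.0, 4)}  # model: (SSE, p)

# Form (a): penalty counts the regression parameters only.
aic_a = {m: aic(sse, n, p) for m, (sse, p) in models.items()}
# Form (b): penalty also counts the error variance (p + 1).
aic_b = {m: aic(sse, n, p + 1) for m, (sse, p) in models.items()}

pairs = [(1, 2), (1, 3), (2, 3)]
diff_a = {(i, j): aic_a[i] - aic_a[j] for i, j in pairs}
diff_b = {(i, j): aic_b[i] - aic_b[j] for i, j in pairs}

for pair in pairs:
    print(pair, diff_a[pair], diff_b[pair])
```

Each AIC value shifts by exactly 2 between the two forms, so the two columns of pairwise differences agree up to floating-point rounding.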
So, when you compare AIC values against one another, you obtain the same difference whether or not you include the variance estimate as one of the parameters: the extra 2 is added to every model's AIC and cancels in each difference. Since the differences are unchanged, it should not matter which form is employed. Can you provide an instance where it would make a difference in model comparisons whether you do or do not include the variance estimate among the parameters?