Solved: Re: Maximum Likelihood Estimation or OLS (PROC REG vs PROC MIXED)

BrianAronson · Posted 06-01-2015 02:54 PM

Hi,

This is a fairly naive question: I am trying to create a regression model for somewhat skewed and clustered survey data. A professor suggested I use maximum likelihood estimation with GLS, rather than OLS, to account for some of the heteroskedasticity and autocorrelation in my data. As far as I am aware, PROC REG uses OLS, PROC GLM uses ML, and PROC MIXED uses REML. However, these methods (see code below) all seem to yield the same estimates. Why is this? Shouldn't changing the estimation method change my estimates and standard errors?

proc reg;
    model y = x1 x2 x3 x4 x5;
    weight weights;
run;
quit;

proc glm;
    model y = x1 x2 x3 x4 x5 / solution;
    weight weights;
run;
quit;

proc mixed method=REML;
    model y = x1 x2 x3 x4 x5 / solution;
    weight weights;
run;

PaigeMiller · Posted 06-01-2015 04:08 PM

PROC REG and PROC GLM use OLS. PROC MIXED uses maximum likelihood or REML. If you are really interested in a regression for non-normal error structures, you might want to look into PROC GLIMMIX instead of PROC MIXED.

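There are cases where OLS and ML should give the same result, namely if the errors in the regression model are i.i.d. normal: the ML estimates of the coefficients are then exactly the OLS estimates (or, with a WEIGHT statement, the weighted least squares estimates). Your PROC MIXED step has no RANDOM or REPEATED statement, so it fits exactly that independent-errors model, which is why all three PROCs give you the same estimates.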
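A short derivation of why the coefficient estimates coincide. It assumes the posted PROC MIXED call (MODEL plus WEIGHT, no RANDOM or REPEATED statement) fits $y_i = x_i^\top\beta + \varepsilon_i$ with independent errors $\varepsilon_i \sim N(0, \sigma^2/w_i)$, where $w_i$ is the value of the WEIGHT variable ($w_i = 1$ for an unweighted fit) and $p$ is the number of columns of $X$ (6 here: intercept plus x1 to x5):

```latex
\begin{aligned}
\ell(\beta,\sigma^2) &= -\frac{n}{2}\log(2\pi\sigma^2) + \frac{1}{2}\sum_{i=1}^{n}\log w_i
  - \frac{1}{2\sigma^2}\sum_{i=1}^{n} w_i\,(y_i - x_i^\top\beta)^2 \\
\hat\beta_{\mathrm{ML}} &= \arg\min_{\beta}\sum_{i=1}^{n} w_i\,(y_i - x_i^\top\beta)^2
  = (X^\top W X)^{-1} X^\top W y = \hat\beta_{\mathrm{WLS}},
  \qquad W = \operatorname{diag}(w_1,\dots,w_n) \\
\hat\sigma^2_{\mathrm{ML}} &= \frac{\mathrm{RSS}_w}{n}, \qquad
\hat\sigma^2_{\mathrm{REML}} = \frac{\mathrm{RSS}_w}{n-p}, \qquad
\mathrm{RSS}_w = \sum_{i=1}^{n} w_i\,(y_i - x_i^\top\hat\beta)^2
\end{aligned}
```

The only term involving $\beta$ is the weighted residual sum of squares, so for any $\sigma^2$ the likelihood is maximized by the (weighted) least squares solution. Only the variance estimate differs: REML divides by $n - p$, like the MSE in PROC REG, so the REML standard errors should match PROC REG, while ML divides by $n$ and gives slightly smaller ones.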

In your case, you say your data is skewed and clustered ... well, if the errors are i.i.d. normal, you can still use OLS. If there is heteroskedasticity, this too can be handled in PROC REG or PROC GLM using weighted least squares, via the WEIGHT statement that both PROCs support.
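A minimal sketch of weighted least squares in PROC REG. The thread does not say how the error variance behaves, so the DATA step simulates a hypothetical stand-in in which the error standard deviation is proportional to x1; under that assumed variance model the inverse-variance weight is 1/x1**2. The data set work.sim_wls and the variable wls_wt are illustrative names, not part of the original post, and the same WEIGHT statement works in PROC GLM.

```sas
/* Hypothetical simulated data: error SD grows with x1 (heteroskedastic). */
data work.sim_wls;
    call streaminit(2015);
    do i = 1 to 200;
        x1 = 1 + 9*rand('uniform');                /* x1 between 1 and 10 */
        y  = 2 + 0.5*x1 + rand('normal', 0, x1);   /* error SD = x1, so variance = x1**2 */
        wls_wt = 1 / x1**2;                        /* inverse-variance weight (assumed variance model) */
        output;
    end;
    drop i;
run;

/* Ordinary least squares: ignores the unequal variances. */
proc reg data=work.sim_wls;
    model y = x1;
run;
quit;

/* Weighted least squares: same MODEL statement plus a WEIGHT statement. */
proc reg data=work.sim_wls;
    model y = x1;
    weight wls_wt;
run;
quit;
```

With real data the weights have to come from a variance model you are willing to defend; the weight should be proportional to 1/variance of each observation's error.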

The claim of autocorrelation makes me somewhat confused, as autocorrelation usually arises in time series data and not survey data. In survey data (as I understand the term) there is no natural ordering (like there is in time series data) so the idea of autocorrelation in survey data makes no sense to me. Can anyone explain further?

So anyway, nothing you have said to me rules out the use of OLS, although clearly there may be issues that haven't been mentioned that would indeed rule out OLS. Of course if the distribution of the errors is not normal, then maybe you want PROC GLIMMIX and not PROC MIXED, or maybe you just need to transform the data (if possible) so that the errors are normally distributed and then use OLS.
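Both options from the paragraph above, sketched on hypothetical simulated data (work.sim_skew and log_y are illustrative names) with a positive, right-skewed response. The gamma distribution with a log link in PROC GLIMMIX is one assumed choice for skewed positive data, not the only one. The two fits answer slightly different questions: the log-scale OLS model describes the mean of log(y), while the GLIMMIX model describes the log of the mean of y.

```sas
/* Hypothetical simulated data with a positive, right-skewed response. */
data work.sim_skew;
    call streaminit(42);
    do i = 1 to 200;
        x1 = rand('normal');
        y  = exp(1 + 0.3*x1 + rand('normal', 0, 0.5));  /* lognormal, so skewed */
        log_y = log(y);                                  /* option 1: transform the response */
        output;
    end;
    drop i;
run;

/* Option 1: OLS on the log scale. */
proc reg data=work.sim_skew;
    model log_y = x1;
run;
quit;

/* Option 2: model the skewed response directly (gamma/log link is an assumed choice). */
proc glimmix data=work.sim_skew;
    model y = x1 / dist=gamma link=log solution;
run;
```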

Of course, if your professor expects you to use PROC MIXED, then maybe you should ...

--
Paige Miller

BrianAronson · Posted 06-01-2015 04:19 PM

Thanks. My data is cross-sectional, but the survey was collected over a long period of time, so I thought time might cluster my data in some meaningful way. I will use weighted least squares to get rid of some of the clustering, and take the log of my dependent variable to account for the heteroskedasticity.

PaigeMiller · Posted 06-02-2015 08:06 AM

> so I thought time might cluster my data in some meaningful way. I will use weighted least squares to get rid of some of the clustering

So you are going to perform clustering on your data and then eliminate the clustering?

--
Paige Miller
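One way to model, rather than weight away, the time-based clustering Brian describes is a random intercept per collection period in PROC MIXED. The sketch assumes a hypothetical variable period identifying when each respondent was surveyed and simulates data with a shared period effect. Adding the RANDOM statement is also what makes PROC MIXED fit something different from PROC REG; the original call without it reproduces the OLS/WLS estimates.

```sas
/* Hypothetical simulated data: everyone surveyed in the same period shares an effect u. */
data work.sim_period;
    call streaminit(7);
    do period = 1 to 10;                       /* hypothetical collection periods */
        u = rand('normal', 0, 2);              /* effect shared within this period */
        do j = 1 to 30;
            x1 = rand('normal');
            y  = 1 + 0.5*x1 + u + rand('normal', 0, 1);
            output;
        end;
    end;
    drop j u;
run;

/* As in the original post: no RANDOM or REPEATED statement, so the errors are
   treated as independent and the estimates match PROC REG. */
proc mixed data=work.sim_period method=reml;
    model y = x1 / solution;
run;

/* Random intercept per period: errors are correlated within a period
   (compound symmetry), so the fixed effects are now a GLS fit. */
proc mixed data=work.sim_period method=reml;
    class period;
    model y = x1 / solution;
    random intercept / subject=period;
run;
```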

Ksharp · Posted 06-02-2015 08:48 AM

If your data is survey data, then take a look at PROC SURVEYREG.
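A minimal PROC SURVEYREG sketch. The STRATA, CLUSTER, and WEIGHT statements describe the sampling design, and the standard errors are design-based (Taylor series linearization by default). The design variables (stratum, psu_id, sampwt) and the simulated data are hypothetical; with real data they come from the survey documentation. Unlike PROC REG, which treats the WEIGHT variable as an inverse-variance (precision) weight, SURVEYREG treats it as a sampling weight, so the point estimates match a weighted least squares fit but the standard errors generally do not.

```sas
/* Hypothetical simulated design: 4 strata x 5 PSUs x 20 respondents. */
data work.sim_design;
    call streaminit(99);
    do stratum = 1 to 4;
        do psu = 1 to 5;
            psu_id = 100*stratum + psu;        /* PSU ID unique across strata */
            u = rand('normal', 0, 1);          /* effect shared within a PSU */
            do k = 1 to 20;
                x1 = rand('normal');
                y  = 3 + 0.8*x1 + u + rand('normal', 0, 1);
                sampwt = 50*stratum;           /* hypothetical sampling weight */
                output;
            end;
        end;
    end;
    drop psu k u;
run;

proc surveyreg data=work.sim_design;
    strata stratum;                            /* stratification variable */
    cluster psu_id;                            /* primary sampling units */
    weight sampwt;                             /* sampling weights */
    model y = x1;
run;
```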

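How big is your sample? If it is small, I would recommend OLS, because its estimator is unbiased, whereas ML estimators can be biased in small samples (in a linear model with i.i.d. normal errors the ML coefficient estimates equal the OLS ones, but the ML estimate of the error variance is biased downward).

I would recommend ML only if you have a lot of observations.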

Xia Keshan
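A sketch of where the small-sample bias shows up in this linear-model setting, using hypothetical simulated data (n = 20, five predictors plus an intercept, so p = 6). The fixed-effect estimates from METHOD=ML and METHOD=REML should be identical (and equal to OLS), while the ML residual variance should be exactly (n - p)/n = 14/20 of the REML value, which equals the unbiased OLS mean squared error. The CovParms table is captured with ODS OUTPUT; the data set names are illustrative.

```sas
/* Hypothetical small simulated sample: n = 20, true error variance = 4. */
data work.sim_small;
    call streaminit(123);
    array x{5} x1-x5;
    do i = 1 to 20;
        do j = 1 to 5;
            x{j} = rand('normal');
        end;
        y = 1 + x1 + 0.5*x2 + rand('normal', 0, 2);
        output;
    end;
    drop i j;
run;

/* ML fit: residual variance estimate = RSS / n. */
ods output CovParms=work.cp_ml;
proc mixed data=work.sim_small method=ml;
    model y = x1 x2 x3 x4 x5 / solution;
run;

/* REML fit: residual variance estimate = RSS / (n - p), same as the PROC REG MSE. */
ods output CovParms=work.cp_reml;
proc mixed data=work.sim_small method=reml;
    model y = x1 x2 x3 x4 x5 / solution;
run;

/* Put the two residual variance estimates side by side. */
data work.cp_compare;
    length method $4;
    set work.cp_ml(in=in_ml) work.cp_reml;
    if in_ml then method = 'ML';
    else method = 'REML';
run;

proc print data=work.cp_compare;
    var method CovParm Estimate;
run;
```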
