BrianAronson
Calcite | Level 5

Hi,

This is a fairly naive question:  I am trying to create a regression model for somewhat skewed and clustered survey data.  A professor suggested I use maximum likelihood estimation with GLS, rather than OLS, to account for some of the heteroskedasticity and autocorrelation in my data.  As far as I am aware, PROC REG uses OLS, PROC GLM uses ML, and PROC MIXED uses REML.  However, these methods (see code below) all seem to yield the same estimates.  Why is this?  Shouldn't changing the estimation method change my estimates and standard errors?

proc reg;
  model y = x1 x2 x3 x4 x5;
  weight weights;
run;
quit;

proc glm;
  model y = x1 x2 x3 x4 x5 / solution;
  weight weights;
run;
quit;

proc mixed method=reml;
  model y = x1 x2 x3 x4 x5 / solution;
  weight weights;
run;


Accepted Solutions
PaigeMiller
Diamond | Level 26

PROC REG and PROC GLM use OLS. PROC MIXED uses maximum likelihood or REML. If you are really interested in a regression for non-normal error structures, you might want to look into PROC GLIMMIX instead of PROC MIXED.
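A minimal sketch of what such a PROC GLIMMIX model could look like for a positive, right-skewed response, assuming a gamma distribution with a log link is a reasonable description of the data. The data set skewed and its simulated values are hypothetical stand-ins for the survey data.

```sas
/* Hypothetical simulated data: positive, right-skewed y whose mean */
/* is exp(0.5 + 0.3*x1), drawn from a gamma distribution.           */
data skewed;
  call streaminit(123);
  do i = 1 to 500;
    x1 = rand('normal');
    mu = exp(0.5 + 0.3*x1);
    /* rand('gamma', 2) has mean 2, so scaling by mu/2 gives mean mu */
    y = rand('gamma', 2) * mu / 2;
    output;
  end;
run;

/* Generalized linear model with gamma errors and a log link */
proc glimmix data=skewed;
  model y = x1 / dist=gamma link=log solution;
  ods output ParameterEstimates=glmm_est;  /* keep estimates for inspection */
run;
```

The slope is on the log scale here, so exp(estimate) is the multiplicative change in the mean of y per unit of x1.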

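There are cases where OLS and ML give the same coefficient estimates, namely when the errors in the regression model are assumed to be i.i.d. normal. That is exactly the model PROC MIXED fits when there is no RANDOM or REPEATED statement, which is why your three runs give the same estimates.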
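To see why the coefficient estimates coincide, write the model as y = Xβ + ε with ε ~ N(0, σ²I), where X is the n × p design matrix with full column rank (notation introduced here for illustration). The log-likelihood depends on β only through the residual sum of squares, so maximizing it over β is the OLS problem:

```latex
\begin{aligned}
\ell(\beta,\sigma^2) &= -\tfrac{n}{2}\log\left(2\pi\sigma^2\right) - \tfrac{1}{2\sigma^2}\,(y - X\beta)^\top (y - X\beta) \\
\hat\beta_{\mathrm{ML}} &= \arg\min_{\beta}\,(y - X\beta)^\top (y - X\beta) = (X^\top X)^{-1} X^\top y = \hat\beta_{\mathrm{OLS}}
\end{aligned}
```

Only the estimate of σ² differs between the two methods.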

In your case, you say your data is skewed and clustered ... well, if the errors are i.i.d. normal, you can still use OLS. If there is heteroskedasticity, this too can be handled in PROC REG or PROC GLM using weighted least squares, via the WEIGHT statement that both PROCs support.
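For reference, weighted least squares minimizes a weighted residual sum of squares. Here w_i is the value of the WEIGHT variable for observation i and x_i is the i-th row of the design matrix X (notation introduced here for illustration):

```latex
\hat\beta_{\mathrm{WLS}} = \arg\min_{\beta} \sum_{i=1}^{n} w_i\,(y_i - x_i^\top\beta)^2
  = (X^\top W X)^{-1} X^\top W y, \qquad W = \operatorname{diag}(w_1,\dots,w_n)
```

Taking w_i proportional to 1/Var(ε_i) gives less influence to the noisier observations; with all w_i equal it reduces to ordinary least squares.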

The claim of autocorrelation makes me somewhat confused, as autocorrelation usually arises in time series data and not survey data. In survey data (as I understand the term) there is no natural ordering (like there is in time series data) so the idea of autocorrelation in survey data makes no sense to me. Can anyone explain further?
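If the responses do have a natural order (for example, the order in which interviews were conducted), one way to check for first-order autocorrelation in the residuals is the DW option of PROC REG, which assumes the observations are sorted in that order. A minimal sketch; the data set ordered, the variable visit_order, and the simulated AR(1) errors are all hypothetical.

```sas
/* Hypothetical data in collection order with AR(1) errors (rho = 0.6) */
data ordered;
  call streaminit(42);
  e = 0;
  do visit_order = 1 to 300;
    x1 = rand('normal');
    e = 0.6*e + rand('normal');   /* error carries over from the previous respondent */
    y = 1 + 2*x1 + e;
    output;
  end;
run;

/* The Durbin-Watson statistic is only meaningful in the natural order */
proc sort data=ordered;
  by visit_order;
run;

proc reg data=ordered;
  model y = x1 / dw;              /* prints Durbin-Watson D and 1st-order autocorrelation */
  output out=resid r=res;         /* residuals, kept for further checks */
run;
quit;
```

A Durbin-Watson D near 2 suggests little first-order autocorrelation; values well below 2 suggest positive autocorrelation.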

So anyway, nothing you have said to me rules out the use of OLS, although clearly there may be issues that haven't been mentioned that would indeed rule out OLS. Of course, if the distribution of the errors is not normal, then maybe you want PROC GLIMMIX and not PROC MIXED, or maybe you just need to transform the data (if possible) so that the errors are normally distributed and then use OLS.
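A minimal sketch of the transform-then-OLS route, assuming the errors are multiplicative so that a log transform makes them roughly normal and homoskedastic. The data set skewed2 and its simulated values are hypothetical.

```sas
/* Hypothetical data with multiplicative (log-normal) errors: */
/* y is right-skewed and its spread grows with its mean        */
data skewed2;
  call streaminit(7);
  do i = 1 to 400;
    x1 = rand('normal');
    y = exp(1 + 0.5*x1 + 0.4*rand('normal'));
    output;
  end;
run;

/* Transform the response; log() requires y > 0 */
data skewed2;
  set skewed2;
  log_y = log(y);
run;

/* OLS on the transformed response; OUTEST= keeps the coefficients */
proc reg data=skewed2 outest=ols_est;
  model log_y = x1;
run;
quit;
```

Residual plots from PROC REG on log_y are a reasonable way to check whether the transform actually produced roughly normal, constant-variance errors.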

Of course, if your professor expects you to use PROC MIXED, then maybe you should ... ;-)

--
Paige Miller


4 REPLIES 4
BrianAronson
Calcite | Level 5

Thanks.  My data is cross-sectional, but the survey was collected over a long period of time, so I thought time might cluster my data in some meaningful way.  I will use weighted least squares to get rid of some of the clustering, and take the log of my dependent variable to account for the heteroskedasticity.
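If the collection period is what clusters the responses, one way to model that (rather than remove it) is a random intercept for each period in PROC MIXED. A RANDOM statement like this is also what makes PROC MIXED produce estimates and standard errors that differ from PROC REG, because without RANDOM or REPEATED it assumes independent errors. A minimal sketch; the data set bytime, the variable period, and the simulated values are hypothetical.

```sas
/* Hypothetical data: 20 collection periods with 25 respondents each; */
/* a shared period effect u induces correlation within a period       */
data bytime;
  call streaminit(2024);
  do period = 1 to 20;
    u = rand('normal', 0, 0.8);   /* period-level random effect, sd 0.8 */
    do j = 1 to 25;
      x1 = rand('normal');
      y = 1 + 2*x1 + u + rand('normal');
      output;
    end;
  end;
run;

/* Random intercept per period; METHOD=REML is also the default */
proc mixed data=bytime method=reml;
  class period;
  model y = x1 / solution;
  random intercept / subject=period;
  ods output CovParms=cp SolutionF=fixed;
run;
```

The Covariance Parameter Estimates table then reports a between-period variance alongside the residual variance.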

PaigeMiller
Diamond | Level 26

"...so I thought time might cluster my data in some meaningful way. I will use weighted least squares to get rid of some of the clustering..."

So you are going to perform clustering on your data and then eliminate the clustering?

--
Paige Miller
Ksharp
Super User

If your data are survey data, then take a look at PROC SURVEYREG.
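A minimal sketch of a design-based regression in PROC SURVEYREG, assuming the design has clusters (primary sampling units) and sampling weights. The data set survey and the variables psu and samp_wt are hypothetical; a STRATA statement could be added if the design is stratified.

```sas
/* Hypothetical survey data: 30 PSUs of 20 respondents each, with */
/* unequal sampling weights and a shared cluster effect c          */
data survey;
  call streaminit(99);
  do psu = 1 to 30;
    c = rand('normal', 0, 0.7);        /* cluster effect */
    do j = 1 to 20;
      x1 = rand('normal');
      samp_wt = 1 + 4*rand('uniform'); /* hypothetical sampling weight in (1, 5) */
      y = 1 + 2*x1 + c + rand('normal');
      output;
    end;
  end;
run;

/* WEIGHT enters the point estimates; CLUSTER gives standard errors */
/* that allow for correlation among respondents in the same PSU      */
proc surveyreg data=survey;
  cluster psu;
  weight samp_wt;
  model y = x1;
  ods output ParameterEstimates=svy_est;
run;
```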

How big is your sample? If it is small, I would recommend OLS, because it gives an unbiased estimate of the error variance, whereas the ML estimate is biased (too small) in small samples.

Only if you have a lot of observations would I recommend ML.
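To make the small-sample point concrete for the ordinary linear model with n observations, p coefficients and i.i.d. N(0, σ²) errors (notation introduced here for illustration): the residual sum of squares satisfies E[RSS] = (n - p)σ², so

```latex
\begin{aligned}
\hat\sigma^2_{\mathrm{ML}} &= \frac{\mathrm{RSS}}{n}, & \mathbb{E}\big[\hat\sigma^2_{\mathrm{ML}}\big] &= \frac{n-p}{n}\,\sigma^2 \\
s^2 &= \frac{\mathrm{RSS}}{n-p}, & \mathbb{E}\big[s^2\big] &= \sigma^2
\end{aligned}
```

For example, with n = 20 and p = 6 (an intercept plus x1 to x5), the ML estimate of σ² is on average 14/20 = 70% of the true value. In this model the coefficient estimates are the same under OLS and ML, and REML (the PROC MIXED default) gives s², so the bias concerns the variance estimate and therefore the standard errors.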

Xia Keshan
