Proc GLM vs Proc Surveyreg Standard Errors


01-20-2015 03:13 PM

Hello All,

I am attempting to cluster standard errors by state using proc surveyreg for an OLS regression that I previously ran using proc glm.

In doing so, I found that without clustering, the coefficient estimates from the two procedures are the same but the standard errors differ (those from proc surveyreg are higher than those from proc glm). Does anyone know why this is the case? I want to make sure that the difference between the clustered standard errors from proc surveyreg and the standard errors from proc glm is due only to clustering and not also to something going on behind the scenes in proc surveyreg that I am unaware of. I could not find anything in the user's guide that explains how proc surveyreg calculates standard errors.

Below is example code illustrating how I ran the procedures. Am I wrong to believe that proc surveyreg without the cluster option is the same as plain OLS?

    proc glm data = data;
        class categorical_vars(ref = reference_level);
        model dependent_var = independent_vars / solution;
        where some_condition;
    run;

    proc surveyreg data = data order = formatted;
        format reference_format;
        class categorical_vars;
        model dependent_var = independent_vars / solution;
        where some_condition;
    run;

Thanks in advance for your help!

Accepted Solutions

Solution


Posted in reply to mdvogan

01-20-2015 04:18 PM

From the online help:

**Many SAS/STAT procedures, such as the MEANS, FREQ, GLM, LOGISTIC, and PHREG procedures, can compute sample means, produce crosstabulation tables, and estimate regression relationships. However, in most of these procedures, statistical inference is based on the assumption that the sample is drawn from an infinite population by simple random sampling. If the sample is in fact selected from a finite population by using a complex survey design, these procedures generally do not calculate the estimates and their variances according to the design actually used. Using analyses that are not appropriate for your sample design can lead to incorrect statistical inferences**.

Because GLM and similar procedures assume an infinite population and simple random sampling, they will generally underestimate the standard errors when the data actually come from a complex survey design.
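To see how much of the gap comes from the variance method and how much from clustering, one option is to run the same model four ways on the same data. The sketch below is only an illustration under stated assumptions: the dataset `work.demo` and the variables `y`, `x1`, and `state` are hypothetical and simulated here, not taken from the original post. It relies on PROC SURVEYREG's default Taylor series linearization and on the `HCC` option of PROC REG. Without design statements, SURVEYREG treats each observation as its own sampling unit, so its standard errors should be close to the heteroskedasticity-consistent ones from PROC REG, though exact agreement depends on the degrees-of-freedom adjustment each procedure applies.

```sas
/* Hypothetical data: 20 states, 50 observations each, with a shared
   state-level shock so that errors are correlated within state. */
data work.demo;
    call streaminit(2015);
    do state = 1 to 20;
        u = rand('normal');                 /* state-level shock */
        do i = 1 to 50;
            x1 = rand('normal');
            y  = 1 + 0.5*x1 + u + rand('normal');
            output;
        end;
    end;
    drop i u;
run;

/* 1. Model-based OLS standard errors (assume i.i.d. errors). */
proc glm data=work.demo;
    model y = x1 / solution;
run;
quit;

/* 2. Heteroskedasticity-consistent standard errors, no clustering. */
proc reg data=work.demo;
    model y = x1 / hcc hccmethod=1;
run;
quit;

/* 3. SURVEYREG with no design statements: Taylor linearization,
      each observation treated as its own sampling unit. */
proc surveyreg data=work.demo;
    model y = x1 / solution;
run;

/* 4. SURVEYREG with standard errors clustered by state. */
proc surveyreg data=work.demo;
    cluster state;
    model y = x1 / solution;
run;
```

All four runs should report the same coefficient estimates. Comparing run 1 with run 3 isolates the effect of the variance method, and comparing run 3 with run 4 isolates the effect of clustering.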

All Replies


Posted in reply to mdvogan

01-20-2015 03:22 PM

No weight statements?


Posted in reply to Reeza

01-20-2015 03:29 PM

I didn't include weight statements in proc surveyreg since the original proc glm regression was unweighted. Must proc surveyreg have weights?


Posted in reply to mdvogan

01-20-2015 03:55 PM

No, but it's typically intended to be used with weighted or stratified data.

From what I can find, the two procedures do appear to use different methods for variance estimation.

Hopefully someone smarter will comment.
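For reference, the two kinds of variance estimator can be written side by side. This is a sketch under simplifying assumptions (no weights, no strata, no finite population correction), and the notation is introduced here for illustration: $x_i$ is the regressor vector for observation $i$, $r_i$ is its residual, $p$ is the number of parameters, and $c$ stands for whatever small-sample scaling factor the procedure applies (SURVEYREG's `VADJUST=` option controls this).

$$
\begin{aligned}
\hat{\beta} &= (X^\top X)^{-1} X^\top y, \qquad r_i = y_i - x_i^\top \hat{\beta} \\
\widehat{V}_{\text{model}} &= \hat{\sigma}^2 \,(X^\top X)^{-1}, \qquad \hat{\sigma}^2 = \frac{1}{n-p}\sum_{i=1}^{n} r_i^2 \\
\widehat{V}_{\text{sandwich}} &= c\,(X^\top X)^{-1}\Big(\sum_{g} s_g s_g^\top\Big)(X^\top X)^{-1}, \qquad s_g = \sum_{i \in g} r_i x_i
\end{aligned}
$$

Both estimators share the same $\hat{\beta}$, which is why the coefficients agree across procedures. In the sandwich form, each group $g$ is a single observation when no cluster is specified and a whole state when clustering by state, so the unclustered version already differs from the model-based one before any clustering is introduced.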
