Solved: Proc GLM vs Proc Surveyreg Standard Errors

mdvogan · Posted 01-20-2015 03:13 PM

Hello All,

I am attempting to cluster standard errors by state using proc surveyreg for an OLS regression that I previously ran using proc glm.

In doing so, I found that without clustering, the coefficient estimates from the two procedures are the same but the standard errors differ (the standard errors from proc surveyreg are higher than those from proc glm). Does anyone know why this is the case? I want to make sure that the difference between the clustered standard errors from proc surveyreg and the standard errors from proc glm is due only to clustering and not also attributable to something going on behind the scenes in proc surveyreg that I am unaware of. I could not find anything in the user's guide that explains how proc surveyreg calculates standard errors.

Below is example code illustrating how I ran the procedures. Am I wrong to believe that proc surveyreg without the cluster option is the same as plain OLS?

proc glm data = data;
  class categorical_vars(ref = reference_level);
  model = model / solution;
  where some_condition;
run;

proc surveyreg data = data order = formatted;
  format reference_format;
  class categorical_vars;
  model = model / solution;
  where some_condition;
run;

Thanks in advance for your help!

ballardw · Posted 01-20-2015 04:18 PM

From the online help:

Many SAS/STAT procedures, such as the MEANS, FREQ, GLM, LOGISTIC, and PHREG procedures, can compute sample means, produce crosstabulation tables, and estimate regression relationships. However, in most of these procedures, statistical inference is based on the assumption that the sample is drawn from an infinite population by simple random sampling. If the sample is in fact selected from a finite population by using a complex survey design, these procedures generally do not calculate the estimates and their variances according to the design actually used. Using analyses that are not appropriate for your sample design can lead to incorrect statistical inferences.

Because of the infinite-population and simple-random-sampling assumptions, GLM and similar procedures will generally underestimate the standard errors when the data actually come from a complex survey design.
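
A minimal sketch of the comparison, assuming the SASHELP.CARS sample data set in place of the original data (the variables, and the use of Make as a stand-in for state, are illustrative only and not from the thread). Without CLUSTER or STRATA statements, SURVEYREG still uses its default Taylor series linearization variance estimator rather than the model-based OLS formula, so the coefficients should match GLM while the standard errors need not:

```sas
/* Sketch only: SASHELP.CARS and these variables are stand-ins (assumption). */

/* Plain OLS with model-based standard errors. */
proc glm data=sashelp.cars;
   class origin;
   model mpg_city = weight horsepower origin / solution;
run;
quit;

/* No CLUSTER or STRATA statement: same coefficients as GLM, but the
   standard errors come from Taylor series linearization (the default). */
proc surveyreg data=sashelp.cars;
   class origin;
   model mpg_city = weight horsepower origin / solution;
run;

/* Adding CLUSTER changes the variance estimate again.
   Make is a hypothetical stand-in for the state variable. */
proc surveyreg data=sashelp.cars;
   class origin;
   cluster make;
   model mpg_city = weight horsepower origin / solution;
run;
```

Comparing the second and third runs, rather than the first and third, isolates the change that comes from clustering alone.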


Reeza · Posted 01-20-2015 03:22 PM

No weight statements?

mdvogan · Posted 01-20-2015 03:29 PM

I didn't include weight statements in proc surveyreg since the original proc glm regression was unweighted. Must proc surveyreg have weights?

Reeza · Posted 01-20-2015 03:55 PM

No, but it's typically intended to be used with weighted or stratified data.

In general, from what I can find, the two procedures do appear to use different methods for variance estimation.

Hopefully someone smarter will comment.
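
One way to probe the variance-estimation difference is to compare SURVEYREG against heteroskedasticity-consistent standard errors from PROC REG. The sketch below assumes the SASHELP.CARS sample data set as stand-in data. The expectation that HCCMETHOD=1 lines up with SURVEYREG's default degrees-of-freedom adjustment is my reading of the two documented formulas, not something stated in this thread, so treat it as something to verify rather than a given:

```sas
/* Sketch only: SASHELP.CARS is a stand-in for the real data (assumption). */

/* OLS with heteroskedasticity-consistent standard errors reported
   alongside the usual model-based ones. */
proc reg data=sashelp.cars;
   model mpg_city = weight horsepower / hcc hccmethod=1;
run;
quit;

/* SURVEYREG with no design statements, default Taylor linearization. */
proc surveyreg data=sashelp.cars;
   model mpg_city = weight horsepower;
run;
```

If the robust standard errors from PROC REG agree with the SURVEYREG output, the gap relative to GLM comes from the robust variance formula rather than from any hidden design adjustment.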
