mdvogan
Calcite | Level 5


Hello All,

     I am attempting to cluster standard errors by state using proc surveyreg for an OLS regression that I previously ran using proc glm.

     In doing so, I found that without clustering, the coefficient estimates from the two procedures are the same but the standard errors differ (those from proc surveyreg are larger than those from proc glm). Does anyone know why this is the case? I want to make sure that the difference between the clustered standard errors from proc surveyreg and the standard errors from proc glm is due only to clustering, and not also to something going on behind the scenes in proc surveyreg that I am unaware of. I could not find anything in the user's guide that explains how proc surveyreg calculates standard errors.

     Below is example code illustrating how I ran the procedures. Am I wrong to believe that proc surveyreg without the cluster option is the same as plain OLS?

proc glm data = data;
     class categorical_vars(ref = reference_level);
     model = model/ solution;
     where some_condition;
run;

proc surveyreg data = data order = formatted;
     format reference_format;
     class categorical_vars;
     model = model/ solution;
     where some_condition;
run;

     Thanks in advance for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

From the online help:

Many SAS/STAT procedures, such as the MEANS, FREQ, GLM, LOGISTIC, and PHREG procedures, can compute sample means, produce crosstabulation tables, and estimate regression relationships. However, in most of these procedures, statistical inference is based on the assumption that the sample is drawn from an infinite population by simple random sampling. If the sample is in fact selected from a finite population by using a complex survey design, these procedures generally do not calculate the estimates and their variances according to the design actually used. Using analyses that are not appropriate for your sample design can lead to incorrect statistical inferences.

Because GLM and similar procedures assume an infinite population and simple random sampling, they will generally underestimate the standard errors when the data actually come from a complex design.
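
To make the design point concrete, here is a minimal sketch of running proc surveyreg with and without a cluster statement. The data are simulated and every name (work.demo, state, x, y) is hypothetical, not taken from the original question. With no design statements, proc surveyreg treats each observation as its own primary sampling unit, so comparing the two runs isolates what clustering by state adds.

```sas
/* Simulated data; all names are hypothetical placeholders */
data work.demo;
   call streaminit(2024);
   do state = 1 to 20;
      u = rand('normal');                 /* shock shared within a state */
      do i = 1 to 30;
         x = rand('normal');
         y = 1 + 0.5*x + u + rand('normal');
         output;
      end;
   end;
   drop i u;
run;

/* No design statements: design-based variance, no clustering */
proc surveyreg data=work.demo;
   model y = x / solution;
run;

/* Same model with standard errors clustered by state */
proc surveyreg data=work.demo;
   cluster state;
   model y = x / solution;
run;
```

The coefficient estimates should match across the two runs; only the standard errors are expected to change.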


4 REPLIES
Reeza
Super User

No weight statements?

mdvogan
Calcite | Level 5

I didn't include weight statements in proc surveyreg since the original proc glm regression was unweighted. Must proc surveyreg have weights?

Reeza
Super User

No, but it's typically intended to be used with weighted or stratified data.

In general, from what I can find, the two procedures do appear to use different methods for variance estimation.
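
One way to probe whether the gap comes from the variance method rather than from clustering is to request heteroscedasticity-consistent standard errors from proc reg and set them beside the unclustered proc surveyreg output. This is only a sketch under assumptions: the data are simulated, the names (work.hetero, x, y) are hypothetical, and whether the two sets of standard errors agree exactly depends on the degrees-of-freedom adjustment each procedure applies. HCC and HCCMETHOD= are MODEL statement options in proc reg.

```sas
/* Simulated data with non-constant error variance; names are hypothetical */
data work.hetero;
   call streaminit(2024);
   do i = 1 to 500;
      x = rand('normal');
      y = 1 + 0.5*x + abs(x)*rand('normal');  /* error spread grows with |x| */
      output;
   end;
   drop i;
run;

/* Model-based OLS standard errors plus heteroscedasticity-consistent ones */
proc reg data=work.hetero;
   model y = x / hcc hccmethod=1;
run;
quit;

/* Unclustered survey regression for comparison */
proc surveyreg data=work.hetero;
   model y = x / solution;
run;
```

Note that proc reg has no class statement, so categorical predictors would need to be dummy-coded before trying this on a model like the one in the question.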

Hopefully someone smarter will comment :)
