pkfamily
Obsidian | Level 7

Hi,

 

I was wondering if it was normal to get different results when you change the order of the strata variables. For example, let's say we run two models:

 

proc surveyreg data=mydata; strata A B C; model yvar=xvar; run;

proc surveyreg data=mydata; strata B A C; model yvar=xvar; run;

 

Is it possible for the two models to give different results? I'm asking because, in the model I ran, simply switching the order of the strata variables changed my standard errors and p-values. This was not universal, however: the results changed only with one specific xvar.

 

Thanks.

Accepted Solution
ballardw
Super User

From the documentation:

The STRATA statement names one or more variables that identify the first-stage strata in a stratified sample design. The combinations of levels of STRATA variables define the strata in the sample, where strata are nonoverlapping subgroups that were sampled independently.

 

So the order reflects the sampling strategy. If you change the order of the strata variables, you are saying that the sampling strategy changed, and the results can differ, sometimes substantially. A statement such as STRATA A B C says that we identified records by characteristic A and sampled them, then within A we sampled by characteristic B, and then within B we sampled on C.

 

Your strata are fixed at sampling. Models should reflect that sample.
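
Since the documentation passage above says the strata are defined by combinations of levels of the STRATA variables, one quick way to check the model against the sample is to list those combinations and their counts. This is only a sketch: mydata, A, B and C are the placeholder names from the question, and PROC FREQ here is a convenience check, not part of the SURVEYREG analysis.

```sas
/* Count observations in each stratum, i.e. each combination of A, B and C. */
/* mydata, A, B and C are the placeholder names used in the question.       */
proc freq data=mydata;
   tables A*B*C / list missing;   /* LIST: one row per combination; MISSING: keep missing levels */
run;
```

Comparing this listing with the strata you actually sampled from is one way to confirm that the STRATA statement describes the design.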

 

You might also post the entire PROC SURVEYREG step, since a CLUSTER statement also affects the calculations.
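
For reference, a fuller design specification might look roughly like the sketch below. The variable names psu_id and samp_wt are hypothetical and do not come from the original post; the CLUSTER and WEIGHT statements apply only if the design actually has clusters and sampling weights.

```sas
/* Sketch of a fuller design specification for PROC SURVEYREG.          */
/* psu_id and samp_wt are hypothetical names; substitute the variables  */
/* from your own sampling design, or drop the statements if not needed. */
proc surveyreg data=mydata;
   strata A B C / list;   /* LIST prints the stratum information table  */
   cluster psu_id;        /* primary sampling units, if the design has them */
   weight samp_wt;        /* sampling weights, if the design has them   */
   model yvar = xvar;
run;
```

The stratum information table from the LIST option makes it easier to compare what the procedure sees when the step is run with each ordering of the STRATA variables.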

Rick_SAS
SAS Super FREQ

In addition to ballardw's comments, I will point out that you need to use

PROC SURVEYSELECT SEED=12345 ...

if you want two runs of the procedure to produce the same results. Since you are using the STRATA statement, you will also need to control the stratum random seeds. See the SEED= option in the doc. 
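
A minimal sketch of a seeded, stratified selection is shown below. The data set names mydata_sorted and mysample, the METHOD=SRS choice, and N=10 are illustrative assumptions, not details from the original thread; N=10 also assumes every stratum holds at least 10 units.

```sas
/* PROC SURVEYSELECT expects the input sorted by the STRATA variables. */
proc sort data=mydata out=mydata_sorted;
   by A B C;
run;

/* SEED= fixes the starting seed so a rerun on the same input draws    */
/* the same sample. METHOD=SRS and N=10 (units per stratum) are        */
/* illustrative assumptions; use the settings from your own design.    */
proc surveyselect data=mydata_sorted out=mysample
                  method=srs n=10 seed=12345;
   strata A B C;
run;
```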
