pkfamily
Obsidian | Level 7

Hi,

Is it normal to get different results when you change the order of the strata variables? For example, suppose we run two models:

proc surveyreg data=mydata; strata A B C; model yvar=xvar; run;

proc surveyreg data=mydata; strata B A C; model yvar=xvar; run;

Is it possible for the two models to give different results? I ask because, in the model I ran, simply switching the order of the strata variables changed my standard errors and p-values. This did not happen in every case, however: the results changed only with one specific xvar.

Thanks.

Accepted Solutions
ballardw
Super User

From the documentation:

The STRATA statement names one or more variables that identify the first-stage strata in a stratified sample design. The combinations of levels of STRATA variables define the strata in the sample, where strata are nonoverlapping subgroups that were sampled independently.

 

So the order reflects the sampling strategy. If you change the order of the strata variables, you are saying that the sampling strategy changed, and the results can differ, sometimes substantially. A statement of STRATA A B C says that we identified records by characteristic A and sampled them, then within A we sampled by characteristic B, then within B we sampled on C.

Your strata are fixed at sampling. Models should reflect that sample.

Also, you might show the entire PROC SURVEYREG step, as a CLUSTER statement also affects the calculations.
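One way to see how the procedure actually formed the strata under each ordering is to print the stratum listing for both runs and compare them. The sketch below assumes the data set and variable names from the question (mydata, A, B, C, yvar, xvar). The LIST option in the STRATA statement requests the "Stratum Information" table. If the two listings differ, for example because strata containing a single sampling unit were collapsed differently, that would be one candidate explanation for the changed standard errors. Treat this as a diagnostic sketch, not a definitive diagnosis.

```sas
/* Diagnostic sketch: data set and variable names are taken from the question. */
title "Strata order: A B C";
proc surveyreg data=mydata;
   strata A B C / list;   /* LIST prints the Stratum Information table */
   model yvar = xvar;
run;

title "Strata order: B A C";
proc surveyreg data=mydata;
   strata B A C / list;
   model yvar = xvar;
run;
title;                    /* clear the title after the comparison */
```

Adding NOCOLLAPSE after the slash (strata A B C / list nocollapse;) asks the procedure not to combine single-unit strata, which can help check whether collapsing is involved.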

Rick_SAS
SAS Super FREQ

In addition to ballardw's comments, I will point out that you need to use

PROC SURVEYSELECT SEED=12345 ...

if you want two runs of the procedure to produce the same results. Because you are using the STRATA statement, you will also need to control the stratum random seeds. See the SEED= option in the documentation.
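As a minimal sketch of the SEED= option in a stratified selection: the example assumes a hypothetical sampling frame named frame with stratification variables A, B, and C, and draws a simple random sample of 5 units per stratum. The data set names, sample size, and seed value are illustrative only and do not come from the original question.

```sas
/* Hypothetical frame data set; the input is sorted by the STRATA variables first. */
proc sort data=frame out=frame_sorted;
   by A B C;
run;

/* A fixed SEED= value makes the selection repeatable across runs. */
proc surveyselect data=frame_sorted out=sample
                  method=srs n=5 seed=12345;
   strata A B C;
run;
```

Rerunning this step with the same SEED= value and the same sorted input should reproduce the same sample. If some strata contain fewer than 5 units, the requested sample size would need to be adjusted.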
