Hi,
I was wondering if it was normal to get different results when you change the order of the strata variables. For example, let's say we run two models:
proc surveyreg data=mydata;
   strata A B C;
   model yvar = xvar;
run;

proc surveyreg data=mydata;
   strata B A C;
   model yvar = xvar;
run;
Is it possible for the two models to give different results? I'm asking because, in the model I ran, simply switching the order of the strata variables changed my standard errors and p-values. This was not universal, however: the results changed only for one specific xvar.
Thanks.
From the documentation:
The STRATA statement names one or more variables that identify the first-stage strata in a stratified sample design. The combinations of levels of STRATA variables define the strata in the sample, where strata are nonoverlapping subgroups that were sampled independently.
So the order reflects the sampling strategy. If you change the strata order, you are saying that the sampling strategy changed, and the results can be different (sometimes quite different). STRATA A B C says that we identified records by characteristic A and sampled them, then within A we sampled by characteristic B, then within B we sampled on C.
Your strata are fixed at sampling. Models should reflect that sample.
You might also show the entire PROC SURVEYREG code, because a CLUSTER statement also affects the calculations.
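For reference, a complete design specification for PROC SURVEYREG might look like the sketch below. The cluster (PSU) variable psu_id and the weight variable wt are hypothetical placeholders, not names from the original post; substitute the variables your design actually uses and drop any statement that does not apply to your sample.

```sas
/* Sketch only: psu_id and wt are hypothetical variable names */
proc surveyreg data=mydata;
   strata A B C;      /* first-stage strata: combinations of A, B, C */
   cluster psu_id;    /* hypothetical primary sampling unit identifier */
   weight wt;         /* hypothetical sampling weight */
   model yvar = xvar;
run;
```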
In addition to ballardw's comments, I will point out that you need to use
PROC SURVEYSELECT SEED=12345 ...
if you want two runs of the procedure to produce the same results. Since you are using the STRATA statement, you will also need to control the stratum random seeds. See the SEED= option in the doc.
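A minimal sketch of a reproducible stratified selection follows. It assumes simple random sampling within each stratum at a 10% rate; METHOD=SRS, SAMPRATE=0.1, and the output data set name mysample are illustrative choices, not taken from the thread. PROC SURVEYSELECT expects the input data to be sorted by the STRATA variables, hence the PROC SORT step.

```sas
/* Sort by the strata variables in the order listed on the STRATA statement */
proc sort data=mydata;
   by A B C;
run;

/* Fixed SEED= so that repeated runs select the same units */
proc surveyselect data=mydata out=mysample   /* mysample: hypothetical name */
                  method=srs samprate=0.1 seed=12345;
   strata A B C;
run;
```

Running this step twice with the same SEED= value should select the same records; omitting SEED= lets SAS derive a seed from the system clock, so each run can differ.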