Brilliant! Thank you*840 to Astounding.
So putting it all together we have the following. Perhaps there is a more efficient program, but I am happy with this.
* create a new variable identifying the unique strata;
data all_combinations;
  do a=1 to 2;
    do b=1 to 12;
      do c=1 to 5;
        do d=1 to 7;
          single_variable = d + 10*c + 100*b + 10000*a;
          output;
        end;
      end;
    end;
  end;
run;
* now merge that back to the main dataset (test1);
proc sort data=test1;
by a b c d;
run;
proc sort data=all_combinations;
by a b c d;
run;
data merge1;
merge test1 all_combinations;
by a b c d;
run;
* create the dummy codes by strata for the variables (ms3c and edu2c);
data test2 (keep= single_variable ms2 ms3 edu2);
set merge1;
if ms3c=2 then ms2=1; else ms2=0;
if ms3c=3 then ms3=1; else ms3=0;
if edu2c=2 then edu2=1; else edu2=0;
run;
* now create a dummy code matrix table for each unique strata (840) and name the tables with the strata id code (single_variable);
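* keep one row per stratum so each table is created only once;
proc sort data=test2 (keep=single_variable) out=strata_ids nodupkey;
  by single_variable;
run;
data _null_;
  call execute('proc sql;') ;
  do until (done);
    set strata_ids end=done;
    dsname = put(single_variable, z5.);
    * select from test2, the dummy-coded dataset built above;
    call execute('create table xmat_' || dsname || ' as select * from test2 where single_variable= ' || dsname || ';' ) ;
  end;
  call execute('quit;' ) ;
  stop;
run;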
I feel like we're missing something giant here. Are you exporting the data to run the regressions in some other tool? If not, why split the files in the first place or create the single indicator? SAS will handle all of that directly with a BY statement.
Same thing with the export: you don't need to precalculate anything. Just sort your data by A/B/C/D and then create the files dynamically using the data step as illustrated.
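For anyone reading along, here is a rough sketch of what writing one file per stratum from a single data step could look like. It uses the FILEVAR= option on the FILE statement, which may or may not be the approach illustrated earlier in the thread. The folder /tmp/xmat and the names test2_sorted and outfile are made up for the example; test2 and its variables come from the code above.

```sas
/* Sketch only. Assumes test2 holds single_variable, ms2, ms3 and edu2, */
/* and that /tmp/xmat is an existing folder (hypothetical path).        */
proc sort data=test2 out=test2_sorted;
  by single_variable;
run;

data _null_;
  set test2_sorted;
  length outfile $200;
  /* Build the output file name from the stratum id */
  outfile = cats('/tmp/xmat/xmat_', put(single_variable, z5.), '.csv');
  /* FILEVAR= closes the current file and opens a new one whenever outfile changes */
  file dummy filevar=outfile dlm=',';
  put ms2 ms3 edu2;
run;
```

The sort matters here: if the rows for one stratum were not contiguous, the file for that stratum would be reopened and overwritten.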
thinking about it more, I think you might be onto something - I should be able to do a one-to-many merge of the unique 'single_variable' back into my main dataset matching on the 4 strata variables, correct?
Couldn't you just put them into the BY statement or does that mess up something else?
Note that STRATA is different than BY so if you're including them in the STRATA this isn't possible.
As long as you have data for each stratum SAS will process it appropriately.
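To make the BY suggestion concrete, here is a minimal sketch. The thread never says which procedure is being run, so PROC LOGISTIC and the 0/1 outcome name death are assumptions for illustration only; a, b, c, d, ms3c and edu2c are the variables from the posts above.

```sas
/* Sketch only: PROC LOGISTIC and the outcome variable death are assumptions */
proc sort data=test1 out=test1_sorted;
  by a b c d;
run;

proc logistic data=test1_sorted;
  /* One separate model is fitted for each a*b*c*d combination present in the data */
  by a b c d;
  /* Reference coding lets SAS build the dummy variables itself */
  class ms3c edu2c / param=ref ref=first;
  model death(event='1') = ms3c edu2c;
run;
```

With reference coding and the first level as reference, the CLASS statement should produce the same kind of indicators as the hand-built ms2, ms3 and edu2, so neither the 840 sets of IF statements nor the 840 separate tables would be needed.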
@acerickson wrote:
Thanks for the response, that's a helpful start. I can use different variable names, that's no problem.
apologies for the unclear description.
There are 4 stratifying variables with the following ranges:
a (1,2)
b (1,12)
c (1,5)
d (1,7)
so what I mean by the "..and so on..." comment, is that I would need to repeat those 3 IF statements 840 times in order to cycle through all possible combinations of the 4 stratifying variables.
thanks again!
thanks again for your time and effort on this, I'm not really making it easier for you folks.
Thinking about it more, I think Astounding might be onto something. I should be able to do a one-to-many merge of the unique 'single_variable' back into my main dataset, matching on the 4 strata variables, correct? This could solve a couple of issues, like checking that at least one event (death) occurs in each stratum and whether any strata are unpopulated.
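A possible way to run that check, sketched under a few assumptions: test1 and all_combinations are already sorted by a b c d, and test1 carries a 0/1 event indicator, here called death (a hypothetical name). The datasets strata_check and strata_summary are also just names picked for the example.

```sas
/* Sketch only. Assumes both inputs are sorted by a b c d and that */
/* death is a 0/1 event indicator in test1 (hypothetical name).    */
data strata_check;
  merge test1 (in=in_data) all_combinations (in=in_grid);
  by a b c d;
  /* IN= variables are temporary, so copy the flag into a kept variable */
  has_data = in_data;
run;

proc sql;
  create table strata_summary as
  select single_variable,
         sum(has_data)  as n_obs,
         sum(death = 1) as n_deaths
  from strata_check
  group by single_variable
  order by single_variable;
quit;

/* List strata with no observations or no events */
proc print data=strata_summary;
  where n_obs = 0 or n_deaths = 0;
run;
```

Strata that exist only in all_combinations come through the merge with has_data = 0, so they show up with n_obs = 0 in the listing.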