I would like to split a given dataset into n stratified equal sized-ish folds by amending it with an additional column containing n. What is a common/simple way to achieve this? Thanks.
Generally, it's not a good idea.
That being said, here's two write ups on it.
1. http://www.sascommunity.org/wiki/Split_Data_into_Subsets
2. https://blogs.sas.com/content/sasdummy/2015/01/26/how-to-split-one-data-set-into-many/
PROC SURVEYSELECT then? Choose N samples of X data?
It won't create multiple data sets but will do the random selection. Then you can use the methods above to split.
Or the manual way of adding a random numbers, sort by random number and use any of the methods in the link above.
You should have enough information and samples here to write the sample code yourself or at minimum provide sample data
No guarantee about randomish result but
data want;
set have;
split = mod(_n_, 9);
run;
will add a variable that will split the data set into in 9 parts and the size difference will be plus/minus 1 between any groups. Replace 9 with your desired number.
If randomization critical then add a variable the result of a random number function, sort by that variable and then use the method above.
data have;
set sashelp.heart;
call streaminit(12345678);
random=rand('uniform');
run;
proc rank data=have out=temp groups=3;
var random;
ranks group;
run;
data want1 want2 want3;
set temp;
if group=0 then output want1;
else if group=1 then output want2;
else output want3;
run;
Using call rantbl, with regular updating of table probabilities will allow a single-step solution, creating the new variable subgroup.
data want (drop=_:);
set have nobs=nrecs;
array needed{10} _temporary_;
array needprob{10} _temporary_;
if _n_=1 then do;
do _I=1 to dim(needed);
needed{_I}=floor(nrecs/dim(needed));
end;
do _I=1 to dim(needed) while (sum(of needed{*})<nrecs);
needed{_I}=needed{_I}+1;
end;
end;
_nleft = nrecs-(_n_-1);
do _I=1 to dim(needed);
needprob{_I}=needed{_I}/_nleft;
end;
seed=1250666;
call rantbl(_seed,of needprob{*},subgroup);
needed{subgroup}=needed{subgroup}-1;
run;
Notes:
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.