03-11-2018 08:59 AM - edited 03-11-2018 09:04 AM
My goal is to create training and testing subsets that are representative of the original data set.
For example, the original data set have consists of 80% data from the US (region="US") and 20% data from Asia (region="Asia").
data have(drop=i);
  do i=1 to 10;
    if i>8 then region = "Asia";
    else region = "US";
    output;
  end;
run;
I now want to randomly split have into 2 subsets, each of which also consists of 80% data from the US and 20% data from Asia.
Solution so far
Apparently, there is a comprehensive SAS procedure called SURVEYSELECT to handle this task. This is the best solution I was able to come up with to get the job done (splitting have into want1 and want2; to keep things simple, a splitting ratio of 50% was applied):
/* BEFORE surveyselect... */

/* (1) we need number of obs */
%let dsid  = %sysfunc(open(have));
%let nobs  = %sysfunc(attrn(&dsid, nobs));
%let close = %sysfunc(close(&dsid.));

/* (2) we need to sort the data */
proc sql noprint;
  create view haveV as
    select *
    from have
    order by region;
quit;

proc surveyselect noprint
    data     = haveV
    out      = have2
    outall
    sampsize = %sysevalf(&nobs.*.5)
    seed     = 100
    ;
  strata region / alloc = prop;
run;

/* AFTER surveyselect... */

/* ...we need to split the data set ourselves */
data want1 want2;
  set have2;
  if Selected then output want1;
  else output want2;
run;
Is there a better approach (or a better way to use SURVEYSELECT) to get the splitting job done?
Considering that this splitting job is a frequently occurring task that can be accomplished with a few lines of code in other languages, there should be a better solution in SAS that doesn't suffer from the following shortcomings of the solution I found:
Thank you very much!
03-14-2018 05:21 PM
I forwarded your post to my colleagues. Several replies stated that you can use the SAMPRATE= option in PROC SURVEYSELECT. Another suggested trying PROC HPSAMPLE if you have access to the High-Performance procedures. The final suggestion was this blog:
I hope this helps,
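For reference, the SAMPRATE= suggestion could look roughly like the sketch below. It assumes the have data set from the question and a 50% split; the names have_sorted, have2, want1 and want2 are only illustrative. Because a single sampling rate is applied within each stratum, the observation count does not have to be computed beforehand and ALLOC=PROP is not needed. The input still has to be sorted by the STRATA variable, and the final split is still done in a DATA step.

```sas
/* Sketch only: data set names are illustrative, not from the reply. */

/* STRATA processing expects the input sorted by the strata variable. */
proc sort data=have out=have_sorted;
  by region;
run;

/* Draw 50% within each region. OUTALL keeps every row and adds */
/* the variable Selected (1 = sampled, 0 = not sampled).        */
proc surveyselect noprint
    data     = have_sorted
    out      = have2
    outall
    method   = srs
    samprate = 0.5
    seed     = 100
    ;
  strata region;
run;

/* Split on the Selected flag. */
data want1 want2;
  set have2;
  if Selected then output want1;
  else output want2;
run;
```

With the 10-row example, both want1 and want2 should keep the 80/20 ratio of US to Asia rows. When a stratum size times the rate is not a whole number, SURVEYSELECT rounds the stratum sample size, so the proportions are then only approximately preserved.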