I'm using SAS to do machine learning. I would like to randomly split my data into 60% training, 20% validation, and 20% test data sets. How do I do that in SAS?
There are many ways. Here is one:
data training validation test;
    set sashelp.class;
    _n_ = rand('uniform');
    if _n_ le .6 then output training;
    else if _n_ le .8 then output validation;
    else output test;
run;
Art, CEO, AnalystFinder.com
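One thing to keep in mind with this approach: each observation is assigned independently, so the 60/20/20 proportions are only approximate, and the split changes on every run unless the random-number stream is seeded. A minimal sketch of a reproducible variant follows. The seed value 12345 and the helper variable u are assumptions chosen for illustration and are not part of the original answer.

```sas
data training validation test;
    set sashelp.class;
    /* Seed the random-number stream once; 12345 is an arbitrary illustrative value */
    if _n_ = 1 then call streaminit(12345);
    /* u is a hypothetical helper variable holding a uniform draw on (0,1) */
    u = rand('uniform');
    if u le .6 then output training;        /* roughly 60% of rows */
    else if u le .8 then output validation; /* roughly 20% of rows */
    else output test;                       /* remaining rows, roughly 20% */
    drop u;                                 /* keep the helper out of the output data sets */
run;
```

With a small table such as sashelp.class (19 observations), the resulting counts can differ noticeably from 60/20/20. If exact group sizes matter, a method that fixes the counts up front is a better fit.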
Are you using SAS Enterprise Miner (EM)? If so, check the Partition task.
No, I'm using only SAS.
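Or use PROC SURVEYSELECT, which gives exact group sizes:

/* Count the observations in the input data set */
%let dsid=%sysfunc(open(sashelp.class));
%let nobs=%sysfunc(attrn(&dsid,nlobs));
%let dsid=%sysfunc(close(&dsid));

/* Compute the 60/20/20 group sizes; the test group takes the remainder */
%let train=%sysevalf(0.6*&nobs,int);
%let valid=%sysevalf(0.2*&nobs,int);
%let test=%eval(&nobs-&train-&valid);
%put &train &valid &test;

/* Randomly assign each observation to one of three groups of the given sizes */
proc surveyselect data=sashelp.class groups=(&train &valid &test) out=want;
run;

/* Split on the GroupID variable that PROC SURVEYSELECT adds */
data train valid test;
    set want;
    select(groupid);
        when(1) output train;
        when(2) output valid;
        when(3) output test;
        otherwise;
    end;
run;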