I'm using SAS to do machine learning. I would like to randomly split my data into 60% training, 20% validation, and 20% test data sets. How do I do that in SAS?
There are many ways. Here is one:
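/* _n_ is an automatic variable, so it is not written to the output data sets; */
/* here it is reused to hold one uniform(0,1) draw per row.                    */
data training validation test;
    set sashelp.class;
    _n_ = rand('uniform');
    if _n_ le .6 then output training;         /* about 60% of rows */
    else if _n_ le .8 then output validation;  /* about 20% of rows */
    else output test;                          /* remaining ~20%    */
run;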
Art, CEO, AnalystFinder.com
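One caveat on the DATA step above: each row is assigned independently, so the group sizes only approximate 60/20/20, and without a seed the split changes on every run. A minimal sketch of a variant that fixes both, assuming a random sort key is acceptable. The seed 12345, the data set name shuffled, and the variable u are illustrative choices, not part of the original answer.

```sas
/* Sketch: reproducible split with exact row counts.             */
/* Assumptions: seed 12345, data set "shuffled", and variable u  */
/* are hypothetical names chosen for this example.               */
data shuffled;
    set sashelp.class;
    call streaminit(12345);   /* fixed seed makes the split repeatable */
    u = rand('uniform');      /* random sort key, one per row          */
run;

proc sort data=shuffled;
    by u;                     /* puts the rows in random order */
run;

data training validation test;
    set shuffled nobs=n;      /* n = total number of rows (not written out) */
    drop u;
    if _n_ le 0.6*n then output training;         /* first 60% of shuffled rows */
    else if _n_ le 0.8*n then output validation;  /* next 20%                   */
    else output test;                             /* remainder                  */
run;
```

Because the cut points are computed from the row count, the group sizes are fixed for a given data set, up to rounding when the row count is not a multiple of 5.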
Are you using SAS EM? If so, check the Partition task.
No, I'm using SAS only.
Or PROC SURVEYSELECT.

/* Count the rows, then turn 60/20/20 into whole-number group sizes */
%let dsid=%sysfunc(open(sashelp.class));
%let nobs=%sysfunc(attrn(&dsid,nlobs));
%let dsid=%sysfunc(close(&dsid));
%let train=%sysevalf(0.6*&nobs,int);
%let valid=%sysevalf(0.2*&nobs,int);
%let test=%eval(&nobs-&train-&valid);   /* test takes whatever is left */
%put &train &valid &test;

/* Randomly assign every row to one of three groups of those sizes */
proc surveyselect data=sashelp.class group=(&train &valid &test) out=want;
run;

/* Split on the GroupID variable that PROC SURVEYSELECT adds */
data train valid test;
    set want;
    select(groupid);
        when(1) output train;
        when(2) output valid;
        when(3) output test;
        otherwise;
    end;
run;