I'm using SAS to do machine learning. I would like to randomly split my data into 60% training, 20% validation, and 20% test data sets. How do I do that in SAS?
There are many ways. Here is one:
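data training validation test;
   set sashelp.class;
   /* _n_ is reused as a scratch variable because automatic variables are not written to the output data sets */
   _n_ = rand('uniform');
   if _n_ le .6 then output training;
   else if _n_ le .8 then output validation;
   else output test;
run;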
Art, CEO, AnalystFinder.com
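One caveat on the DATA step approach above: each row is assigned independently, so the 60/20/20 proportions are only approximate (noticeably so on a small table such as sashelp.class), and without a seed the split changes on every run. Here is a minimal sketch of a seeded variant. The seed value 12345 and the scratch variable name u are arbitrary choices for illustration, not part of the original answer.

```sas
/* Sketch: reproducible, approximately 60/20/20 random split. */
/* Assumptions: seed 12345 and the variable name u are arbitrary. */
data training validation test;
   set sashelp.class;
   if _n_ = 1 then call streaminit(12345);  /* fix the random stream before the first RAND call */
   u = rand('uniform');                     /* uniform draw in (0,1) for this row */
   drop u;                                  /* keep the helper variable out of the outputs */
   if u le .6 then output training;
   else if u le .8 then output validation;
   else output test;
run;
```

Rerunning the step with the same seed should reproduce the same three data sets; change the seed to get a different split.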
Are you using SAS Enterprise Miner (EM)? If so, check the Data Partition node.
No, I'm using SAS only.
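Or use PROC SURVEYSELECT, which gives exact group sizes:
/* Count the rows in the input data set */
%let dsid=%sysfunc(open(sashelp.class));
%let nobs=%sysfunc(attrn(&dsid,nlobs));
%let dsid=%sysfunc(close(&dsid));
/* 60% and 20% rounded down; the test set takes the remainder */
%let train=%sysevalf(0.6*&nobs,int);
%let valid=%sysevalf(0.2*&nobs,int);
%let test=%eval(&nobs-&train-&valid);
%put &train &valid &test;
/* Randomly assign every row to one of three groups of the given sizes */
proc surveyselect data=sashelp.class groups=(&train &valid &test) out=want;
run;
/* Split on the GroupID variable that PROC SURVEYSELECT adds */
data train valid test;
   set want;
   select(groupid);
      when(1) output train;
      when(2) output valid;
      when(3) output test;
      otherwise;
   end;
run;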