I'm using SAS to do machine learning. I would like to randomly split my data into 60% training, 20% validation, and 20% test data sets. How do I do that in SAS?
There are many ways. Here is one:
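data training validation test;
  set sashelp.class;
  /* _n_ is an automatic variable that is never written to the output, so it doubles as a scratch variable */
  _n_=rand('uniform');
  if _n_ le .6 then output training;        /* about 60% of the rows */
  else if _n_ le .8 then output validation; /* about 20% */
  else output test;                         /* the remaining rows, about 20% */
run;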
Art, CEO, AnalystFinder.com
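A note on repeatability: without a seed, RAND draws from a stream seeded by the system clock, so the split above changes on every run. If you need the same partition each time, something like the following sketch may help. The seed value 12345 and the scratch variable u are arbitrary choices for illustration and are not part of the original answer.

```sas
data training validation test;
  set sashelp.class;
  /* Assumed seed: any positive integer makes the random stream repeatable */
  call streaminit(12345);
  u = rand('uniform');   /* hypothetical scratch variable, dropped below */
  if u le .6 then output training;
  else if u le .8 then output validation;
  else output test;
  drop u;
run;
```

As with the unseeded version, the 60/20/20 proportions are only approximate, because each row is assigned independently.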
Are you using SAS EM? If so, check the Partition task.
No, I'm using SAS only.
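Or use PROC SURVEYSELECT:
/* Count the rows in the input data set */
%let dsid=%sysfunc(open(sashelp.class));
%let nobs=%sysfunc(attrn(&dsid,nlobs));
%let dsid=%sysfunc(close(&dsid));
/* Group sizes: 60% and 20% truncated to integers; the remainder goes to test */
%let train=%sysevalf(0.6*&nobs,int);
%let valid=%sysevalf(0.2*&nobs,int);
%let test=%eval(&nobs-&train-&valid);
%put &train &valid &test;
/* Randomly assign every row to one of three groups of those sizes */
proc surveyselect data=sashelp.class groups=(&train &valid &test) out=want;
run;
/* Split on the GroupID variable that PROC SURVEYSELECT adds */
data train valid test;
  set want;
  select(groupid);
    when(1) output train;
    when(2) output valid;
    when(3) output test;
    otherwise;
  end;
run;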
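To confirm the group sizes, a quick frequency count on the want data set from the step above should be enough. This is only a sketch. For the 19 rows of sashelp.class, the macro arithmetic works out to sizes of 11, 3, and 5.

```sas
/* Count rows per group; GroupID is the group variable used in the split step */
proc freq data=want;
  tables groupid / nocum nopercent;
run;
```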