I'm using SAS to do machine learning. I would like to randomly split my data into 60% training, 20% validation, and 20% test data sets. How do I do that in SAS?
There are many ways. Here is one:
data training validation test;
    set sashelp.class;
    _n_=rand('uniform');
    if _n_ le .6 then output training;
    else if _n_ le .8 then output validation;
    else output test;
run;
Art, CEO, AnalystFinder.com
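If the split needs to be repeatable, one possible variant of the DATA step above seeds the random number stream first. This is only a sketch: the seed value 12345 and the temporary variable name _u are arbitrary choices, not part of the original answer. Note that each observation is assigned independently, so the resulting proportions are only approximately 60/20/20, especially on a small table like sashelp.class.

```sas
/* Sketch: approximate 60/20/20 split with a fixed seed.            */
/* The seed (12345) and the variable name _u are illustrative only. */
data training validation test;
    set sashelp.class;
    /* Seed the generator once so reruns give the same split */
    if _n_ = 1 then call streaminit(12345);
    /* Draw one uniform(0,1) value per observation */
    _u = rand('uniform');
    /* Keep the helper variable out of the output data sets */
    drop _u;
    if _u le 0.6 then output training;
    else if _u le 0.8 then output validation;
    else output test;
run;
```

Using a named variable with DROP instead of overwriting _n_ leaves the automatic observation counter intact for any later logic in the step.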
Are you using SAS EM? If so, check the Partition task.
No, I'm using SAS only.
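Or use PROC SURVEYSELECT, which gives exact group sizes:

/* Get the number of observations in the input table */
%let dsid=%sysfunc(open(sashelp.class));
%let nobs=%sysfunc(attrn(&dsid,nlobs));
%let dsid=%sysfunc(close(&dsid));

/* Compute the size of each group; the test set takes the remainder */
%let train=%sysevalf(0.6*&nobs,int);
%let valid=%sysevalf(0.2*&nobs,int);
%let test=%eval(&nobs-&train-&valid);
%put &train &valid &test;

/* Randomly assign every observation to one of three groups (GroupID = 1, 2, 3) */
proc surveyselect data=sashelp.class groups=(&train &valid &test) out=want;
run;

/* Write each group to its own data set */
data train valid test;
    set want;
    select(groupid);
        when(1) output train;
        when(2) output valid;
        when(3) output test;
        otherwise;
    end;
run;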
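A quick way to confirm the group sizes is to tabulate GroupID in the WANT data set produced above. This is a minimal check, assuming WANT still exists in the WORK library. For sashelp.class (19 observations) the macro arithmetic gives 11, 3, and 5, so those are the counts to expect for groups 1, 2, and 3.

```sas
/* Sketch: count the observations assigned to each group in WANT */
proc freq data=want;
    tables groupid / nocum;
run;
```

If the same split is needed on every run, the SEED= option on the PROC SURVEYSELECT statement can be set to a fixed positive integer.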