How to split a data frame into 60% training, 20% validation, and 20% test sets?

Contributor
Posts: 36

I'm using SAS to do machine learning. I would like to randomly split my data into 60% training, 20% validation, and 20% test data sets. How do I do that in SAS?


Accepted Solutions
Solution
‎03-18-2017 09:59 PM
PROC Star
Posts: 7,468

Re: How to split a data frame into 60% training, 20% validation, and 20% test sets?

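There are many ways. Here is one:

data training validation test;
  set sashelp.class;
  /* _n_ is an automatic variable, so reusing it as scratch space
     keeps the random draw out of the output data sets */
  _n_=rand('uniform');
  /* roughly 60% of rows */
  if _n_ le .6 then output training;
  /* roughly the next 20% */
  else if _n_ le .8 then output validation;
  /* the remaining rows, roughly 20% */
  else output test;
run;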

Art, CEO, AnalystFinder.com
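
Because each row in the step above is assigned by an independent random draw, the three data sets come out only approximately 60/20/20, and the split changes on every run. A minimal sketch of a repeatable variant follows; the seed value 12345 and the helper variable u are assumptions added for illustration, not part of the original answer.

```sas
/* Sketch only: seed 12345 and variable u are illustrative assumptions */
data training validation test;
  set sashelp.class;
  /* Seed the random-number stream once so reruns give the same split */
  if _n_ = 1 then call streaminit(12345);
  /* One uniform(0,1) draw per row */
  u = rand('uniform');
  if u le .6 then output training;
  else if u le .8 then output validation;
  else output test;
  /* Keep the helper variable out of all three output data sets */
  drop u;
run;
```

With a small table such as sashelp.class (19 rows), the realized proportions can differ noticeably from the targets.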


All Replies
Super User
Posts: 19,780

Re: How to split a data frame into 60% training, 20% validation, and 20% test sets?

Are you using SAS Enterprise Miner (EM)? If so, check the Data Partition node.

Contributor
Posts: 36

Re: How to split a data frame into 60% training, 20% validation, and 20% test sets?

No, I'm using only SAS.

Super User
Posts: 10,023

Re: How to split a data frame into 60% training, 20% validation, and 20% test sets?

Or PROC SURVEYSELECT.


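/* Get the number of observations in the input data set */
%let dsid=%sysfunc(open(sashelp.class));
%let nobs=%sysfunc(attrn(&dsid,nlobs));
%let dsid=%sysfunc(close(&dsid));

/* Exact group sizes: 60% and 20% rounded down, the remainder goes to test */
%let train=%sysevalf(0.6*&nobs,int);
%let valid=%sysevalf(0.2*&nobs,int);
%let test=%eval(&nobs-&train-&valid);

%put &train &valid &test;

/* Randomly assign each row to one of three groups of the requested sizes */
proc surveyselect data=sashelp.class groups=(&train &valid &test) out=want;
run;

/* Route each row to its data set based on the GroupID variable */
data train valid test;
 set want;
 select(groupid);
 when(1) output train;
 when(2) output valid;
 when(3) output test;
 otherwise;
 end;
run;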
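
To confirm that the groups have the intended sizes, a quick frequency count on the group variable can help. This is a sketch that assumes the want data set created by the PROC SURVEYSELECT step above is still available in the WORK library.

```sas
/* Sketch: count the rows assigned to each group in the WANT data set */
proc freq data=want;
  tables groupid;
run;
```

The counts should match the three values written to the log by the %put statement.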
