mahler_ji
Obsidian | Level 7

Hello All,

I am looking for a quick way to randomly partition a dataset into three subsets for model training, testing, and validation. I would like to be able to vary the size of each set (50% test, 30% training, etc.) and make sure that the sets are randomly generated.

Thanks!

John

3 REPLIES 3
Ksharp
Super User

Try the code below, or check PROC SURVEYSELECT.

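data shoes;
 set sashelp.shoes;
 r=ranuni(-1);   /* uniform(0,1) draw; a non-positive seed uses the system clock */
run;
/* Bin the random draws into 100 roughly equal groups, numbered 0 to 99 */
proc rank data=shoes out=have groups=100;
 var r;
 ranks rank;
run;
data test training valid;
 set have;
 select;
  when(rank lt 50) output test;      /* ranks 0-49: about 50% */
  when(rank lt 80) output training;  /* ranks 50-79: about 30% */
  otherwise output valid;            /* ranks 80-99: about 20% */
 end;
run;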



Xia Keshan

slchen
Lapis Lazuli | Level 10

Also try PROC SURVEYSELECT.
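
For reference, a minimal sketch of how PROC SURVEYSELECT could produce a 50/30/20 split in two passes. It uses sashelp.shoes as the input, and the data set names step1, rest, and step2 and the seed values are placeholders chosen for illustration. The second pass samples 60% of the remaining half, which works out to about 30% of the original data.

```sas
/* Pass 1: flag a 50% simple random sample as the test set.       */
/* OUTALL keeps every row and adds Selected (1 = sampled, 0 = not) */
proc surveyselect data=sashelp.shoes out=step1 method=srs
                  samprate=0.5 seed=2345 outall noprint;
run;

data test rest;
 set step1;
 if Selected then output test;
 else output rest;
 drop Selected;   /* remove the flag so pass 2 can create it again */
run;

/* Pass 2: 60% of the remaining half is about 30% of the total */
proc surveyselect data=rest out=step2 method=srs
                  samprate=0.6 seed=2346 outall noprint;
run;

data training valid;
 set step2;
 if Selected then output training;
 else output valid;
 drop Selected;
run;
```

To change the proportions, adjust the two SAMPRATE= values; the second rate is always relative to what is left after the first pass.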

stat_sas
Ammonite | Level 13

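data test training valid;
 set have;
 /* Draw one random number per observation. Calling ranuni() again in
    each condition would generate a new value every time and distort
    the 50/30/20 proportions. */
 r=ranuni(2345);
 if r<=0.5 then output test;
 else if r<=0.8 then output training;
 else output valid;
 drop r;
run;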
