Denali
Quartz | Level 8

How do I randomly split a dataset with 1399 unique observations into 2 datasets with 1000 vs. 399 obs?

1 ACCEPTED SOLUTION

Reeza
Super User

PROC SURVEYSELECT is typically used for sample selection.

In your case, because the two groups are mutually exclusive, you can use the OUTALL option to output all records.

You select a sample of 1000, and every record is written to the output dataset with a variable called SELECTED that indicates whether that record is in the sample (1) or not (0).

This is illustrated here using SASHELP.STOCKS, with a sample of 300 in the first group and the remaining 399 in the second group. No sorting is required; PROC CONTENTS and PROC FREQ are for illustrative purposes only.

You will want to set a SEED so that your sample is reproducible, i.e. if you run the exact same data through it again with the same seed, it will generate the same sample.


proc contents data=sashelp.stocks;
run;


proc surveyselect data=sashelp.stocks method=srs sampsize=300 out=sample_selected outall seed=50;
run;

proc freq data=sample_selected;
table selected;
run;
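
Applied to the original question, the same idea can be sketched as follows. This is only a minimal sketch under assumptions: the input dataset name HAVE and the output names FLAGGED, WANT1000 and WANT399 are hypothetical, and HAVE is assumed to contain exactly 1399 observations. NOPRINT simply suppresses the procedure's printed summary.

```sas
/* Assumption: HAVE is the 1399-observation input dataset (hypothetical name). */
/* OUTALL keeps every record and adds SELECTED = 1 (sampled) or 0 (not sampled). */
proc surveyselect data=have method=srs sampsize=1000
                  out=flagged outall seed=50 noprint;
run;

/* Route each record to one of the two datasets based on the SELECTED flag. */
data want1000 want399;
  set flagged;
  drop selected;
  if selected = 1 then output want1000;
                  else output want399;
run;
```

Because every record carries exactly one value of SELECTED, the two output datasets cannot overlap, and together they contain all records of the input.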

@Denali wrote:

How do I randomly split a dataset with 1399 unique observations into 2 datasets with 1000 vs. 399 obs?


 


3 REPLIES
yabwon
Onyx | Level 15
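/* Sample data: 1399 observations */
data have;
  do ID = 1 to 1399;
    output;
  end;
run;

/* Attach a uniform random number to every observation */
data r1;
  set have;
  if _N_ = 1 then call streaminit(42);  /* fixed seed for reproducibility */
  r = rand('uniform');
run;

/* Sorting by the random number puts the observations in random order */
proc sort data = r1;
  by r;
run;

/* First 1000 observations go to WANT1000, the remaining 399 to WANT399 */
data want1000 want399;
  set r1;
  drop r;

  if _N_ <= 1000 then output want1000;
                 else output want399;
run;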
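
A quick way to confirm the split is to count the rows in each output dataset. The sketch below assumes the datasets WANT1000 and WANT399 created above already exist; the column aliases are made up for illustration.

```sas
/* Count the rows in each output dataset; with a 1399-row input */
/* the two counts should add up to 1399.                        */
proc sql;
  select count(*) as n_want1000 from want1000;
  select count(*) as n_want399  from want399;
quit;
```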
_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Onyx | Level 15

One more, this time without sorting:

data have;
  do ID = 1 to 1399;
    output;
  end;
run;


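data want1000 want399;

  call streaminit(42);  /* fixed seed for reproducibility */

  /* Hash table holding the observation numbers picked for WANT399 */
  declare hash H();
  H.defineKey("curobs");
  H.defineDone();

  /* Draw observation numbers until 399 distinct ones are stored;
     replace() overwrites duplicates, so num_items counts unique keys */
  do while(H.num_items < 399);
    curobs = rand('integer', 1, 1399);
    H.replace();
  end;

  /* Read HAVE once; check() returns 0 when the current observation
     number is in the hash table */
  do until(eof);
    set have end=eof curobs=curobs;
    if H.check() then output want1000;
                 else output want399;
  end;
  stop;
run;

/* test: the two datasets should have no observations in common */
proc sql;
  select * from want1000
  intersect
  select * from want399
  ;
quit;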