SWEETSAS
Obsidian | Level 7

data CO;
Input id $ x y;
Datalines;
00-01 5 3
00-02 4 6
00-03 6 4
00-04 7 5
00-05 9 2
00-06 4 6
00-07 5 8
00-08 1 8
00-09 7 3
00-10  7 4
;
run;

Please, I want to generate Z random subsets of data set CO, choosing 4 observations at a time (10 choose 4). Let's take Z = 7 in this case. I would like to add an additional column to identify each random subset; let's call that variable REPS. I would appreciate code that runs fast, because the actual run might have up to 10,000 REPS, and the number of variables can be larger.

 

I read Rick's blog and saw some examples, but I did not see an example where the combinations involve an actual data set. If it's not possible for more than one variable, I can do only the id (character variable) and merge back to the original data set.

 

Any help will be appreciated!

 

Thanks in Advance

Jack

1 ACCEPTED SOLUTION

Ksharp
Super User

Does PG's code make sense for you?

It IS possible to handle more than one variable in IML, but it needs some more code and might not be friendly. Do you really want IML code?

 

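data have;
Input id $ x y;
Datalines;
00-01 5 3
00-02 4 6
00-03 6 4
00-04 7 5
00-05 9 2
00-06 4 6
00-07 5 8
00-08 1 8
00-09 7 3
00-10  7 4
;
run;

proc iml;
z=7;   /* number of random subsets (REPS) */
k=4;   /* observations per subset */

use have nobs nobs;      /* nobs = number of observations in HAVE */
read all var{id x y};
close;
r=rancomb(nobs,k,z);     /* z x k matrix; each row is one random combination of row numbers */

/* initialize NEW_ID as a character vector with the same length as ID */
new_id=j(k,1,blankstr(nleng(id)));
create want var{reps new_id new_x new_y};
do i=1 to nrow(r);
 idx=r[i,];              /* row numbers for subset i */
 reps=j(k,1,i);          /* subset identifier */
 new_id=id[idx];
 new_x=x[idx];
 new_y=y[idx];
 append;                 /* write the k rows of subset i */
end;
close;

quit;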

 


12 REPLIES
Reeza
Super User

Proc surveyselect will work. 

Reeza
Super User

Selection with or without replacement?

SWEETSAS
Obsidian | Level 7

It's basically to obtain a random subset of the possible permutations of the data set. The reason is that in some settings, computer memory might limit the enumeration of all permutations. Therefore, a sufficiently large number of permutations of the data set should suffice.

 

Thanks!

J
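
To put a number on the enumeration concern, the count of all possible subsets can be checked with the COMB function. This is only a sketch: the values 10 and 4 come from the question, while the 100-choose-10 case is a hypothetical larger problem added for scale.

```sas
data _null_;
   n_small = comb(10, 4);    /* all 4-observation subsets of 10 rows: 210 */
   n_large = comb(100, 10);  /* hypothetical larger case, on the order of 1E13 */
   put n_small= n_large=;
run;
```

With only 210 subsets the example data could be enumerated completely, but the count grows quickly enough that random sampling of subsets becomes the practical choice for larger inputs.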

PGStats
Opal | Level 21

As @Reeza suggested:

 

data CO;
Input id $ x y;
Datalines;
00-01 5 3
00-02 4 6
00-03 6 4
00-04 7 5
00-05 9 2
00-06 4 6
00-07 5 8
00-08 1 8
00-09 7 3
00-10  7 4
;

proc surveyselect data=CO 
    method=srs /* Simple Random sampling without replacement */
    /*method=urs outhits*/ /* Simple Random sampling with replacement */
    reps=10 sampsize=4 
    out=COsamples seed=896358; 
run;

proc print data=COsamples noobs; run;
PG
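
PROC SURVEYSELECT identifies each sample in the output with a variable named Replicate. Since the question asks for an identifier called REPS, one possible variation (a sketch, assuming the data set CO above already exists) renames it with a data set option and uses the 7 replicates from the question. The output name `want` is chosen here for illustration.

```sas
/* Assumes data set CO was created by the DATA step above */
proc surveyselect data=CO noprint
    method=srs                          /* without replacement */
    reps=7 sampsize=4                   /* Z=7 subsets of 4 observations */
    seed=896358
    out=want(rename=(Replicate=reps));  /* rename the sample identifier */
run;

proc print data=want noobs; run;
```

All variables in CO are carried into the output, so no merge back to the original data is needed.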
SWEETSAS
Obsidian | Level 7

Incredible!!! 

 

Let me evaluate this.

SWEETSAS
Obsidian | Level 7

You guys are just incredible!!!

 

Many thanks. Let me digest this. 

Ksharp
Super User

This is a vectorized operation, so it could be very fast.

 

data have;
Input id $ x y;
Datalines;
00-01 5 3
00-02 4 6
00-03 6 4
00-04 7 5
00-05 9 2
00-06 4 6
00-07 5 8
00-08 1 8
00-09 7 3
00-10  7 4
;
run;
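proc iml;
z=7;   /* number of random subsets (REPS) */
k=4;   /* observations per subset */

use have nobs nobs;      /* nobs = number of observations in HAVE */
read all var{id x y};
close;
r=rancomb(nobs,k,z);     /* z x k matrix; each row is one random combination of row numbers */

 idx=colvec(r);          /* all selected row numbers, stacked subset by subset */
 reps =colvec(row(r));   /* matching subset number for each selected row */
 new_id=id[idx];
 new_x=x[idx];
 new_y=y[idx];
create want var{reps new_id new_x new_y};
append;                  /* write all z*k rows in one step, no loop */
close;

quit;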
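
The loop-free version relies on two facts: COLVEC stacks a matrix into a column in row-major order, and ROW returns a matrix of the same shape holding each element's row number. A small sketch with a hypothetical 2 x 3 matrix standing in for the RANCOMB output shows how the two vectors line up:

```sas
proc iml;
/* Hypothetical 2 x 3 matrix standing in for the output of RANCOMB */
r = {1 4 7,
     2 5 9};
idx  = colvec(r);        /* 1,4,7,2,5,9: elements read row by row */
reps = colvec(row(r));   /* 1,1,1,2,2,2: row number of each element */
print idx reps;
quit;
```

Because each row of `r` is one subset, `reps` labels every selected observation with its subset number, and a single subscript such as `id[idx]` extracts all subsets at once.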
SWEETSAS
Obsidian | Level 7

All the answers are essentially correct. Thank you all very very much. 

Rick_SAS
SAS Super FREQ

This blog post explains how to sample with or without replacement: "Four essential sampling methods in SAS". It links to other articles that provide details. It shows how to do the sampling by using PROC SURVEYSELECT and by using the SAMPLE function in SAS/IML.
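
For reference, here is a rough sketch of how the SAMPLE function might be applied to this thread's data. It is not taken from the blog post. It assumes that a two-element size vector requests the sample size followed by the number of samples, and that the "WOR" method samples without replacement; both are worth checking against the SAS/IML documentation. The seed and the output name `want2` are arbitrary choices for illustration.

```sas
proc iml;
call randseed(896358);        /* arbitrary seed, for reproducibility */
use have nobs nobs;           /* assumes data set HAVE from the earlier posts */
read all var {id x y};
close;

k = 4;  z = 7;
/* Assumption: size vector {k z} asks for z samples of size k,
   returned as a z x k matrix; "WOR" = without replacement */
r = sample(1:nobs, k || z, "WOR");

idx    = colvec(r);           /* selected row numbers, sample by sample */
reps   = colvec(row(r));      /* sample identifier for each selected row */
new_id = id[idx];
new_x  = x[idx];
new_y  = y[idx];
create want2 var {reps new_id new_x new_y};
append;
close;
quit;
```

Switching the method argument to "Replace" would give sampling with replacement, which corresponds to the METHOD=URS alternative shown in the PROC SURVEYSELECT reply.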

SWEETSAS
Obsidian | Level 7

Thanks Rick!

SWEETSAS
Obsidian | Level 7
This is a correct answer as well.
