random observations

Contributor
Posts: 70

I have a dataset with 1000 observations. I want to select 50 random observations from it, and they must be in random order. How can I do this? Can anybody explain?

Respected Advisor
Posts: 3,124

random observations

More details are needed to answer your question. There are many types of random samples: do you want a simple random sample, for example one drawn using ranuni(), or one based on a particular distribution? With or without replacement?

For an official approach, please check the documentation for PROC SURVEYSELECT, which is designed for sampling.
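To make the "with or without replacement" distinction concrete, here is a minimal sketch using PROC SURVEYSELECT. The dataset have built below, the output names srs_sample and urs_sample, and the seed values are assumptions made only for illustration.

```sas
/* Hypothetical input: 1000 observations identified by id */
data have;
  do id = 1 to 1000;
    output;
  end;
run;

/* Without replacement (METHOD=SRS): each observation appears at most once */
proc surveyselect data=have out=srs_sample
  method=srs n=50 seed=2024 noprint;
run;

/* With replacement (METHOD=URS): OUTHITS writes one row per selection,
   so the same observation can appear more than once */
proc surveyselect data=have out=urs_sample
  method=urs n=50 seed=2024 outhits noprint;
run;
```

Both outputs should contain 50 rows; only the second can contain repeated id values.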

Regards,

Haikuo

PROC Star
Posts: 7,364

random observations

In addition to what Haikuo suggested, you can always combine methods.  E.g., once you select a sample, you can always use ranuni(i) to assign a pseudo random number to each record selected, and then sort the file by that number.
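The combination described above might look like the following sketch. The dataset have, the intermediate dataset sample, the key variable _key, and the seed values are assumptions for illustration, not something taken from the original question.

```sas
/* Hypothetical input: 1000 observations identified by id */
data have;
  do id = 1 to 1000;
    output;
  end;
run;

/* Step 1: simple random sample of 50 without replacement */
proc surveyselect data=have out=sample
  method=srs n=50 seed=12345 noprint;
run;

/* Step 2: attach a pseudo-random sort key to each selected record */
data sample;
  set sample;
  _key = ranuni(67890);
run;

/* Step 3: sort by the key to shuffle the rows, dropping the key on output */
proc sort data=sample out=want(drop=_key);
  by _key;
run;
```

The fixed seeds make the result repeatable; changing them gives a different sample and a different order.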

Frequent Contributor
Posts: 80

Re: random observations

This is an example of proc surveyselect.

proc surveyselect
  data = have  out = want
  method = srs
  n = 50
  noprint;
run;

proc print;
run;

I hope I have helped you.

Regular Contributor
Posts: 233

Re: random observations

data want;
set have;
/* Keeps each observation with probability 0.8. Adjust the 0.2 threshold so that roughly 50 obs are kept, depending on your source dataset; for 1000 obs, RANUNI(111) > 0.95 keeps about 5%. */
if RANUNI(111) > 0.2;
run;

A commonly used function is RANUNI, which returns a random variate from a uniform distribution.

http://www.sascommunity.org/wiki/How_the_SAS_Random_Number_Generators_Work

Adding an example:

data test;
informat rundate mmddyy10.;
format rundate mmddyy10.;
input rundate  product $ premthbal curmthbal;
cards;
01/05/2012 aaa . 1
01/05/2012 abc . 2
01/05/2012 bbb . 3
01/05/2012 ccc . 4
01/05/2012 ddd . 5
01/05/2012 eee . 6
01/05/2012 fff . 7
01/05/2012 ggg . 8
02/05/2012 aaa . 9
02/05/2012 abc . 8
02/05/2012 bbb . 7
02/05/2012 ccc . 6
02/05/2012 ddd . 5
02/05/2012 eee . 4
02/05/2012 fff . 3
02/05/2012 ggg . 2
03/05/2012 aaa . 1
03/05/2012 abc . 2
03/05/2012 bbb . 3
03/05/2012 ccc . 4
03/05/2012 ddd . 5
03/05/2012 eee . 6
03/05/2012 fff . 7
03/05/2012 ggg . 8
04/05/2012 aaa . 3
04/05/2012 abc . 2
04/05/2012 bbb . 1
04/05/2012 ccc . 3
04/05/2012 ddd . 2
04/05/2012 eee . 1
04/05/2012 fff . 3
04/05/2012 ggg . 2
05/05/2012 aaa . 3
05/05/2012 abc . 2
05/05/2012 bbb . 1
05/05/2012 ccc . 3
05/05/2012 ddd . 2
05/05/2012 eee . 1
05/05/2012 fff . 3
05/05/2012 ggg . 2
06/05/2012 aaa . 3
06/05/2012 abc . 2
06/05/2012 bbb . 1
06/05/2012 ccc . 3
06/05/2012 ddd . 2
06/05/2012 eee . 1
06/05/2012 fff . 3
06/05/2012 ggg . 2
06/05/2012 aaa . 3
06/05/2012 abc . 2
06/05/2012 bbb . 1
06/05/2012 ccc . 3
07/05/2012 ddd . 2
07/05/2012 eee . 1
07/05/2012 fff . 3
07/05/2012 ggg . 2
;
run;

data want;
set test;
if RANUNI(111) > 0.8;
run;

Output:

02/05/2012 bbb . 7
02/05/2012 ggg . 2
03/05/2012 eee . 6
03/05/2012 fff . 7
03/05/2012 ggg . 8
04/05/2012 aaa . 3
04/05/2012 eee . 1
04/05/2012 fff . 3
05/05/2012 ccc . 3
06/05/2012 ddd . 2
06/05/2012 ccc . 3
07/05/2012 ddd . 2
07/05/2012 eee . 1
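Because the RANUNI filter keeps each row independently, the number of rows selected varies from run to run and is only approximately the target. If an exact count is needed in a single DATA step, one option is sequential selection sampling, which is a different technique from the filter above. A minimal sketch, assuming a hypothetical 1000-row dataset have and helper variables k and remaining:

```sas
/* Hypothetical input: 1000 observations identified by id */
data have;
  do id = 1 to 1000;
    output;
  end;
run;

data want;
  retain k 50;                    /* observations still to be selected */
  set have nobs=total;
  remaining = total - _n_ + 1;    /* rows not yet examined, including this one */
  /* Select this row with probability k / remaining */
  if ranuni(111) < k / remaining then do;
    output;
    k = k - 1;
  end;
  drop k remaining;
run;
```

This selects exactly 50 rows without replacement, but they stay in their original order, so a shuffle step is still needed if random order is required.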

Respected Advisor
Posts: 4,659

Re: random observations

PROC SURVEYSELECT will do a good job, but it will not return the observations in random order. As Art pointed out, you will need extra work to shuffle the selected observations. For simple random sampling, you can get the same result with the following procedure. Assuming your observations are in dataset have, the following will select exactly 50 observations without replacement, ordered randomly, in dataset want:

/* Writes a warning about terminating early */

proc sql outobs=50;
create table want as
select *, rand("UNIFORM") as _randomNumber from have
order by calculated _randomNumber;

proc print; run;

PG
Regular Contributor
Posts: 233

Re: random observations

Hi PGStats - I tested the code and for some reason it was pulling all the observations. Can you verify? This is what I came up with based on your code; it pulls the 50 observations correctly and randomly:

data want;
set test;
randomNumber = RAND("UNIFORM");
run;

proc sort data=want;
by randomNumber;
run;

data want1;
set want (firstobs=1 obs = 50);
drop randomNumber;
run;


proc print data=want1; run;

Respected Advisor
Posts: 4,659

Re: random observations

I just changed my proposal. Sorry about the other version. OBS= only works for input datasets; it doesn't work with output datasets. -PG

PG