I have a dataset with 1000 observations. I want to select 50 random observations from it, and they must be in random order. How can I do this? Can anybody explain?
More details are needed to answer your question. There are many types of random samples: do you want a simple random sample, such as one drawn using ranuni(), or one based on a certain distribution? With or without replacement?
For an official approach, please check the help documentation for PROC SURVEYSELECT, which is designed for sampling.
Regards,
Haikuo
In addition to what Haikuo suggested, you can always combine methods. E.g., once you select a sample, you can always use ranuni(i) to assign a pseudo random number to each record selected, and then sort the file by that number.
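A minimal sketch of that combination, assuming the input dataset is named have as elsewhere in this thread. The intermediate dataset names (sample, sample_keyed), the variable _sortkey, and both seed values are made up for illustration:

```sas
/* Step 1: draw a simple random sample of 50 rows without replacement. */
/* The seed value is an arbitrary assumption; it only makes the draw repeatable. */
proc surveyselect data=have out=sample method=srs n=50 seed=12345 noprint;
run;

/* Step 2: attach a pseudo-random sort key to each selected row. */
data sample_keyed;
    set sample;
    _sortkey = ranuni(54321);  /* hypothetical variable name and seed */
run;

/* Step 3: sort by the key so the sampled rows end up in random order. */
proc sort data=sample_keyed out=want;
    by _sortkey;
run;
```

The helper variable _sortkey stays in want here; drop it in a later step if it is not wanted.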
This is an example of proc surveyselect.
proc surveyselect
data = have out = want
method = srs
n = 50
noprint;
run;
proc print;
run;
I hope I have helped you.
data want;
set have;
if RANUNI(111) > 0.2; /* keeps about 80% of rows; adjust the threshold (e.g. 0.95 keeps about 5%) so it picks approximately 50 obs from your source dataset */
run;
A commonly used function is RANUNI that returns a random variate from a uniform distribution.
http://www.sascommunity.org/wiki/How_the_SAS_Random_Number_Generators_Work
Adding an example:
data test;
informat rundate mmddyy10.;
format rundate mmddyy10.;
input rundate product $ premthbal curmthbal;
cards;
01/05/2012 aaa . 1
01/05/2012 abc . 2
01/05/2012 bbb . 3
01/05/2012 ccc . 4
01/05/2012 ddd . 5
01/05/2012 eee . 6
01/05/2012 fff . 7
01/05/2012 ggg . 8
02/05/2012 aaa . 9
02/05/2012 abc . 8
02/05/2012 bbb . 7
02/05/2012 ccc . 6
02/05/2012 ddd . 5
02/05/2012 eee . 4
02/05/2012 fff . 3
02/05/2012 ggg . 2
03/05/2012 aaa . 1
03/05/2012 abc . 2
03/05/2012 bbb . 3
03/05/2012 ccc . 4
03/05/2012 ddd . 5
03/05/2012 eee . 6
03/05/2012 fff . 7
03/05/2012 ggg . 8
04/05/2012 aaa . 3
04/05/2012 abc . 2
04/05/2012 bbb . 1
04/05/2012 ccc . 3
04/05/2012 ddd . 2
04/05/2012 eee . 1
04/05/2012 fff . 3
04/05/2012 ggg . 2
05/05/2012 aaa . 3
05/05/2012 abc . 2
05/05/2012 bbb . 1
05/05/2012 ccc . 3
05/05/2012 ddd . 2
05/05/2012 eee . 1
05/05/2012 fff . 3
05/05/2012 ggg . 2
06/05/2012 aaa . 3
06/05/2012 abc . 2
06/05/2012 bbb . 1
06/05/2012 ccc . 3
06/05/2012 ddd . 2
06/05/2012 eee . 1
06/05/2012 fff . 3
06/05/2012 ggg . 2
06/05/2012 aaa . 3
06/05/2012 abc . 2
06/05/2012 bbb . 1
06/05/2012 ccc . 3
07/05/2012 ddd . 2
07/05/2012 eee . 1
07/05/2012 fff . 3
07/05/2012 ggg . 2
;
run;
data want;
set test;
if RANUNI(111) > 0.8;
run;
Output:
02/05/2012 bbb . 7
02/05/2012 ggg . 2
03/05/2012 eee . 6
03/05/2012 fff . 7
03/05/2012 ggg . 8
04/05/2012 aaa . 3
04/05/2012 eee . 1
04/05/2012 fff . 3
05/05/2012 ccc . 3
06/05/2012 ddd . 2
06/05/2012 ccc . 3
07/05/2012 ddd . 2
07/05/2012 eee . 1
PROC SURVEYSELECT will do a good job, but it will not return the observations in random order. As Art pointed out, you will need extra work to shuffle the selected observations. For simple random sampling, you can get the same result with the following procedure. Assuming your observations are in dataset have, the following will select exactly 50 observations without replacement, ordered randomly, in dataset want:
/* Writes a warning about terminating early */
proc sql outobs=50;
create table want as
select *, rand("UNIFORM") as _randomNumber from have
order by calculated _randomNumber;
quit;
proc print; run;
Hi PGStats - I tested the code and for some reason it was pulling all the observations. Can you verify? This is what I came up with based on your code; it pulls the 50 observations correctly and randomly:
data want;
set test;
randomNumber = RAND("UNIFORM");
run;
proc sort data=want;
by randomNumber;
run;
data want1;
set want (firstobs=1 obs = 50);
drop randomNumber;
run;
proc print data=want1; run;
I just changed my proposal. Sorry about the other version. OBS= only works for input datasets, it doesn't work with output datasets. -PG