rawindar
Calcite | Level 5

I have a dataset with 1000 observations. My requirement is to get 50 random observations from the dataset, and they must be in random order. How can I do this? Can anybody explain?

Haikuo
Onyx | Level 15

More details are needed for your question. There are many types of random sampling: do you want a simple random sample (for example, using RANUNI()) or a sample based on a certain distribution? With or without replacement?

For an official approach, please check the documentation for PROC SURVEYSELECT, which is designed for sampling.
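As a sketch of how the with/without-replacement question maps onto PROC SURVEYSELECT, the two steps below differ mainly in METHOD=. The demo data set HAVE, the seed 2024, and the output names SRS_SAMPLE and URS_SAMPLE are illustrative assumptions, not part of the original question.

```sas
/* Hypothetical demo input: 1000 observations identified by ID */
data have;
  do id = 1 to 1000;
    output;
  end;
run;

/* Without replacement: simple random sampling, each observation at most once */
proc surveyselect data=have out=srs_sample
    method=srs n=50 seed=2024 noprint;
run;

/* With replacement: unrestricted random sampling. OUTHITS writes one
   output row per selection, so the same observation can appear twice */
proc surveyselect data=have out=urs_sample
    method=urs n=50 seed=2024 outhits noprint;
run;
```

Neither output is in random order; both keep the input order of the selected rows.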

Regards,

Haikuo

art297
Opal | Level 21

In addition to what Haikuo suggested, you can always combine methods. For example, once you have selected a sample, you can use RANUNI(seed) to assign a pseudorandom number to each selected record and then sort the file by that number.
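A minimal sketch of this combined approach, assuming an input data set named HAVE (generated here as hypothetical demo data) and arbitrary seeds: PROC SURVEYSELECT draws exactly 50 observations without replacement, then a RANUNI sort key and PROC SORT shuffle them.

```sas
/* Hypothetical demo input standing in for your 1000-observation HAVE */
data have;
  do id = 1 to 1000;
    output;
  end;
run;

/* Step 1: draw exactly 50 observations without replacement */
proc surveyselect data=have out=sample
    method=srs n=50 seed=12345 noprint;
run;

/* Step 2: tag each selected record with a pseudorandom sort key
   (seed 54321 is an arbitrary choice) */
data sample;
  set sample;
  _sortkey = ranuni(54321);
run;

/* Step 3: sorting by the key shuffles the records; the key is dropped
   from the output data set WANT */
proc sort data=sample out=want(drop=_sortkey);
  by _sortkey;
run;
```

Using a fixed nonzero seed makes the sample and its order reproducible; a seed of 0 would use the system clock instead.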

Augusto
Obsidian | Level 7

This is an example of proc surveyselect.

proc surveyselect data=have out=want
    method=srs   /* simple random sampling without replacement */
    n=50         /* sample size */
    noprint;
run;

proc print data=want;
run;

I hope I have helped you.

Hima
Obsidian | Level 7

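data want;
set have;
/* RANUNI(111) returns a uniform value in (0,1), so each observation is kept
   with probability 0.05: about 50 of 1000 observations, not exactly 50.
   Adjust the threshold to suit the size of your source dataset. */
if RANUNI(111) > 0.95;
run;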

A commonly used function is RANUNI, which returns a random variate from the uniform distribution on (0,1).

http://www.sascommunity.org/wiki/How_the_SAS_Random_Number_Generators_Work
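If you need exactly 50 observations rather than roughly 50, one technique that uses only RANUNI is selection sampling (Knuth's Algorithm S), swapped in here as a sketch; it is not from the posts above. It assumes a 1000-observation HAVE data set (demo data generated inline) with no variables named K or N, stored as a data set rather than a view so that NOBS= is known. The selected rows keep their original order, so you would still shuffle them, as art297 describes, to meet the random-order requirement.

```sas
/* Hypothetical demo input: 1000 observations identified by ID */
data have;
  do id = 1 to 1000;
    output;
  end;
run;

/* Keep each record with probability (still needed)/(still remaining).
   This yields exactly 50 records in their original order. */
data want(drop=k n);
  retain k 50;                 /* records still to select */
  retain n;                    /* records not yet examined */
  if _n_ = 1 then n = total;   /* TOTAL is filled by NOBS= at compile time */
  set have nobs=total;
  if ranuni(111) < k / n then do;
    output;
    k = k - 1;
  end;
  n = n - 1;
  if k = 0 then stop;          /* nothing left to select */
run;
```

When the remaining records equal the remaining quota, k/n becomes 1, which guarantees the count never falls short of 50.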

Adding an example:

data test;
informat rundate mmddyy10.;
format rundate mmddyy10.;
input rundate  product $ premthbal curmthbal;
cards;
01/05/2012 aaa . 1
01/05/2012 abc . 2
01/05/2012 bbb . 3
01/05/2012 ccc . 4
01/05/2012 ddd . 5
01/05/2012 eee . 6
01/05/2012 fff . 7
01/05/2012 ggg . 8
02/05/2012 aaa . 9
02/05/2012 abc . 8
02/05/2012 bbb . 7
02/05/2012 ccc . 6
02/05/2012 ddd . 5
02/05/2012 eee . 4
02/05/2012 fff . 3
02/05/2012 ggg . 2
03/05/2012 aaa . 1
03/05/2012 abc . 2
03/05/2012 bbb . 3
03/05/2012 ccc . 4
03/05/2012 ddd . 5
03/05/2012 eee . 6
03/05/2012 fff . 7
03/05/2012 ggg . 8
04/05/2012 aaa . 3
04/05/2012 abc . 2
04/05/2012 bbb . 1
04/05/2012 ccc . 3
04/05/2012 ddd . 2
04/05/2012 eee . 1
04/05/2012 fff . 3
04/05/2012 ggg . 2
05/05/2012 aaa . 3
05/05/2012 abc . 2
05/05/2012 bbb . 1
05/05/2012 ccc . 3
05/05/2012 ddd . 2
05/05/2012 eee . 1
05/05/2012 fff . 3
05/05/2012 ggg . 2
06/05/2012 aaa . 3
06/05/2012 abc . 2
06/05/2012 bbb . 1
06/05/2012 ccc . 3
06/05/2012 ddd . 2
06/05/2012 eee . 1
06/05/2012 fff . 3
06/05/2012 ggg . 2
06/05/2012 aaa . 3
06/05/2012 abc . 2
06/05/2012 bbb . 1
06/05/2012 ccc . 3
07/05/2012 ddd . 2
07/05/2012 eee . 1
07/05/2012 fff . 3
07/05/2012 ggg . 2
;
run;

data want;
set test;
if RANUNI(111) > 0.8;
run;

Output (13 of the 56 rows were kept):

02/05/2012 bbb . 7
02/05/2012 ggg . 2
03/05/2012 eee . 6
03/05/2012 fff . 7
03/05/2012 ggg . 8
04/05/2012 aaa . 3
04/05/2012 eee . 1
04/05/2012 fff . 3
05/05/2012 ccc . 3
06/05/2012 ddd . 2
06/05/2012 ccc . 3
07/05/2012 ddd . 2
07/05/2012 eee . 1

PGStats
Opal | Level 21

PROC SURVEYSELECT will do a good job, but it will not return the observations in random order. As Art pointed out, you will need extra work to shuffle the selected observations. For simple random sampling, you can get the same result with the following PROC SQL step. Assuming your observations are in dataset have, it selects exactly 50 observations without replacement, ordered randomly, in dataset want:

/* OUTOBS=50 stops after 50 rows; SAS writes a warning that the
   statement terminated early, which is expected here */
proc sql outobs=50;
  create table want as
  select *, rand("UNIFORM") as _randomNumber
  from have
  order by _randomNumber;
quit;

proc print data=want; run;

PG
Hima
Obsidian | Level 7

Hi PGStats, I tested the code, and for some reason it pulled all the observations. Can you verify? Based on your code, I came up with the following, which pulls 50 observations correctly and in random order:

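/* Attach a uniform random number to every observation */
data want;
set test;
randomNumber = RAND("UNIFORM");
run;

/* Sorting by the random number shuffles the observations */
proc sort data=want;
by randomNumber;
run;

/* Keep the first 50 shuffled observations and drop the helper variable */
data want1;
set want (firstobs=1 obs=50);
drop randomNumber;
run;

proc print data=want1; run;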

PGStats
Opal | Level 21

I just changed my proposal; sorry about the other version. OBS= only works on input datasets, not on output datasets. -PG

PG

