deleted_user
Not applicable
Hi,
I have 10 data sets, each with 60,000 observations, and I need to draw 100 random observations from each one. Can anyone tell me how to do this?

I had the idea of using RANUNI but don't know how.

Thanks
5 REPLIES
deleted_user
Not applicable
Try PROC SURVEYSELECT (with method=SRS) in order to select a simple random sample of size N.
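For illustration, a minimal sketch of such a call might look like the following. The data set names work.set1 and work.sample1 and the seed value are assumptions made up for this example, and the procedure requires SAS/STAT to be licensed.

```sas
/* Draw a simple random sample of 100 observations without replacement. */
/* work.set1, work.sample1 and the seed are hypothetical placeholders.  */
proc surveyselect data=work.set1 out=work.sample1
                  method=srs      /* simple random sampling         */
                  sampsize=100    /* number of observations to draw */
                  seed=12345;     /* fixed seed for reproducibility */
run;
```

The same step would be repeated (or wrapped in a macro) for each of the 10 input data sets.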
deleted_user
Not applicable
Proc SurveySelect is not part of Base SAS, so you may not have it available to you.

Within SAS EG under Data is "Random Sample".

If coding here's an idea:

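%macro select(inset,outset,size);

data &outset;
    set &inset nobs=N;
    retain criteria count fudge 0;

    /* base selection probability: sample size / number of observations */
    if _n_ = 1 then criteria = &size/N;

    /* select when the uniform draw falls below the boosted probability */
    if ranuni(-1) < criteria + fudge then do;
        if count < &size then do;
            output;
            count+1;
            fudge+criteria;   /* raise the probability after each selection */
        end;
    end;

    drop criteria count fudge;
run;

%mend;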

Increasing fudge raises the probability of selecting a record, so there is a greater chance of selecting any particular record as the step proceeds.
The downside of this method is that the selection probability is not uniform across the data set. If fudge were not used, and "uniformity" maintained, then a single pass through the data set might not yield all &size (here 100) observations.
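For comparison, a single pass can return exactly &size observations with uniform probability if each record is selected with probability (still needed) / (still remaining). This is a swapped-in technique, usually called selection sampling, not the fudge method above. The sketch below assumes &size is no larger than the number of observations; the macro name select_exact and the variables needed and remaining are made up for this example.

```sas
%macro select_exact(inset, outset, size);

data &outset;
    set &inset nobs=N;
    retain needed &size;        /* observations still to be selected    */
    remaining = N - _n_ + 1;    /* observations not yet passed over,    */
                                /* including the current one            */

    /* select with probability needed / remaining */
    if ranuni(-1) * remaining < needed then do;
        output;
        needed = needed - 1;
    end;

    drop needed remaining;
run;

%mend select_exact;
```

When needed equals remaining, every remaining record is taken, and when needed reaches 0, no further record can be taken, so the output always holds exactly &size observations. As with the other ideas, needed and remaining would clash with same-named variables in &inset.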

An alternative would be to use the POINT= set option

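data &outset;
    retain count 0;
    I = ranuni(-1) * N;          /* random observation number (not guaranteed to be an integer) */
    set &inset NOBS=N POINT=I;   /* direct read of observation I */
    output;                      /* write the record before testing the stop condition */
    count+1;
    if count = &size then stop;  /* STOP is required with POINT= to end the step */
    drop count;
run;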

This is probably a better method, and can also be encased in the above macro.
deleted_user
Not applicable
Another idea, that may work better:

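%macro select(inset,outset,size);

data &outset;

    retain count 0;
    drop count;

    I = ranuni(-1) * N;          /* random observation number (not guaranteed to be an integer) */
    set &inset NOBS=N POINT=I;   /* direct read of observation I */

    output;                      /* write the record before testing the stop condition */
    count+1;
    if count = &size then stop;  /* STOP is required with POINT= to end the step */

run;

%mend;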
advoss
Quartz | Level 8
Chuck, your approach doesn't guarantee that a row won't be selected more than once, does it?
deleted_user
Not applicable
Yes, you are correct for the POINT= method. There would need to be some way to check that an observation hadn't been used already.

Also, the calculation for I isn't quite right either, since it doesn't guarantee an integer value.

I = ceil(ranuni(-1) * N) is an easy solution to the integer problem. CEIL is safer than ROUND here, because ROUND can return 0, which is not a valid observation number for POINT=.

Solving the other problem takes a bit more work.
One way would be to use an array to keep a list of consumed records, and then use a linear search through the array to determine if the observation has been read before or not.
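The array idea could be sketched roughly as follows. This is only an illustration under a few assumptions: &size is no larger than the number of observations (otherwise the loop never ends), the macro name select_norep and the array name used are made up here, and WHICHN does the linear search through the list of consumed observation numbers.

```sas
%macro select_norep(inset, outset, size);

data &outset;
    array used{&size} _temporary_;   /* observation numbers already drawn */
    count = 0;

    do while (count < &size);
        I = ceil(ranuni(-1) * N);    /* random integer in 1..N */

        /* WHICHN returns 0 when I is not yet in the list */
        if whichn(I, of used{*}) = 0 then do;
            count + 1;
            used{count} = I;
            set &inset nobs=N point=I;
            output;
        end;
    end;

    stop;                            /* required when reading with POINT= */
    drop count;
run;

%mend select_norep;
```

For 100 draws from 60,000 observations, repeats are rare, so the search cost is small. The records come out in the order drawn, not in data set order.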

Another way to get a random subset of observations would require multiple passes through the dataset.

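/* Pass 1: attach a random sort key to every observation */
data dummy;
    set &inset;
    selection_key = ranuni(-1);
run;

/* Pass 2: put the observations in random order */
proc sort data=dummy;
    by selection_key;
run;

/* Pass 3: keep the first &size observations */
data &outset;
    set dummy (obs=&size);
    drop selection_key;
run;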


But this is still not perfectly generic, and none of these ideas are, because each introduces at least one variable that may already be defined in the &inset dataset. So whatever is done, care must be taken, and some creativity is required on the part of the programmer.
