subsetting random observations

N/A
Posts: 0

Hi,

I have a data set from which I want to draw a subset that meets a condition. From that subset I want to pull a random 1.5% of the observations of a variable and take their average.

I have managed the first part of this task, pulling the subset with a condition (using a WHERE clause).

I'm trying to figure out how to pull a random 1.5% of the observations of the variable.

After that, I think I can use the AVG function to get the mean.

Appreciate your help!
Super Contributor
Posts: 260

Re: subsetting random observations

Hi.
Check out the SURVEYSELECT procedure if you have a licence for SAS/STAT.
If not, one common way is to code:
1) a DATA step that creates a random variable (using the RANUNI or RAND function)
2) a sort of the new data set by this random variable
3) a step that keeps only the first x observations of the sorted data set.
This approach is just like shuffling cards.
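
The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: the data set work.have, the condition on region, and the analysis variable amount are hypothetical names that do not come from this thread, and the fixed seed is there only to make the result reproducible.

```sas
/* Hypothetical names: work.have (input), region (condition), amount (variable) */

/* 1) Attach a uniform random number to every observation in the subset */
data work.shuffled;
  set work.have(where=(region = 'EAST'));
  if _n_ = 1 then call streaminit(12345);  /* fixed seed, for reproducibility */
  _rand = rand('uniform');
run;

/* 2) Sorting by the random number shuffles the observations */
proc sort data=work.shuffled;
  by _rand;
run;

/* 3) Keep the first 1.5% of the shuffled observations */
data work.sample(drop=_rand);
  set work.shuffled nobs=totobs;
  if _n_ > ceil(totobs * 0.015) then stop;
run;

/* Mean of the sampled values */
proc means data=work.sample mean;
  var amount;
run;
```

Because the WHERE= option is applied before the shuffle, the 1.5% is taken from the subset that meets the condition, not from the full data set.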

Olivier
Super User
Posts: 5,260

Re: subsetting random observations

You can also use a subsetting IF in the first DATA step (on the new RANUNI variable). Then there is no need to sort and read the data a third time.
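
A minimal sketch of that idea, again with hypothetical names (work.have, region) and using RAND in place of RANUNI. Note the trade-off: each observation is kept independently with probability 0.015, so the sample size is only approximately 1.5% of the subset, not exactly.

```sas
/* Hypothetical names: work.have (input), region (condition) */
data work.sample_approx;
  set work.have(where=(region = 'EAST'));
  if _n_ = 1 then call streaminit(12345);  /* fixed seed, for reproducibility */
  if rand('uniform') < 0.015;              /* subsetting IF: keep about 1.5% */
run;
```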

Regards,
Linus
Data never sleeps
Respected Advisor
Posts: 3,907

Re: subsetting random observations

The code below (second data step) is almost what SAS provides in the training for the SAS certification.
HTH
Patrick

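/* Step 1: subset with a condition (supply the input data set and condition) */
data work.rsubset;
  set ...(where=(...));
run;

/* Step 2: draw an exact 1.5% random sample without replacement */
data work.rsubset(drop=obsleft sampsize);
  /* sampsize=10; */
  sampsize=ceil(totobs*0.015);  /* totobs is set at compile time by NOBS= below */
  obsleft=totobs;
  do while(sampsize>0);
    pickit+1;
    /* pick this observation with probability sampsize/obsleft */
    if ranuni(0)<sampsize/obsleft then do;
      set sasuser.revenue point=pickit nobs=totobs;
      output;
      sampsize=sampsize-1;
    end;
    obsleft=obsleft-1;
  end;
  stop;
run;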

Valued Guide
Posts: 797

Re: subsetting random observations

San:

Are you asking for 1.5% randomly selected from the entire data set?  Or do you want 1.5% of the subset that meets a given condition (you said "subset a random sample with a condition").  If it's the latter, then your question hasn't been answered yet.

Valued Guide
Posts: 632

Re: subsetting random observations

We should also ask whether the subset is selected with or without replacement.  For small subsets of a large data set it probably does not matter a lot, but....
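
For those with SAS/STAT, the two cases can be contrasted with PROC SURVEYSELECT. The sketch below is an illustration only; work.have, region, and amount are hypothetical names, not taken from this thread. METHOD=SRS samples without replacement and METHOD=URS samples with replacement, and the WHERE= data set option restricts sampling to the subset that meets the condition.

```sas
/* Hypothetical names: work.have (input), region (condition), amount (variable) */

/* Without replacement: each observation can be selected at most once */
proc surveyselect data=work.have(where=(region = 'EAST'))
                  out=work.sample_srs
                  method=srs
                  samprate=0.015
                  seed=12345;
run;

/* With replacement: OUTHITS writes one output row per selection */
proc surveyselect data=work.have(where=(region = 'EAST'))
                  out=work.sample_urs
                  method=urs
                  samprate=0.015
                  outhits
                  seed=12345;
run;

/* Mean of the sampled values (shown for the without-replacement sample) */
proc means data=work.sample_srs mean;
  var amount;
run;
```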
