deleted_user
Not applicable
Hi,

I have a data set in which I want to subset a sample with a condition. From that subset I want to pull a random 1.5% of the observations of a variable and take their average.

I was able to do the first part, pulling the subset with a condition (using a WHERE clause).

Now I'm trying to figure out how to pull a random 1.5% of the observations of that variable.

After that, I think I can use the AVG function to get their mean.

I'd appreciate your help!
Olivier
Pyrite | Level 9
Hi.
Check out the SURVEYSELECT procedure if you have a licence for SAS/STAT.
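A minimal sketch of the SURVEYSELECT route, assuming SAS/STAT is licensed. The data set WORK.HAVE, its variables ID, GROUP and AMOUNT, the condition GROUP = 'A' and the seed values are hypothetical stand-ins for the real data. The WHERE= data set option applies the condition first, so the 1.5% is taken from the qualifying rows only; METHOD=SRS draws without replacement, and SAMPRATE=0.015 gives the sampling rate as a proportion.

```sas
/* Hypothetical test data: 1,000 rows, 400 of which satisfy GROUP='A' */
data work.have;
   length group $1;
   call streaminit(2024);
   do id = 1 to 1000;
      group  = ifc(id <= 400, 'A', 'B');
      amount = round(rand('uniform') * 100, 0.01);
      output;
   end;
run;

/* Simple random sample (without replacement) of 1.5% of the rows meeting the condition */
proc surveyselect data=work.have(where=(group = 'A'))
                  out=work.sample
                  method=srs
                  samprate=0.015
                  seed=2024
                  noprint;
run;

/* Mean of AMOUNT in the sample */
proc sql;
   select avg(amount) as avg_amount
   from work.sample;
quit;
```

PROC SURVEYSELECT turns the rate into a whole number of rows; its documentation describes the exact rounding rule.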
If you don't have SAS/STAT, one usual way is to code:
1) a data step that creates a random variable (using the RANUNI or RAND functions)
2) sort the new dataset by this random variable
3) keep only the first x observations of the sorted dataset (here x is 1.5% of the number of rows).
This last approach is just like shuffling a deck of cards.
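The three steps above can be sketched as follows. This is an illustration, not Olivier's own code: WORK.HAVE, ID, GROUP, AMOUNT, the condition and the seeds are hypothetical, and RAND('UNIFORM') with CALL STREAMINIT is used instead of RANUNI so the result is reproducible. The sample size x is taken as 1.5% of the qualifying rows, rounded up with CEIL.

```sas
/* Hypothetical test data: 1,000 rows, 400 of which satisfy GROUP='A' */
data work.have;
   length group $1;
   call streaminit(2024);
   do id = 1 to 1000;
      group  = ifc(id <= 400, 'A', 'B');
      amount = round(rand('uniform') * 100, 0.01);
      output;
   end;
run;

/* Step 1: keep the rows meeting the condition and attach a uniform random number R */
data work.shuffled;
   set work.have(where=(group = 'A'));
   if _n_ = 1 then call streaminit(12345);
   r = rand('uniform');
run;

/* Step 2: sort by the random number (the "shuffle") */
proc sort data=work.shuffled;
   by r;
run;

/* Step 3: keep the first CEIL(1.5% of N) rows; TOTAL comes from NOBS= at compile time */
data work.sample(drop=r);
   set work.shuffled nobs=total;
   if _n_ > ceil(total * 0.015) then stop;
run;

/* Mean of AMOUNT in the sample */
proc sql;
   select avg(amount) as avg_amount from work.sample;
quit;
```

With 400 qualifying rows this keeps exactly 6 of them.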

Olivier
LinusH
Tourmaline | Level 20
You can also use a subsetting IF in the first DATA step (on the new RANUNI variable). Then there is no need to sort and read the data a third time.
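A sketch of that one-pass variant, using hypothetical names (WORK.HAVE, GROUP, AMOUNT, the seeds) and RAND('UNIFORM') in place of RANUNI. Note the trade-off: each qualifying row is kept independently with probability 0.015, so the sample is about 1.5% of the subset on average, but its exact size varies with the random numbers (with 400 qualifying rows the expected size is 6).

```sas
/* Hypothetical test data: 1,000 rows, 400 of which satisfy GROUP='A' */
data work.have;
   length group $1;
   call streaminit(2024);
   do id = 1 to 1000;
      group  = ifc(id <= 400, 'A', 'B');
      amount = round(rand('uniform') * 100, 0.01);
      output;
   end;
run;

/* One pass: WHERE= applies the condition, the subsetting IF keeps each row with probability 0.015 */
data work.sample;
   set work.have(where=(group = 'A'));
   if _n_ = 1 then call streaminit(12345);
   if rand('uniform') < 0.015;
run;

/* Mean of AMOUNT in the sample (missing if no row happened to be selected) */
proc sql;
   select avg(amount) as avg_amount from work.sample;
quit;
```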

Regards,
Linus
Data never sleeps
Patrick
Opal | Level 21
The code below (second data step) is almost what SAS provides in the training for the SAS certification.
HTH
Patrick

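/* Step 1: keep only the rows that meet your condition (replace the placeholders) */
data work.rsubset;
   set <your-data-set>(where=(<your-condition>));
run;

/* Step 2: selection sampling from the subset, reading rows by observation number.   */
/* Each row is kept with probability (rows still needed)/(rows not yet considered), */
/* which yields exactly SAMPSIZE rows drawn without replacement.                     */
data work.sample(drop=obsleft sampsize);
   /* sampsize=10; */                 /* fixed sample size used in the training example */
   sampsize=ceil(totobs*0.015);       /* 1.5% of the subset, rounded up */
   obsleft=totobs;                    /* rows not yet considered */
   do while(sampsize>0);
      pickit+1;                       /* next observation number */
      if ranuni(0)<sampsize/obsleft then do;
         set work.rsubset point=pickit nobs=totobs;
         output;
         sampsize=sampsize-1;
      end;
      obsleft=obsleft-1;
   end;
   stop;                              /* required with POINT= to end the step */
run;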
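Whichever sampling method is used, the last step in the question is just an average. A minimal sketch, with a tiny hypothetical WORK.SAMPLE standing in for the drawn sample; PROC SQL's AVG function and PROC MEANS give the same value (25 for these four values).

```sas
/* Hypothetical stand-in for a drawn sample */
data work.sample;
   input amount @@;
   datalines;
10 20 30 40
;
run;

/* AVG in PROC SQL, stored in a macro variable */
proc sql noprint;
   select avg(amount) into :avg_amount trimmed
   from work.sample;
quit;
%put NOTE: Mean of AMOUNT in the sample: &avg_amount;

/* Equivalent PROC MEANS step, writing the mean to a data set */
proc means data=work.sample noprint;
   var amount;
   output out=work.sample_mean mean=avg_amount;
run;
```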

mkeintz
PROC Star

San:

Are you asking for 1.5% randomly selected from the entire data set?  Or do you want 1.5% of the subset that meets a given condition (you said "subset a random sample with a condition").  If it's the latter, then your question hasn't been answered yet.

ArtC
Rhodochrosite | Level 12

We should also ask whether the sample is drawn with or without replacement. For a small sample from a large data set it probably does not matter much, but...
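For comparison, a sketch of sampling with replacement, where the same row can be drawn more than once: each draw picks a random observation number and reads it with SET ... POINT=. The names and seeds are hypothetical. Because POINT= cannot be combined with a WHERE= data set option, the subset is materialized first. If SAS/STAT is available, PROC SURVEYSELECT with METHOD=URS (unrestricted random sampling) is the procedure-based way to sample with replacement.

```sas
/* Hypothetical test data: 1,000 rows, 400 of which satisfy GROUP='A' */
data work.have;
   length group $1;
   call streaminit(2024);
   do id = 1 to 1000;
      group  = ifc(id <= 400, 'A', 'B');
      amount = round(rand('uniform') * 100, 0.01);
      output;
   end;
run;

/* Subset first, so that POINT= observation numbers refer to qualifying rows only */
data work.subset;
   set work.have(where=(group = 'A'));
run;

/* Draw CEIL(1.5% of N) rows WITH replacement: each draw is an independent random row number */
data work.sample_wr(drop=i k);
   call streaminit(12345);
   k = ceil(total * 0.015);             /* TOTAL is filled in at compile time by NOBS= */
   do i = 1 to k;
      p = ceil(total * rand('uniform')); /* random row number in 1..TOTAL */
      set work.subset point=p nobs=total;
      output;
   end;
   stop;                                /* POINT= never reaches end of file, so STOP is required */
run;
```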
