deleted_user
Not applicable
Hi,

I have a data set from which I want to subset a sample with a condition. From that subset I want to pull a random 1.5% of the observations of a variable and take their average.

I was able to do the first part of the task, pulling the subset with the condition (using a WHERE clause).

What I can't figure out is how to pull a random 1.5% of the observations of that variable.

After that, I think I can use the AVG function to get their mean.

I'd appreciate your help!
Olivier
Pyrite | Level 9
Hi.
Check out the SURVEYSELECT procedure if you have a license for SAS/STAT.
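A minimal SURVEYSELECT sketch, assuming SAS/STAT is licensed. WORK.HAVE, WORK.SAMPLE, the variable x and the condition x > 0 are hypothetical placeholders, and the seed is arbitrary. Putting the condition in a WHERE= option on DATA= makes the 1.5% rate apply only to the observations that qualify.

```sas
/* 1.5% simple random sample (without replacement) of the qualifying
   observations. WORK.HAVE and the condition x > 0 are hypothetical. */
proc surveyselect data=work.have(where=(x > 0))
                  out=work.sample
                  method=srs        /* simple random sampling, no replacement */
                  samprate=0.015    /* sampling rate given as a proportion */
                  seed=12345;       /* fixed seed for a reproducible draw */
run;
```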
If not, a common approach is to:
1) write a DATA step that creates a random variable (using the RANUNI or RAND function),
2) sort the new data set by this random variable,
3) keep only the first x observations of the sorted data set.
This works just like shuffling a deck of cards.
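The three steps above could be sketched as follows. This assumes a hypothetical input data set WORK.HAVE and a placeholder condition x > 0; the seed 12345 is only there to make the shuffle reproducible, and the number kept is 1.5% of the subset rounded up.

```sas
/* 1) Subset with the condition and attach a random sort key */
data work.keyed;
   if _n_ = 1 then call streaminit(12345);  /* reproducible shuffle */
   set work.have(where=(x > 0));            /* hypothetical data and condition */
   rkey = rand('uniform');                  /* random sort key in (0,1) */
run;

/* 2) Sort by the random key, which shuffles the observations */
proc sort data=work.keyed;
   by rkey;
run;

/* 3) Keep the first ceil(1.5% of N) observations of the shuffled data */
data work.sample(drop=rkey n);
   n = ceil(0.015 * totobs);   /* TOTOBS is set at compile time by NOBS= */
   set work.keyed nobs=totobs;
   if _n_ > n then stop;       /* stop before writing observation n+1 */
run;
```

Because the cut-off is a fixed count, this returns exactly that many observations, unlike keeping rows whose random number falls below 0.015.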

Olivier
LinusH
Tourmaline | Level 20
You can also use a subsetting IF in the first DATA step (on the new RANUNI variable). Then there is no need to sort and read the data a third time.
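A single-pass version of this idea might look like the sketch below, again with WORK.HAVE and x > 0 as hypothetical placeholders. Each qualifying observation is kept independently with probability 0.015, so the sample holds only about 1.5% of the subset and its size varies from run to run; the sorted or sequential approaches are better when an exact count is required.

```sas
/* One pass: apply the condition, then keep each qualifying
   observation with probability 0.015. Data set and condition
   are hypothetical placeholders. */
data work.sample;
   set work.have(where=(x > 0));
   if ranuni(12345) < 0.015;   /* subsetting IF on the random draw */
run;
```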

Regards,
Linus
Data never sleeps
Patrick
Opal | Level 21
The code below (the second DATA step) is almost exactly what SAS provides in its SAS certification training.
HTH
Patrick

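/* Step 1: keep only the observations that meet the condition.
   The input data set and WHERE condition were lost from this post;
   YOUR_DATA and YOUR_CONDITION are placeholders to replace. */
data work.subset;
   set your_data(where=(your_condition));
run;

/* Step 2: sequential random selection without replacement. Each
   candidate is picked with probability
   (observations still needed) / (observations still left). */
data work.rsubset(drop=obsleft sampsize);
   /* sampsize=10; */            /* fixed sample size used in the training code */
   sampsize=ceil(totobs*0.015);  /* 1.5% of the subset, rounded up */
   obsleft=totobs;               /* TOTOBS is set at compile time by NOBS= */
   do while(sampsize>0);
      pickit+1;                  /* next candidate observation number */
      if ranuni(0)<sampsize/obsleft then do;
         set work.subset point=pickit nobs=totobs;
         output;
         sampsize=sampsize-1;
      end;
      obsleft=obsleft-1;
   end;
   stop;                         /* required when reading with POINT= */
run;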
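To finish the task from the question, the sample in WORK.RSUBSET can be averaged with the SQL AVG function. YOUR_VAR is a hypothetical name for the variable of interest; PROC MEANS would work equally well.

```sas
/* Count and average the sampled observations.
   YOUR_VAR is a hypothetical variable name. */
proc sql;
   select count(*)      as n_sampled,
          avg(your_var) as mean_of_sample
   from work.rsubset;
quit;
```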

mkeintz
PROC Star

San:

Are you asking for 1.5% randomly selected from the entire data set? Or do you want 1.5% of the subset that meets a given condition? (You said "subset a random sample with a condition".) If it's the latter, then your question hasn't been answered yet.
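One way to see why the distinction matters is to compute the target sample size under both readings. This sketch assumes a hypothetical data set WORK.HAVE and condition x > 0. In PROC SQL a comparison such as x > 0 evaluates to 1 or 0, so SUM(x > 0) counts the qualifying rows.

```sas
/* Sample sizes implied by "1.5% of everything" versus
   "1.5% of the rows meeting the condition".
   WORK.HAVE and x > 0 are hypothetical placeholders. */
proc sql;
   select count(*)                          as n_all,
          sum(x > 0)                        as n_subset,
          ceil(0.015 * count(*))            as size_if_pct_of_all,
          ceil(0.015 * calculated n_subset) as size_if_pct_of_subset
   from work.have;
quit;
```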

ArtC
Rhodochrosite | Level 12

We should also ask whether the sample is selected with or without replacement. For small samples from a large data set it probably does not matter much, but...
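If the sample should be drawn with replacement, a DATA step can pick random observation numbers directly with POINT=. This is a sketch assuming a hypothetical data set WORK.SUBSET that already holds the observations meeting the condition; the seed 12345 is arbitrary. With SAS/STAT, PROC SURVEYSELECT with METHOD=URS is the procedure-based way to sample with replacement.

```sas
/* Sampling WITH replacement: every draw can pick any row of the
   subset, so the same row may appear more than once.
   WORK.SUBSET is a hypothetical, already-filtered data set. */
data work.sample_wr(drop=i n);
   call streaminit(12345);                     /* reproducible draws */
   n = ceil(0.015 * totobs);                   /* 1.5% of the subset */
   do i = 1 to n;
      pickit = ceil(totobs * rand('uniform')); /* random row in 1..totobs */
      set work.subset point=pickit nobs=totobs;
      output;
   end;
   stop;                                       /* required with POINT= */
run;
```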
