Hi,
I need to pick a 10% random sample of the distinct values of a column in a dataset.
Example: I have 20,000 rows in a dataset with 200 distinct members, and each member has 100 rows. I need to pick 10% of the distinct members, and my output should contain all 100 rows for each picked member.
Thanks in advance.
What you describe is cluster sampling. Use proc surveyselect:
proc surveyselect data=have samprate=0.1 out=want;
    cluster member;
run;
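A quick sanity check on the result could look like the sketch below. It assumes the output dataset is named want, as in the step above. The column aliases members_sampled and rows_sampled are made up for illustration. With the numbers from the question (200 members, 100 rows each), one would expect about 20 members and 2,000 rows.

```sas
/* Sketch: count the sampled clusters and rows in the output. */
/* members_sampled and rows_sampled are hypothetical aliases. */
proc sql;
    select count(distinct member) as members_sampled,
           count(*)               as rows_sampled
    from want;
quit;
```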
PG
Hi PG,
Thanks for the reply!!!
One more question: instead of extracting the randomly sampled members, can we flag them? That is, if a member is selected for the random sample, mark it as "Y", otherwise "N".
Thanks!!!
Yes, use the OUTALL option. The output then keeps every row, and the variable Selected is 1 for rows of selected members and 0 otherwise:
proc surveyselect data=have samprate=0.1 out=want outall;
    cluster member;
run;
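If a literal "Y"/"N" flag is needed rather than 1/0, a follow-up data step along these lines should work. This is a minimal sketch. The names want_flagged and flag are assumptions chosen for illustration.

```sas
/* Sketch: turn the 0/1 Selected indicator into a "Y"/"N" flag. */
/* want_flagged and flag are hypothetical names.                */
data want_flagged;
    set want;
    length flag $1;
    flag = ifc(Selected = 1, "Y", "N");
run;
```

Selected is kept here so the two columns can be compared. Add a drop statement if only the character flag is wanted.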
PG
Hi PG,
Thanks for the reply. I have one more question, and trust me, this is the last one!
I have a dataset of members who went to different movies: 100 members saw Frozen in 2013 and 100 in 2014, 100 members saw Avengers in 2013 and 100 in 2014, and 100 members saw Titanic in 2013 and 100 in 2014. Members can repeat across movies and years, so a member could have seen all three movies in both 2013 and 2014, or only one movie in 2013 and two in 2014. A member can also have multiple rows for each movie.
I now need to flag 10% of the distinct members for each movie and year, but a member who is selected for one combination (say, Frozen 2013) should not be selected again for any other.
Thanks!!!
I am afraid this is not as easy. One way would be:
/* Compute the number of members to sample for each movie and year */
proc sql;
    create table memberPick as
    select movie, year, round(0.1*count(distinct member)) as _nsize_
    from have
    group by movie, year;
quit;

/* Select a random movie and year for each member */
data have0;
    set have;
    order = rand("uniform");
run;

proc sort data=have0; by member order; run;

data have1;
    set have0; by member;
    if first.member;
    drop order;
run;

/* Sample members for each movie and year */
proc sort data=have1; by movie year; run;

proc surveyselect data=have1 sampsize=memberPick out=want outall;
    strata movie year;
run;
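Note that want holds one row per member, because have1 keeps only the randomly chosen movie and year for each member. To carry the flag back to every row of have, a join like the following sketch could be used. It assumes the flag should apply only to the rows of the movie and year for which the member was drawn. The names have_flagged and flag are made up for illustration.

```sas
/* Sketch: mark every row of have that belongs to a sampled        */
/* member/movie/year combination. have_flagged and flag are        */
/* hypothetical names, not part of the original answer.            */
proc sql;
    create table have_flagged as
    select h.*,
           case when w.member is not null then "Y" else "N" end as flag
    from have as h
    left join
         (select distinct member, movie, year
          from want
          where Selected = 1) as w
      on  h.member = w.member
      and h.movie  = w.movie
      and h.year   = w.year;
quit;
```

Because each member appears at most once in want, no member can end up flagged for more than one movie and year.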
PG