12-03-2014 11:47 AM
I need to pick a 10% random sample based on one column of a dataset.
Example: I have 20,000 rows in a dataset with 200 distinct members, and each member has 100 rows. I need to pick 10% of the distinct members, and my output should contain all 100 rows for each picked member.
Thanks in advance.
12-03-2014 11:58 AM
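What you describe is cluster sampling. Use proc surveyselect with a CLUSTER statement, so that whole members are sampled rather than individual rows:
proc surveyselect data=have samprate=0.1 out=want;
cluster member;
run;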
12-03-2014 12:28 PM
Thanks for the reply!!!
One more question: instead of extracting the randomly sampled members, can we flag them? That is, if a member is selected for the random sample, mark it as "Y", otherwise "N".
12-03-2014 12:55 PM
Yes, use the OUTALL option. The output then keeps every row, and the variable Selected will be 1 if the member was selected and 0 otherwise:
proc surveyselect data=have samprate=0.1 out=want outall;
cluster member;
run;
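For the "Y"/"N" marking asked about above, a minimal sketch could look like the following. The toy input, the seed value, and the names want_flagged and flag are assumptions made for illustration only; Selected is the variable that OUTALL adds.

```sas
/* Toy input (hypothetical): 200 members with 100 rows each */
data have;
    do member = 1 to 200;
        do row = 1 to 100;
            output;
        end;
    end;
run;

/* Cluster sample of members; OUTALL keeps all rows and adds Selected.
   SEED= is included only to make the sketch reproducible. */
proc surveyselect data=have samprate=0.1 seed=12345 out=want outall;
    cluster member;
run;

/* Translate the numeric Selected indicator into a Y/N character flag */
data want_flagged;
    set want;
    length flag $1;
    if Selected = 1 then flag = "Y";
    else flag = "N";
run;
```

Keeping Selected next to the new flag makes it easy to check the translation with a quick proc freq on flag*Selected.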
12-03-2014 02:28 PM
Thanks for the reply. I have one more question, and trust me, this is the last one.
I have a dataset of members who went to different movies: 100 members saw Frozen in 2013 and 100 in 2014, 100 members saw Avengers in 2013 and 100 in 2014, and 100 members saw Titanic in 2013 and 100 in 2014. Members can repeat across movies and across years. That means a member can go to all three movies in both 2013 and 2014, or to only one movie in 2013 and two in 2014, and a member can have multiple rows for each movie.
So now I need to flag 10% of the distinct members for each movie and each year, but if a member is selected for Frozen in 2013, he should not be selected again.
12-03-2014 05:00 PM
I am afraid this is not as easy. One way would be:
/* Compute the number of members to sample for each movie and year */
proc sql;
create table memberPick as
select movie, year, round(0.1*count(distinct member)) as _nsize_
from have
group by movie, year
order by movie, year;
quit;
/* Select a random movie and year for each member, so that each member can be sampled only once */
data have0;
set have;
order = rand("uniform");
run;
proc sort data=have0; by member order; run;
data have1;
set have0; by member;
if first.member; /* keep one randomly chosen row per member */
run;
/* Sample members for each movie and year */
proc sort data=have1; by movie year; run;
proc surveyselect data=have1 sampsize=memberPick out=want outall;
strata movie year;
run;
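The sampling step works on one row per member, so the flags still have to be carried back to the full dataset. A possible sketch of that last step is below. It assumes that a member should be marked "Y" only on the rows for the movie and year in which he was picked. The toy rows and the names flagged and flag are hypothetical, and the small want dataset here only imitates the shape of the OUTALL output.

```sas
/* Toy rows (hypothetical): several rows per member, movie and year */
data have;
    input member movie :$10. year;
    datalines;
1 Frozen 2013
1 Frozen 2013
1 Avengers 2014
2 Frozen 2013
2 Titanic 2014
3 Avengers 2014
;
run;

/* Toy member-level result (hypothetical), shaped like the OUTALL output:
   one row per member with its assigned movie/year and Selected = 1 or 0 */
data want;
    input member movie :$10. year Selected;
    datalines;
1 Frozen 2013 1
2 Titanic 2014 0
3 Avengers 2014 1
;
run;

/* Carry the flag back to every matching row of the full dataset;
   rows without a selected match get "N" */
proc sql;
    create table flagged as
    select h.*,
           case when w.Selected = 1 then "Y" else "N" end as flag
    from have as h
    left join want as w
      on  h.member = w.member
      and h.movie  = w.movie
      and h.year   = w.year
    order by h.member, h.movie, h.year;
quit;
```

If the flag should instead cover all rows of a picked member regardless of movie and year, joining on member alone would be the variation to try.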