Need to pick a random sample of a single column from a dataset

Contributor
Posts: 21


Hi,

I need to pick a random 10% of the distinct values of one column in a dataset, keeping all rows for each picked value.

Example: I have a dataset with 20,000 rows and 200 distinct members, each with 100 rows. I need to pick 10% of the distinct members (20 members), and my output should contain all 100 rows for each picked member.

Thanks in Advance

Respected Advisor
Posts: 4,827


What you describe is cluster sampling: each member is a cluster, and a selected cluster keeps all of its rows. Use PROC SURVEYSELECT with a CLUSTER statement:

proc surveyselect data=have samprate=0.1 out=want;
   cluster member;
run;
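A minimal sketch to try this out, assuming a hypothetical HAVE data set built to match the example in the question (200 members with ids 1 to 200, 100 rows each). The variable name VISIT is made up for illustration, and SEED= is added only to make the selection reproducible.

```sas
/* Hypothetical test data: 200 members x 100 rows = 20,000 rows */
data have;
  do member = 1 to 200;
    do visit = 1 to 100;
      output;
    end;
  end;
run;

/* Cluster sampling: select 10% of the members (20 of 200);  */
/* every row of a selected member is written to WANT         */
proc surveyselect data=have samprate=0.1 out=want seed=12345;
  cluster member;
run;
```

WANT should then hold 20 members and 2,000 rows. Because MEMBER is named in the CLUSTER statement, SURVEYSELECT samples members rather than individual rows.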

PG

Contributor
Posts: 21


Hi PG,

Thanks for the reply!!!

One more question: instead of extracting the randomly sampled members, can we flag them? That is, if a member is selected for the random sample, mark it as "Y", otherwise "N".

Thanks!!!

Respected Advisor
Posts: 4,827


Yes, use the OUTALL option. The output data set then contains every row of the input, and the SELECTED variable equals 1 if the row was selected and 0 otherwise:

proc surveyselect data=have samprate=0.1 out=want outall;
   cluster member;
run;
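Since the question asks for "Y"/"N" rather than 1/0, one option is to derive a character flag from SELECTED in a follow-up DATA step. This is a sketch assuming a hypothetical HAVE data set matching the example in the question (200 members, 100 rows each); FLAG and VISIT are made-up names, and SEED= is only for reproducibility.

```sas
/* Hypothetical test data: 200 members x 100 rows */
data have;
  do member = 1 to 200;
    do visit = 1 to 100;
      output;
    end;
  end;
run;

/* OUTALL keeps every row; SELECTED = 1 for rows of sampled members, 0 otherwise */
proc surveyselect data=have samprate=0.1 out=want outall seed=12345;
  cluster member;
run;

/* Turn the numeric indicator into the requested Y/N flag */
data want;
  set want;
  length flag $1;
  flag = ifc(selected = 1, 'Y', 'N');
run;
```

WANT keeps all 20,000 rows, with FLAG = 'Y' on the 2,000 rows of the 20 sampled members. If a stored character variable is not required, a user-defined format mapping 1 to Y and 0 to N, applied to SELECTED, would display the same thing while keeping the variable numeric.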

PG

Contributor
Posts: 21


Hi PG,

Thanks for the reply. I have one more question, and trust me, this is the last one!

I have a dataset of members who went to different movies: 100 members saw Frozen in 2013 and 100 in 2014, 100 members saw Avengers in 2013 and 100 in 2014, and 100 members saw Titanic in 2013 and 100 in 2014. Members can repeat across movies and across years, so a member might see all three movies in both 2013 and 2014, or only one movie in 2013 and two in 2014. A member can also have multiple rows for each movie.

Now I need to flag 10% of the distinct members for each movie and each year, but a member who is selected for one movie and year (say, Frozen 2013) must not be selected again for any other.
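To experiment with solutions, a hypothetical version of this data can be simulated. This sketch assumes variables named MEMBER, MOVIE and YEAR (the names used in the reply that follows) and lets each of 300 made-up member ids attend each movie and year with probability 1/3, giving roughly 100 members per movie and year with heavy overlap across movies and years. Each attending member gets one to three rows. The pool size, seed and row counts are all invented for illustration.

```sas
/* Hypothetical simulated data: MEMBER, MOVIE, YEAR; one to three rows per visit */
data have;
  call streaminit(2024);
  length movie $8;
  do movie = 'Frozen', 'Avengers', 'Titanic';
    do year = 2013, 2014;
      do member = 1 to 300;
        /* each member id attends with probability 1/3 (~100 per movie-year) */
        if rand('uniform') < 1/3 then do;
          nrows = ceil(3 * rand('uniform'));   /* 1, 2 or 3 rows */
          do row = 1 to nrows;
            output;
          end;
        end;
      end;
    end;
  end;
  drop nrows row;
run;
```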

Thanks!!!

Respected Advisor
Posts: 4,827


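I am afraid this is not as easy. One way would be to first assign each member to a single movie and year, then sample members within each movie and year: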

/* Compute the number of members to sample for each movie and year */
proc sql;
   create table memberPick as
   select movie, year, round(0.1*count(distinct member)) as _nsize_
   from have
   group by movie, year
   order by movie, year;  /* strata must be in the same order as in HAVE1 */
quit;

/* Assign each member to one random movie and year among those attended,  */
/* so a member can be sampled at most once. Because a random row is kept, */
/* movies/years where the member has more rows are more likely.           */
data have0;
   set have;
   order = rand("uniform");
run;
proc sort data=have0; by member order; run;
data have1;
   set have0; by member;
   if first.member;
   drop order;
run;

/* Sample members for each movie and year */
proc sort data=have1; by movie year; run;
proc surveyselect data=have1 sampsize=memberPick out=want outall;
   strata movie year;
run;
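Note that WANT above contains one row per member (taken from HAVE1), not the original rows. If the flag is needed on every row of HAVE, one option is to merge SELECTED back by member, movie and year, so that only the rows for the movie and year a member was sampled in get "Y". This sketch assumes HAVE and WANT exist as produced by the code above; PICKS, HAVE_SORTED, FLAGGED and SAMPLED are hypothetical names.

```sas
/* One row per member: assigned movie/year and the 0/1 SELECTED indicator */
proc sort data=want(keep=member movie year selected) out=picks;
  by member movie year;
run;

proc sort data=have out=have_sorted;
  by member movie year;
run;

/* Many-to-one merge: rows of the movie/year a member was sampled in get Y; */
/* rows with no match in PICKS have SELECTED missing and get N             */
data flagged;
  merge have_sorted picks;
  by member movie year;
  length sampled $1;
  sampled = ifc(selected = 1, 'Y', 'N');
  drop selected;
run;

/* Sanity check: should print no rows (no member flagged in two movie/years) */
proc sql;
  select member, count(distinct catx('-', movie, year)) as n_flagged
  from flagged
  where sampled = 'Y'
  group by member
  having count(distinct catx('-', movie, year)) > 1;
quit;
```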

PG
