rkolupoti9001
Calcite | Level 5

Hi,

I need to pick a 10% random sample of the distinct values of a column in a dataset.

Example: I have 20,000 rows in a dataset with 200 distinct members, and each member has 100 rows. I need to pick 10% of the distinct members, and the output should contain all 100 rows for each picked member.

Thanks in advance.

PGStats
Opal | Level 21

What you describe is cluster sampling. Use proc surveyselect:

proc surveyselect data=have samprate=0.1 out=want;
   cluster member;
run;
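As a minimal sketch, the same call can be given a fixed seed so the draw is repeatable, followed by a quick count of what was sampled. The seed value 12345 is an arbitrary assumption; `have`, `want` and `member` are the names used in the post.

```sas
/* Same cluster sample, with an arbitrary seed so the draw can be reproduced */
proc surveyselect data=have samprate=0.1 seed=12345 out=want;
   cluster member;
run;

/* Sanity check: distinct members and total rows in the sample */
proc sql;
   select count(distinct member) as n_members,
          count(*)               as n_rows
   from want;
quit;
```

For the 200-member example in the question, this should report 20 members and all of their rows.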

PG

rkolupoti9001
Calcite | Level 5

Hi PG,

Thanks for the reply!!!

One more question: instead of extracting the randomly sampled members, can we flag them? That is, if a member is selected for the random sample, mark it as "Y", otherwise "N".

Thanks!!!

PGStats
Opal | Level 21

Yes, use the OUTALL option. The output dataset then keeps every row, and the variable Selected is 1 for rows of selected members and 0 otherwise:

proc surveyselect data=have samprate=0.1 out=want outall;
   cluster member;
run;
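If a literal "Y"/"N" flag is needed rather than 1/0, one possible follow-up step is sketched here. The dataset name `want_flag` and the variable name `sample_flag` are hypothetical; `Selected` is the variable that OUTALL adds.

```sas
/* Assumes WANT was created by PROC SURVEYSELECT with the OUTALL option */
data want_flag;
   set want;
   length sample_flag $1;              /* hypothetical flag name */
   if Selected = 1 then sample_flag = "Y";
   else sample_flag = "N";
run;
```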

PG

rkolupoti9001
Calcite | Level 5

Hi PG,

Thanks for the reply. I have one more question, and trust me, this is the last one.

I have a dataset of members who went to different movies: 100 members saw Frozen in 2013 and 100 in 2014, 100 members saw Avengers in 2013 and 100 in 2014, and 100 members saw Titanic in 2013 and 100 in 2014. Members can repeat across movies and across years, so a member may have gone to all three movies in both 2013 and 2014, or to only one movie in 2013 and two in 2014. A member can also have multiple rows for each movie.

So now I need to flag 10% of the distinct members for each movie and each year, but a member who has been selected once (say, for Frozen in 2013) should not be selected again.

Thanks!!!

PGStats
Opal | Level 21

I am afraid this is not as easy. One way would be:

/* Compute the number of members to sample for each movie and year */
proc sql;
   create table memberPick as
   select movie, year, round(0.1*count(distinct member)) as _nsize_
   from have
   group by movie, year;
quit;

/* Select a random movie and year for each member */
data have0;
   set have;
   order = rand("uniform");
run;

proc sort data=have0; by member order; run;

data have1;
   set have0; by member;
   if first.member;
   drop order;
run;

/* Sample members for each movie and year */
proc sort data=have1; by movie year; run;

proc surveyselect data=have1 sampsize=memberPick out=want outall;
   strata movie year;
run;
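The resulting `want` dataset holds one row per member, so the flag still has to be carried back to the original rows. A possible sketch of that step follows, assuming the flag should apply only to the movie and year for which the member was picked. The names `have_flagged` and `sample_flag` are hypothetical.

```sas
/* Carry the selection back to every row of HAVE.                    */
/* A row is flagged "Y" only when its member was sampled for that    */
/* same movie and year; all other rows get "N".                      */
proc sql;
   create table have_flagged as
   select h.*,
          case when w.member is not null then "Y" else "N" end as sample_flag
   from have as h
   left join
        (select member, movie, year from want where Selected = 1) as w
     on h.member = w.member
    and h.movie  = w.movie
    and h.year   = w.year;

   /* Check: number of flagged members per movie and year */
   select movie, year, count(distinct member) as n_flagged
   from have_flagged
   where sample_flag = "Y"
   group by movie, year;
quit;
```

Because each member appears only once in `want`, the join does not duplicate rows of `have`, and no member can be flagged under more than one movie and year.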

PG

