Solved: Random sampling

Chris_LK_87 · Posted 10-27-2022 09:48 AM

Hello,

I have a dataset with millions of individuals. Each individual can appear more than once. I would like to randomly select a sample of n=1000 individuals. The only criterion is that whenever an individual is selected, all of that individual's records (cases) must be selected as well, as shown in data set want below.

data have;
length id $10 Type $10;
input id$ Type$;
datalines;
1 A
1 B
1 D
2 A
2 F
3 L
4 E
4 T
5 H
6 J
;
run;

data want;
length id $10 Type $10;
input id$ Type$;
datalines;
1 A
1 B
1 D
3 L
4 E
4 T
6 J
;
run;

FreelanceReinh · Posted 10-27-2022 11:32 AM

Hello @Chris_LK_87,

You can use the CLUSTER statement of PROC SURVEYSELECT:

proc surveyselect data=have
method=srs n=4 /* use n=1000 for your real data */
seed=6180339 out=want;
cluster id;
run;
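
As a quick sanity check on the result, the following sketch counts the distinct ids and total rows in the output. It assumes the step above has already created the data set want; the column aliases n_ids and n_rows are only illustrative names.

```sas
/* Sketch: verify that the cluster sample contains the requested number of ids */
proc sql;
  select count(distinct id) as n_ids,  /* number of sampled individuals */
         count(*)           as n_rows  /* all records belonging to them */
  from want;
quit;
```

With n=4 as above, n_ids should be 4, while n_rows depends on how many records the selected individuals happen to have.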

PaigeMiller · Posted 10-27-2022 09:52 AM

If I am understanding you properly (and I'm not sure that I am, as your description seems a little incomplete), you want to sample the distinct ID values with replacement to get 1000 ID values. See: https://blogs.sas.com/content/iml/2014/01/29/sample-with-replacement-in-sas.html

Then you can select all the observations from these 1000 ID values.
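
A minimal sketch of these two steps, assuming the data set have from the question. The data set names ids, picked and want_urs and the seed value are placeholders made up for illustration. METHOD=URS draws with replacement; switch to METHOD=SRS and drop OUTHITS if each individual should be selected at most once.

```sas
/* Step 1: one row per distinct id */
proc sort data=have(keep=id) out=ids nodupkey;
  by id;
run;

/* Step 2: sample ids with replacement (URS).
   OUTHITS writes one row per selection, so an id drawn twice appears twice. */
proc surveyselect data=ids
  method=urs n=4 /* use n=1000 for the real data */
  seed=12345 out=picked outhits;
run;

/* Step 3: pull every record that belongs to a selected id */
proc sql;
  create table want_urs as
  select h.*
  from have as h
       inner join picked as p
       on h.id = p.id;
quit;
```

Because of OUTHITS, an id that is drawn more than once contributes its records more than once to want_urs, which is what sampling with replacement implies.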

--
Paige Miller
