Hello,
I have a dataset with millions of individuals, and each individual can appear more than once. I would like to randomly select a sample of n=1000 individuals. The only criterion is that whenever an individual is selected, all of that individual's records must be included, as shown in data want.
data have;
length id $10 Type $10;
input id$ Type$;
datalines;
1 A
1 B
1 D
2 A
2 F
3 L
4 E
4 T
5 H
6 J
;
run;
data want;
length id $10 Type $10;
input id$ Type$;
datalines;
1 A
1 B
1 D
3 L
4 E
4 T
6 J
;
run;
Hello @Chris_LK_87,
You can use the CLUSTER statement of PROC SURVEYSELECT, which samples whole clusters (here, id values) and keeps every observation of each selected id:
proc surveyselect data=have
method=srs n=4 /* use n=1000 for your real data */
seed=6180339 out=want;
cluster id;
run;
If I understand you correctly (and I'm not sure that I do, since your description seems a little incomplete), you want to sample the distinct ID values with replacement to get 1000 ID values. See: https://blogs.sas.com/content/iml/2014/01/29/sample-with-replacement-in-sas.html
Then you can select all the observations from these 1000 ID values.
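The two steps just described could be sketched roughly as follows. This is only an illustration under stated assumptions, not necessarily what the linked post does: the table names ids and picked_ids are made up here, METHOD=URS is used because the suggestion is to sample with replacement, and n=4 stands in for n=1000 on the small example data.

```sas
/* Step 1: one row per distinct ID (table name "ids" is hypothetical) */
proc sql;
  create table ids as
  select distinct id
  from have;
quit;

/* Step 2: draw IDs with replacement (METHOD=URS).                 */
/* Without OUTHITS, the output has one row per selected ID and a   */
/* NumberHits variable counting how often that ID was drawn.       */
proc surveyselect data=ids
  method=urs n=4 /* assumed stand-in; use n=1000 for the real data */
  seed=6180339 out=picked_ids;
run;

/* Step 3: keep every observation that belongs to a selected ID */
proc sql;
  create table want as
  select h.*, p.NumberHits
  from have as h
       inner join picked_ids as p
       on h.id = p.id
  order by h.id;
quit;
```

Because the draw is with replacement, the same ID can be picked more than once, so the number of distinct IDs in want may be smaller than n. If each ID should appear at most once, METHOD=SRS in step 2 samples without replacement instead (and NumberHits is then not created, so it would need to be dropped from the step 3 query).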