Hello,
I have a dataset with millions of individuals, and each individual can appear more than once. I would like to randomly select a sample of n=1000 individuals. The only criterion is that whenever an individual is selected, all of that individual's records must be selected too, as shown in data want.
data have;
length id $10 Type $10;
input id$ Type$;
datalines;
1 A
1 B
1 D
2 A
2 F
3 L
4 E
4 T
5 H
6 J
;
run;
data want;
length id $10 Type $10;
input id$ Type$;
datalines;
1 A
1 B
1 D
3 L
4 E
4 T
6 J
;
run;
Hello @Chris_LK_87,
You can use the CLUSTER statement of PROC SURVEYSELECT:
proc surveyselect data=have
method=srs n=4 /* use n=1000 for your real data */
seed=6180339 out=want;
cluster id;
run;
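A quick way to confirm that the clusters came through whole is to count distinct IDs and rows in the output. This is only a sketch: the column aliases n_ids and n_rows are made up for illustration, and it assumes the output data set is named want as in the step above.

```sas
/* Sanity check on the sampled output (aliases n_ids and n_rows are illustrative) */
proc sql;
  select count(distinct id) as n_ids,  /* should equal the n= value requested */
         count(*)           as n_rows  /* total records across the sampled IDs */
  from want;
quit;
```

With the sample data and n=4, n_ids should be 4, while n_rows depends on which IDs the seed happens to pick.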
If I am understanding you properly (and I'm not sure that I am; your description seems a little incomplete), you want to sample the distinct ID values with replacement to get 1000 ID values. See: https://blogs.sas.com/content/iml/2014/01/29/sample-with-replacement-in-sas.html
Then you can select all the observations from these 1000 ID values.
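The two steps described above might be sketched as follows. This is a minimal sketch, not a definitive implementation: the data set names ids, picked_ids and want2 are hypothetical, and it assumes that sampling with replacement (method=urs) really is what is wanted. With the OUTHITS option, an ID drawn more than once appears more than once in picked_ids, so its records are repeated after the join.

```sas
/* Step 1: one row per distinct ID (data set name ids is hypothetical) */
proc sql;
  create table ids as
  select distinct id
  from have;
quit;

/* Step 2: draw IDs with replacement; OUTHITS writes one row per draw */
proc surveyselect data=ids
  method=urs n=4 /* use n=1000 for your real data */
  seed=6180339 out=picked_ids outhits;
run;

/* Step 3: keep every record in have that belongs to a drawn ID */
proc sql;
  create table want2 as
  select h.*
  from picked_ids as p
       inner join have as h
       on h.id = p.id
  order by h.id;
quit;
```

If each individual should appear at most once, as in the example data want, switching Step 2 to method=srs (and dropping OUTHITS) samples the IDs without replacement instead.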