Hello,
I have a dataset with millions of individuals, and each individual can appear more than once. I would like to randomly select a sample of n=1000 individuals. The only criterion is that whenever an individual is selected, all of that individual's records are selected as well, as shown in data set WANT.
data have;
length id $10 Type $10;
input id$ Type$;
datalines;
1 A
1 B
1 D
2 A
2 F
3 L
4 E
4 T
5 H
6 J
;
run;
data want;
length id $10 Type $10;
input id$ Type$;
datalines;
1 A
1 B
1 D
3 L
4 E
4 T
6 J
;
run;
Hello @Chris_LK_87,
You can use the CLUSTER statement of PROC SURVEYSELECT:
proc surveyselect data=have
method=srs n=4 /* use n=1000 for your real data */
seed=6180339 out=want;
cluster id;
run;
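To show what the CLUSTER approach accomplishes, here is a minimal sketch that draws whole IDs without replacement by hand. It assumes the HAVE data set from the question. It uses a different technique (sorting the distinct IDs by a random key and keeping the first n), not PROC SURVEYSELECT's internal algorithm, so it will not pick the same IDs as the seed above. The data set names IDS, IDS_KEYED and CHOSEN are hypothetical.

```sas
/* One row per distinct ID */
proc sort data=have(keep=id) out=ids nodupkey;
  by id;
run;

/* Give every ID a uniform random key (seed is arbitrary) */
data ids_keyed;
  set ids;
  if _n_ = 1 then call streaminit(6180339);
  u = rand('uniform');
run;

/* Sorting by the random key puts the IDs in random order */
proc sort data=ids_keyed;
  by u;
run;

/* The first n IDs form a simple random sample without replacement */
data chosen;
  set ids_keyed(obs=4);   /* use obs=1000 for your real data */
  keep id;
run;

/* Keep every record of each chosen ID */
proc sql;
  create table want as
  select h.id, h.type
  from have as h
  where h.id in (select id from chosen)
  order by h.id;
quit;
```

Because whole IDs are sampled before any records are retrieved, an individual is either in WANT with all of their records or not in it at all.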
If I am understanding you properly (and I'm not sure that I am; your description seems a little incomplete), you want to sample the distinct ID values with replacement to get 1000 ID values. See: https://blogs.sas.com/content/iml/2014/01/29/sample-with-replacement-in-sas.html
Then you can select all the observations from these 1000 ID values.
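A minimal sketch of that with-replacement approach, assuming the HAVE data set from the question. METHOD=URS (unrestricted random sampling) draws IDs with replacement, and the OUTHITS option writes one output row per draw, so an ID drawn twice appears twice. The data set names IDS and SAMPLED_IDS, the DRAW variable, and the seed are illustrative choices, not part of the original reply.

```sas
/* One row per distinct ID */
proc sort data=have(keep=id) out=ids nodupkey;
  by id;
run;

/* Draw 1000 IDs with replacement; OUTHITS gives one row per draw */
proc surveyselect data=ids out=sampled_ids
     method=urs n=1000 seed=6180339 outhits;
run;

/* Number the draws so repeated IDs can be told apart */
data sampled_ids;
  set sampled_ids;
  draw = _n_;
run;

/* Attach all records of each drawn ID (an ID drawn k times appears k times) */
proc sql;
  create table want as
  select s.draw, h.id, h.type
  from sampled_ids as s
       inner join have as h
       on s.id = h.id
  order by s.draw;
quit;
```

Since the draws are with replacement, n=1000 is valid even on the six-ID example data; on the real data, use METHOD=SRS (or the CLUSTER solution) instead if each individual should appear at most once.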