Hi there,
I am working on a case-control study. For each case I have 20 controls, but I would like to select a smaller sample with 5 controls per case. How can I select them randomly without replacement? I have googled it, and it seems like PROC SURVEYSELECT could be a good solution, but I don't know how to specify the parameters to get what I want. Does anyone have any ideas?
A sample dataset would look like this:
data have;
input n id;
datalines;
1 12
1 13
1 14
1 15
1 16
1 17
1 18
2 35
2 40
2 56
2 57
2 58
2 59
2 60
;
run;
Here n is the case id and id is the control id. I would like to select 5 controls per case randomly.
Thanks!
proc surveyselect data=have out=selected sampsize=5 outall;
  strata n;
run;
The rule when you say something like "nn per value of a variable" is that the variable goes on the STRATA statement for SURVEYSELECT. The input set has to be sorted by the strata variable. The SAMPSIZE= option gives how many records per stratum are desired. With no METHOD= option, SURVEYSELECT defaults to simple random sampling (METHOD=SRS), which selects without replacement. If you need different sizes per stratum, list the sizes in order of the strata variable values: sampsize=(5 6 4) would take 5 from the first stratum, 6 from the second and 4 from the last.
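As a rough sketch of the per-stratum size list described above, applied to the two-case sample data: the output name selected_var, the sizes 5 and 3, and the seed value are made up for illustration and are not part of the original answer. METHOD=SRS is written out only to make the without-replacement choice explicit, and SEED= is there only so the same sample can be drawn again.

```sas
/* Sort by the strata variable first, as SURVEYSELECT requires. */
proc sort data=have;
  by n;
run;

/* Hypothetical sizes: 5 controls for case n=1, 3 for case n=2.      */
/* The list must follow the sorted order of the strata values.       */
/* seed=12345 is an arbitrary value chosen to make the draw repeatable. */
proc surveyselect data=have out=selected_var
                  method=srs
                  sampsize=(5 3)
                  seed=12345;
  strata n;
run;
```

This variant leaves out OUTALL, so selected_var should hold only the sampled records. The list needs one size per stratum, so with more cases the list grows accordingly.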
I used the OUTALL option to create a set with all of your starting records plus an added variable named Selected, which has a value of 1 for the selected records. Notice that SAS also adds a selection probability and a sampling weight. Feel free to drop them if you aren't going to use the weights for anything later.
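If only the sampled controls are wanted, one possible follow-up step is to subset on the Selected flag and drop the extra columns. This is a minimal sketch: the output name controls5 is hypothetical, and SelectionProb and SamplingWeight are the names SURVEYSELECT normally gives the probability and weight variables, so check the log or PROC CONTENTS if your output differs.

```sas
/* Keep only the records flagged as sampled in the OUTALL output. */
data controls5;
  set selected;
  where Selected = 1;
  /* Drop the flag plus the probability and weight columns if unused. */
  drop Selected SelectionProb SamplingWeight;
run;
```

Leaving OUTALL off the PROC SURVEYSELECT statement would give the sampled records directly, so this step is mainly useful when you also want to keep the full flagged set.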