Hi,
I am working on a large data set for my thesis and need help with stratified random sampling within groups. The data set has a Client variable with four groups: Female_cases, Male_cases, Female_control and Male_control. I want to select all the male and female cases, but for the controls I want to match 4 controls to each case on age and race, i.e. 4 Female_controls for each Female_case and 4 Male_controls for each Male_case.
ID  Client          Race   Age  Hospital ID  Services
1   Female_cases    Black  45   000152       PS
2   Male_cases      White  34   000121       HS
3   Female_control  Asian  50   000542       HS
4   Male_control    White  44   000199       HS
I should add that I am using SAS University Edition.
How do you want to match ages? Do you want exact matches, matches within classes (21-25, 26-30, ...), or something else?
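If exact ages turn out to be too strict, one option is to match within age classes instead. A minimal sketch, assuming 5-year classes starting at 21 (21-25, 26-30, ...): compute a class variable (the name ageClass is hypothetical) in both the cases and the controls data sets, then use it in place of age as a matching variable.

```sas
/* Hypothetical illustration: assign 5-year age classes 21-25, 26-30, ... */
data ages;
input age;
/* ageClass holds the lower bound of the class: 21-25 -> 21, 26-30 -> 26 */
ageClass = 5*floor((age - 1)/5) + 1;
datalines;
21
25
26
30
31
;
proc print data=ages; run;
```

With these classes, ages 21 and 25 both get ageClass 21, 26 and 30 get 26, and 31 starts the next class. The class width and starting point are arbitrary choices and would need to suit the study design.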
PROC PSMATCH?
Thank you for your message and sorry for the late reply!
I will use the code you suggested and see what happens.
Here is a simple approach for exact race and age matching:
data cases;
input id race $ age;
datalines;
1 A 21
3 B 31
4 B 31
;
data control;
input id race $ age;
datalines;
5 A 18
6 A 21
7 A 21
8 B 10
9 B 31
10 B 31
11 B 31
12 B 32
;
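/* Sort the cases by the matching variables (needed for the BY statements) */
proc sort data=cases; by race age; run;
/* Create one copy of each case per control to be matched, numbered
   within each race/age group (2 per case here; use 4 for 1:4 matching) */
data cases2;
set cases;
by race age;
if first.age then slot = 0;
do i = 1 to 2;
slot + 1;
output;
end;
drop i;
run;
/* Put the controls in random order within each race/age group */
data controlr;
set control;
rnd = rand('uniform');
run;
proc sort data=controlr; by race age rnd; run;
/* Number the shuffled controls within each race/age group */
data controlr;
set controlr;
by race age;
if first.age then slot = 0;
slot + 1;
drop rnd;
run;
/* Match cases and controls one to one on race, age and slot.
   Case copies without an available control get a missing controlId;
   unused controls are dropped */
data sample;
merge
cases2 (in=inCases)
controlr (rename=(id=controlId));
by race age slot;
if inCases;
run;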
proc print data=sample; run;
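To apply this to the data in the question (one data set with a Client variable, 4 controls per case, and matching within sex as well as race and age), a sketch along the same lines follows. It assumes the Client values are spelled exactly as in the posted sample; the example rows, the derived variable sex, and the macro variable nControls are hypothetical. CALL STREAMINIT with an arbitrary seed makes the random selection of controls reproducible, which helps when the sampling has to be documented.

```sas
/* Hypothetical example rows in the layout posted in the question */
data all;
length client $ 14 race $ 5;
input id client $ race $ age;
datalines;
1 Female_cases Black 45
2 Male_cases White 34
3 Female_control Black 45
4 Female_control Black 45
5 Female_control Black 45
6 Female_control Black 45
7 Female_control Black 45
8 Male_control White 34
9 Male_control White 44
;
%let nControls = 4; /* controls to match per case */
/* Split Client into sex (text before the underscore) and case/control status */
data cases controls;
length sex $ 6;
set all;
sex = scan(client, 1, '_');
if index(client, 'cases') then output cases;
else output controls;
run;
/* One numbered copy of each case per control slot, within sex/race/age */
proc sort data=cases; by sex race age; run;
data cases4;
set cases;
by sex race age;
if first.age then slot = 0;
do i = 1 to &nControls;
slot + 1;
output;
end;
drop i;
run;
/* Shuffle the controls within sex/race/age, then number them */
data controls;
call streaminit(12345); /* arbitrary seed for reproducibility */
set controls;
rnd = rand('uniform');
run;
proc sort data=controls; by sex race age rnd; run;
data controls;
set controls;
by sex race age;
if first.age then slot = 0;
slot + 1;
drop rnd;
run;
/* Pair each case slot with at most one control, keeping every case slot */
data matched;
merge cases4 (in=inCases keep=id sex race age slot)
controls (keep=id sex race age slot rename=(id=controlId));
by sex race age slot;
if inCases;
run;
proc print data=matched; run;
```

Controls are drawn without replacement within each sex/race/age group, so no control is used twice. When a group has fewer than 4 controls per case (as for the male case in these example rows), the remaining slots keep a missing controlId instead of reusing a control.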