Hi,
I am working on a large data set for my thesis and need help with stratified random sampling within groups. The data set has a Client variable with four groups: Female_cases, Male_cases, Female_control and Male_control. I want to select all of the male and female cases and, for each case, select 4 controls matched on age and race, i.e. 4 Female_control records for each Female_cases record and 4 Male_control records for each Male_cases record.
ID  Client          Race   Age  Hospital_ID  Services
1   Female_cases    Black  45   000152       PS
2   Male_cases      White  34   000121       HS
3   Female_control  Asian  50   000542       HS
4   Male_control    White  44   000199       HS
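The sample above keeps cases and controls in one data set, while the matching code in this thread reads separate CASES and CONTROL data sets. Below is a minimal sketch of that split. It assumes the data set is called HAVE (a hypothetical name) and that CLIENT takes exactly the four values shown. Deriving a SEX variable from the CLIENT prefix means females can be matched only to females and males only to males, by adding SEX as the first variable of every BY statement in the matching step (for example `by sex race age`).

```sas
/* Hypothetical input HAVE, built from the sample rows above */
data have;
length client $ 14 race $ 8 hospitalId $ 6 services $ 2;
input id client $ race $ age hospitalId $ services $;
datalines;
1 Female_cases Black 45 000152 PS
2 Male_cases White 34 000121 HS
3 Female_control Asian 50 000542 HS
4 Male_control White 44 000199 HS
;

/* Split into cases and controls. SEX is the text before the underscore
   in CLIENT ("Female" or "Male"), so it can be used as a matching key. */
data cases control;
set have;
length sex $ 6;
sex = scan(client, 1, '_');
if index(client, '_cases') then output cases;
else if index(client, '_control') then output control;
run;
```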
I should add that I am using SAS University Edition.
How do you want to match ages? Do you want exact matches, matches within classes (21-25, 26-30, ...), or something else?
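If classes are wanted rather than exact ages, one option (an assumption here, not something settled in the thread) is to derive a class variable and match on it instead of AGE. This sketch uses 5-year classes labelled by their upper bound. AGECLASS is a hypothetical name. The same expression would have to be applied to both the case and control data sets, with AGECLASS replacing AGE in the BY statements.

```sas
/* 5-year age classes labelled by their upper bound:
   21-25 -> 25, 26-30 -> 30, 31-35 -> 35, ... */
data ages;
input age @@;
ageClass = 5 * ceil(age / 5);
datalines;
21 25 26 30 31
;
proc print data=ages; run;
```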
PROC PSMATCH?
Thank you for your message and sorry for the late reply!
I will use the code you suggested and see what happens.
Here is a simple approach for exact race and age matching:
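data cases;
input id race $ age;
datalines;
1 A 21
3 B 31
4 B 31
;
data control;
input id race $ age;
datalines;
5 A 18
6 A 21
7 A 21
8 B 10
9 B 31
10 B 31
11 B 31
12 B 32
;
/* Create one copy of each case for each control to be matched (4 per case) */
data cases2;
set cases;
do copy = 1 to 4;
output;
end;
run;
/* Sort by stratum, then by copy number, so that within a race/age stratum
   every case gets its 1st control before any case gets its 2nd, and so on.
   Then number the case rows within each stratum. */
proc sort data=cases2; by race age copy id; run;
data cases2;
set cases2;
by race age;
if first.age then caseRow = 0;
caseRow + 1;
run;
/* Put the controls in random order within each race/age stratum */
data controlr;
if _n_ = 1 then call streaminit(12345); /* fixed seed: reproducible sample */
set control;
rnd = rand('uniform');
run;
proc sort data=controlr; by race age rnd; run;
/* Match cases and controls */
data sample;
merge
cases2 (in=inCases)
controlr (rename=(id=controlId));
by race age;
/* When a stratum has fewer controls than case copies, MERGE carries the
   last control forward: blank out the repeats */
if controlId = lag(controlId) then controlId = .;
/* CASEROW keeps its last value once the case copies run out, so rows
   beyond it are surplus controls and are dropped */
if first.age then row = 0;
row + 1;
if inCases and row <= caseRow;
drop rnd row caseRow;
run;
proc sort data=sample; by id copy; run;
proc print data=sample; run;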
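Because a race/age stratum may hold fewer than 4 eligible controls, it is worth checking how many controls each case actually received. This is a sketch using PROC SQL, where COUNT(controlId) counts only non-missing control IDs. The small SAMPLE step is a hypothetical stand-in with the ID, RACE, AGE and CONTROLID columns the matching step produces; skip it when the real SAMPLE data set already exists.

```sas
/* Hypothetical stand-in for SAMPLE; skip when the real one exists */
data sample;
input id race $ age controlId;
datalines;
1 A 21 6
1 A 21 7
1 A 21 .
1 A 21 .
3 B 31 9
3 B 31 10
3 B 31 11
3 B 31 .
;

/* Count the non-missing control IDs matched to each case */
proc sql;
create table matchCounts as
select id, race, age, count(controlId) as nControls
from sample
group by id, race, age;
quit;

/* List the cases that received fewer than the requested 4 controls */
proc print data=matchCounts;
where nControls < 4;
run;
```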