12-08-2017 10:14 PM
I am working on a large data set for my thesis and need help with stratified random sampling within groups. The data set has a Client variable with four groups: Female_cases, Male_cases, Female_control and Male_control. I want to select all the male and female cases, but for the control groups I want to match 4 controls to each case on age and race, i.e. 4 Female_controls for each Female_case and 4 Male_controls for each Male_case.
ID  Client          Race   Age  Hospital ID  Services
1   Female_cases    Black  45   000152       PS
2   Male_cases      White  34   000121       HS
3   Female_control  Asian  50   000542       HS
4   Male_control    White  44   000199       HS
I should add that I am using SAS University Edition.
12-09-2017 12:19 AM
Here is a simple approach for exact race and age matching:
data cases;
    input id race $ age;
    datalines;
1 A 21
3 B 31
4 B 31
;
run;

data control;
    input id race $ age;
    datalines;
5 A 18
6 A 21
7 A 21
8 B 10
9 B 31
10 B 31
11 B 31
12 B 32
;
run;

/* Create one copy of each case per control wanted
   (2 in this example; use 4 for four controls per case) */
data cases2;
    set cases;
    do i = 1 to 2;
        output;
    end;
    drop i;
run;

/* Put the controls in random order */
data controlr;
    set control;
    rnd = rand('uniform');
run;

/* Sort by the matching variables, then randomly within each race/age group */
proc sort data=controlr;
    by race age rnd;
run;

/* Match cases and controls. Within a race/age group the merge pairs rows
   one by one; when a group runs out of controls, SAS keeps the last
   control ID, so a repeated ID is set to missing (no control available). */
data sample;
    merge cases2 (in=inCases) controlr (rename=(id=controlId));
    by race age;
    if controlId = lag(controlId) then controlId = .;
    if inCases;
    drop rnd;
run;

proc print data=sample;
run;
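The example above matches on race and age only. The question also asks for matching within sex (Female_cases with Female_control, Male_cases with Male_control) and for 4 controls per case, so here is a minimal sketch of the same merge technique with sex added as a BY variable. It assumes a single input data set, hypothetically named HAVE, laid out like the table in the question, and that sex and case/control status can be split out of Client at the underscore. The data values, the data set names CASES4, CONTROLR and MATCHED, and the seed are illustrative only.

```sas
/* Hypothetical toy input shaped like the table in the question */
data have;
    input id client :$15. race $ age;
    datalines;
1 Female_cases Black 45
2 Male_cases White 34
3 Female_control Black 45
4 Female_control Black 45
5 Female_control Black 45
6 Female_control Black 45
7 Female_control Black 45
8 Male_control White 34
9 Male_control White 34
10 Male_control White 44
;
run;

/* Split Client into sex (Female/Male) and case/control status */
data cases controls;
    set have;
    length sex $ 6;
    sex = scan(client, 1, '_');
    if scan(client, 2, '_') = 'cases' then output cases;
    else output controls;
run;

proc sort data=cases;
    by sex race age id;
run;

/* Four copies of each case: one row per control wanted */
data cases4;
    set cases;
    do i = 1 to 4;
        output;
    end;
    drop i;
run;

/* Random order of controls; the seed (assumed) makes the draw reproducible */
data controlr;
    set controls;
    if _n_ = 1 then call streaminit(12345);
    rnd = rand('uniform');
run;

proc sort data=controlr;
    by sex race age rnd;
run;

/* Keep only the matching variables and ID from the controls so that
   Client from the control rows does not overwrite the case's Client */
data matched;
    merge cases4 (in=inCases)
          controlr (keep=sex race age id rename=(id=controlId));
    by sex race age;
    if controlId = lag(controlId) then controlId = .;
    if inCases;
run;

proc print data=matched;
run;
```

With these toy rows the Female/Black/45 case gets 4 controls, while the Male/White/34 case gets only 2 (its other 2 rows have a missing controlId) because only two controls share its sex, race and age. Each control is used at most once, so controls are drawn without replacement within each sex/race/age group.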