Hi,
I used the SAS code from this article (https://www.frontiersin.org/articles/10.3389/fdata.2019.00004/full) to do 1:2 matching for 1 case with 2 controls.
Does anyone know how to generate a column with case ID numbers that the 2 controls matched with in the "matches" file step?
/*Example Code*/
DATA population; /*Example of population dataset*/
input uniqueid gender $ age ethnic $ casecontrol; /*gender and ethnic are defined as categorical variables by the following $*/
cards;
1 f 26 1 1
2 m 27 1 1
3 m 30 2 0
4 f 25 2 0
5 m 29 1 0
6 f 26 1 1
7 f 28 2 0
8 f 28 1 0
9 m 25 1 1
10 m 26 2 0
11 f 30 3 0
12 m 29 2 0
13 f 27 1 0
14 m 27 3 0
15 f 26 1 1
16 f 25 2 0
17 f 29 1 0
18 f 30 3 0
19 m 26 1 0
20 m 26 2 1
21 f 30 2 0
22 m 25 3 0
23 m 27 1 0
24 m 28 1 1
;
run;
%LET agerange = 5; /*For this example, we have chosen the age range to be 5. Controls can therefore be up to 5 years younger or older than the case*/
%LET ratio = 3; /*We have chosen to match 3 controls for each case*/
DATA cases controls;
SET population;
IF casecontrol = 1 THEN OUTPUT cases;
ELSE OUTPUT controls;
RUN;
PROC FREQ NOPRINT DATA=cases;
TABLES age*gender*ethnic/OUT=caseout; /*Since we want ‘ethnic’ to be a matching variable, we need to add it here*/
RUN;
%MACRO sample(v_age, v_gender, v_count,v_ethnic); /*Here, ethnic is also added*/
DATA qualify1;
SET controls;
WHERE (&v_age-&agerange <= age <= &v_age+&agerange)
AND
(gender = "&v_gender")
AND
(ethnic = "&v_ethnic"); /*You will also need to add your extra variable to this step.*/
case_age = &v_age;
case_gender = "&v_gender";
case_ethnic = "&v_ethnic"; /* You will also need to add your extra variable to this step.*/
SEED = RANUNI(0);
PROC SORT;
BY SEED;
DATA qualify2;
SET qualify1 NOBS=totobs;
IF _N_ <= &v_count*∶
IF &v_count*&ratio <= totobs THEN tag = 'yes';
ELSE tag = 'no';
PROC APPEND BASE=matches DATA=qualify2 force; /*new data set matches will contain the matched controls*/
PROC SORT DATA=qualify2 OUT=temp1 (KEEP=uniqueid);
BY uniqueid;
PROC SORT DATA=controls OUT=temp2;
BY uniqueid;
DATA controls; /*the dataset controls is updated so that the controls already matched are removed and can not be matched again*/
MERGE temp1(IN=in1) temp2(IN=in2);
BY uniqueid;
IF in2 AND NOT in1;
%MEND sample;
DATA _NULL_;
SET caseout;
CALL EXECUTE ('%sample('||age||','||gender||','||count||','||ethnic||')'); /* You will also need to add your extra variable to this step.*/
RUN;
/*The next part is for testing if any of the cases have not received the wanted amount of controls*/
PROC FREQ NOPRINT DATA=matches;
TABLES case_age*case_gender*case_ethnic/OUT=con_out; /* You will also need to add your extra variable to this step.*/
PROC SORT DATA = caseout(RENAME=
(age=case_age gender=case_gender count=case_cnt ethnic=case_ethnic)); /* You will also need to add your extra variable to this step.*/
BY case_age case_gender case_ethnic; /* You will also need to add your extra variable to this step.*/
PROC SORT DATA = con_out (RENAME= (count=con_cnt));
BY case_age case_gender case_ethnic; /* You will also need to add your extra variable to this step.*/
DATA final (DROP=percent);
MERGE caseout con_out;
BY case_age case_gender case_ethnic; /* You will also need to add your extra variable to this step.*/
con_need = case_cnt*∶
IF con_cnt = . THEN con_cnt = 0;
diff = con_cnt-con_need;
PROC PRINT DATA = final; /*creates a table showing what characterizes the cases who have not received enough matches and how many matches they are missing*/
WHERE diff < 0;
TITLE 'Insufficient Matches';
RUN;
Thank you all in advance.
I can't figure out what you want to do here.
Please specify your desired result given the test data set.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.