BookmarkSubscribeRSS Feed
Denali
Quartz | Level 8

Hi,

 

I used the SAS code from this article (https://www.frontiersin.org/articles/10.3389/fdata.2019.00004/full) to do 1:2 matching for 1 case with 2 controls. 

 

Does anyone know how to generate a column with case ID numbers that the 2 controls matched with in the "matches" file step?

 

/*Example Code*/

DATA population; /*Example of population dataset*/

input uniqueid gender $ age ethnic $ casecontrol; /*gender and ethnic are defined as categorical variables by the following $*/

cards;

              1 f 26 1 1

              2 m 27 1 1

              3 m 30 2 0

              4 f 25 2 0

              5 m 29 1 0

              6 f 26 1 1

              7 f 28 2 0

              8 f 28 1 0

              9 m 25 1 1

              10 m 26 2 0

              11 f 30 3 0

              12 m 29 2 0

              13 f 27 1 0

              14 m 27 3 0

              15 f 26 1 1

              16 f 25 2 0

              17 f 29 1 0

              18 f 30 3 0

              19 m 26 1 0

              20 m 26 2 1

              21 f 30 2 0

              22 m 25 3 0

              23 m 27 1 0

              24 m 28 1 1

;

run;

 

%LET agerange = 5; /*For this example, we have chosen the age range to be 5. Controls can therefore be up to 5 years younger or older than the case*/

%LET ratio = 3; /*We have chosen to match 3 controls for each case*/

 

DATA cases controls;

   SET population;

   IF casecontrol = 1 THEN OUTPUT cases;

   ELSE OUTPUT controls;

RUN;

 

PROC FREQ NOPRINT DATA=cases;

   TABLES age*gender*ethnic/OUT=caseout; /*Since we want ‘ethnic’ to be a matching variable, we need to add it here*/

RUN;

 

%MACRO sample(v_age, v_gender, v_count,v_ethnic); /*Here, ethnic is also added*/

 

   DATA qualify1;

      SET controls;

    WHERE (&v_age-&agerange <= age <= &v_age+&agerange)

      AND

      (gender = "&v_gender")

      AND

      (ethnic = "&v_ethnic"); /*You will also need to add your extra variable to this step.*/

 

   case_age = &v_age;

   case_gender = "&v_gender";

   case_ethnic = "&v_ethnic"; /* You will also need to add your extra variable to this step.*/

 

   SEED = RANUNI(0);

   PROC SORT;

      BY SEED;

 

   DATA qualify2;

      SET qualify1 NOBS=totobs;

      IF _N_ <= &v_count*&ratio;

      IF &v_count*&ratio <= totobs THEN tag = 'yes';

         ELSE tag = 'no';

 

   PROC APPEND BASE=matches DATA=qualify2 force; /*new data set matches will contain the matched controls*/

 

   PROC SORT DATA=qualify2 OUT=temp1 (KEEP=uniqueid);

      BY uniqueid;

 

   PROC SORT DATA=controls OUT=temp2;

      BY uniqueid;

 

   DATA controls; /*the dataset controls is updated so that the controls already matched are removed and can not be matched again*/

      MERGE temp1(IN=in1) temp2(IN=in2);

      BY uniqueid;

      IF in2 AND NOT in1;

 

%MEND sample;

 

DATA _NULL_;

   SET caseout;

   CALL EXECUTE ('%sample('||age||','||gender||','||count||','||ethnic||')'); /* You will also need to add your extra variable to this step.*/

RUN;

 

/*The next part is for testing if any of the cases have not received the wanted amount of controls*/

 

PROC FREQ NOPRINT DATA=matches;

   TABLES case_age*case_gender*case_ethnic/OUT=con_out; /* You will also need to add your extra variable to this step.*/

 

PROC SORT DATA = caseout(RENAME=

                (age=case_age gender=case_gender count=case_cnt ethnic=case_ethnic)); /* You will also need to add your extra variable to this step.*/

   BY case_age case_gender case_ethnic; /* You will also need to add your extra variable to this step.*/

 

PROC SORT DATA = con_out (RENAME= (count=con_cnt));

   BY case_age case_gender case_ethnic; /* You will also need to add your extra variable to this step.*/

 

DATA final (DROP=percent);

   MERGE caseout con_out;

   BY case_age case_gender case_ethnic; /* You will also need to add your extra variable to this step.*/

 

con_need = case_cnt*&ratio;

IF con_cnt = . THEN con_cnt = 0;

diff = con_cnt-con_need;

 

PROC PRINT DATA = final; /*creates a table showing what characterizes the cases who have not received enough matches and how many matches they are missing*/

   WHERE diff < 0;

   TITLE 'Insufficient Matches';

RUN;

 

 

Thank you all in advance. 

1 REPLY 1
PeterClemmensen
Tourmaline | Level 20

I can't figure out what you want to do here. 

 

Please specify your desired result given the test data set.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 363 views
  • 0 likes
  • 2 in conversation