BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Liamb
Obsidian | Level 7

Hello,

I have to check matching criteria on the following variables: age (+/- 3 years) gender dg_date (+/- 1 month).

How can I do?

Thanks

See example of data at below

data match;
  input Id  case_control match_caseid age gender dg_date : ddmmyy10. ;
  format dg_date ddmmyy10.;
datalines;
A1 Case	      20 F 20/08/2023
A2 Control A1 23 F 21/07/2023
A3 Control A1 22 F 22/09/2023
B1 Case	      30 M 01/06/2023
B2 Control B1 35 M 24/08/2023
B3 Control B1 33 M 12/06/2023
C1 Case	      40 M 26/05/2023
C2 Control C1 38 M 30/04/2023
D1 Case	      47 M 28/07/2023
D2 Control D1 41 M 29/06/2023
D3 Control D1 50 F 30/07/2023
D4 Control D1 63 M 31/07/2023
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If all CASEs precede the relevant CONTROL candidates, then

 

data match;
  input Id $  case_control $ match_caseid $ age gender $ dg_date : ddmmyy10. ;
  format dg_date ddmmyy10.;
datalines;
A1 Case    A1 20 F 20/08/2023
A2 Control A1 23 F 21/07/2023
A3 Control A1 22 F 22/09/2023
B1 Case    B1 30 M 1/06/2023
B2 Control B1 35 M 24/08/2023
B3 Control B1 33 M 12/06/2023
C1 Case    C1 40 M 26/05/2023
C2 Control C1 38 M 30/04/2023
D1 Case    D1 47 M 28/07/2023
D2 Control D1 41 M 29/06/2023
D3 Control D1 50 F 30/07/2023
D4 Control D1 63 M 31/07/2023
run;


data want (drop=_:);
  set match
      match (obs=0 rename=(age=_age gender=_gender dg_date=_dg_date));

  if _n_=1 then do;
    declare hash h();
      h.definekey('id');
      h.definedata('_age','_gender','_dg_date');
      h.definedone();
  end;
    
  if case_control='Case' then do;
    _age=age;
    _gender=gender;
    _dg_date=dg_date;
    h.add();
  end;
  else do;
    _rc=h.find(key:match_caseid);
    if _rc^=0 then Match='Match_Caseid Not Found';
    else if gender=_gender and abs(age-_age)<=3
        and dg_date >  intnx('month',_dg_date,-1,'sameday') 
        and dg_date <= intnx('month',_dg_date,+1,'sameday')
        then Match='Yes';
    else Match='No ';
  end;
run;

Edit: minor change made to the "else if .... then Match='Yes'" statement to incorporate revised interpretation of date ranges.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
ballardw
Super User

Suggest that you run that data step and examine the results. It throws a lot of invalid data messages because of missing values.

In datalines place . where a value  is missing and it may help.

 

I have no idea what you expect for a result. I think you need to expand a bit on exactly what you are checking for.

age +/- 3 years from what?

What check is to do be done for gender?

Dg_date +/- month from what?

Liamb
Obsidian | Level 7

I want to check if age (+/- 3 years), gender and dg_date (+/- 1 month) are the same for case and matched control

 

Data step corrected

data match;
  input Id $  case_control $ match_caseid $ age gender $ dg_date : ddmmyy10. ;
  format dg_date ddmmyy10.;
datalines;
A1 Case    A1 20 F 20/08/2023
A2 Control A1 23 F 21/07/2023
A3 Control A1 22 F 22/09/2023
B1 Case    B1 30 M 1/06/2023
B2 Control B1 35 M 24/08/2023
B3 Control B1 33 M 12/06/2023
C1 Case    C1 40 M 26/05/2023
C2 Control C1 38 M 30/04/2023
D1 Case    D1 47 M 28/07/2023
D2 Control D1 41 M 29/06/2023
D3 Control D1 50 F 30/07/2023
D4 Control D1 63 M 31/07/2023
;
run;
mkeintz
PROC Star

And what does  (+/- 1 month) for dg_date mean.  Does it mean calendar months two month apart? 

 

I.e.

         are   01jun2023  and 30apr2023 two months apart?

 

        or would it have to be some other criterion?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Liamb
Obsidian | Level 7
(+/- 1 month) is the difference acceptable between dg_date for case and
dg_date for control, for example if case dg_date=08Aug2023 and control
dg_date must be between 09Jul2023 and 09Sep2023.
The maching criteria are satisfied if case and control have same age,
gender and dg_date
mkeintz
PROC Star

If all CASEs precede the relevant CONTROL candidates, then

 

data match;
  input Id $  case_control $ match_caseid $ age gender $ dg_date : ddmmyy10. ;
  format dg_date ddmmyy10.;
datalines;
A1 Case    A1 20 F 20/08/2023
A2 Control A1 23 F 21/07/2023
A3 Control A1 22 F 22/09/2023
B1 Case    B1 30 M 1/06/2023
B2 Control B1 35 M 24/08/2023
B3 Control B1 33 M 12/06/2023
C1 Case    C1 40 M 26/05/2023
C2 Control C1 38 M 30/04/2023
D1 Case    D1 47 M 28/07/2023
D2 Control D1 41 M 29/06/2023
D3 Control D1 50 F 30/07/2023
D4 Control D1 63 M 31/07/2023
run;


data want (drop=_:);
  set match
      match (obs=0 rename=(age=_age gender=_gender dg_date=_dg_date));

  if _n_=1 then do;
    declare hash h();
      h.definekey('id');
      h.definedata('_age','_gender','_dg_date');
      h.definedone();
  end;
    
  if case_control='Case' then do;
    _age=age;
    _gender=gender;
    _dg_date=dg_date;
    h.add();
  end;
  else do;
    _rc=h.find(key:match_caseid);
    if _rc^=0 then Match='Match_Caseid Not Found';
    else if gender=_gender and abs(age-_age)<=3
        and dg_date >  intnx('month',_dg_date,-1,'sameday') 
        and dg_date <= intnx('month',_dg_date,+1,'sameday')
        then Match='Yes';
    else Match='No ';
  end;
run;

Edit: minor change made to the "else if .... then Match='Yes'" statement to incorporate revised interpretation of date ranges.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Liamb
Obsidian | Level 7

Thanks, it work

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1126 views
  • 1 like
  • 3 in conversation