BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Liamb
Obsidian | Level 7

Hello,

I have to check matching criteria on the following variables: age (+/- 3 years) gender dg_date (+/- 1 month).

How can I do?

Thanks

See example of data at below

data match;
  input Id  case_control match_caseid age gender dg_date : ddmmyy10. ;
  format dg_date ddmmyy10.;
datalines;
A1 Case	      20 F 20/08/2023
A2 Control A1 23 F 21/07/2023
A3 Control A1 22 F 22/09/2023
B1 Case	      30 M 01/06/2023
B2 Control B1 35 M 24/08/2023
B3 Control B1 33 M 12/06/2023
C1 Case	      40 M 26/05/2023
C2 Control C1 38 M 30/04/2023
D1 Case	      47 M 28/07/2023
D2 Control D1 41 M 29/06/2023
D3 Control D1 50 F 30/07/2023
D4 Control D1 63 M 31/07/2023
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If all CASEs precede the relevant CONTROL candidates, then

 

data match;
  input Id $  case_control $ match_caseid $ age gender $ dg_date : ddmmyy10. ;
  format dg_date ddmmyy10.;
datalines;
A1 Case    A1 20 F 20/08/2023
A2 Control A1 23 F 21/07/2023
A3 Control A1 22 F 22/09/2023
B1 Case    B1 30 M 1/06/2023
B2 Control B1 35 M 24/08/2023
B3 Control B1 33 M 12/06/2023
C1 Case    C1 40 M 26/05/2023
C2 Control C1 38 M 30/04/2023
D1 Case    D1 47 M 28/07/2023
D2 Control D1 41 M 29/06/2023
D3 Control D1 50 F 30/07/2023
D4 Control D1 63 M 31/07/2023
run;


data want (drop=_:);
  set match
      match (obs=0 rename=(age=_age gender=_gender dg_date=_dg_date));

  if _n_=1 then do;
    declare hash h();
      h.definekey('id');
      h.definedata('_age','_gender','_dg_date');
      h.definedone();
  end;
    
  if case_control='Case' then do;
    _age=age;
    _gender=gender;
    _dg_date=dg_date;
    h.add();
  end;
  else do;
    _rc=h.find(key:match_caseid);
    if _rc^=0 then Match='Match_Caseid Not Found';
    else if gender=_gender and abs(age-_age)<=3
        and dg_date >  intnx('month',_dg_date,-1,'sameday') 
        and dg_date <= intnx('month',_dg_date,+1,'sameday')
        then Match='Yes';
    else Match='No ';
  end;
run;

Edit: minor change made to the "else if .... then Match='Yes'" statement to incorporate revised interpretation of date ranges.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
ballardw
Super User

Suggest that you run that data step and examine the results. It throws a lot of invalid data messages because of missing values.

In datalines place . where a value  is missing and it may help.

 

I have no idea what you expect for a result. I think you need to expand a bit on exactly what you are checking for.

age +/- 3 years from what?

What check is to do be done for gender?

Dg_date +/- month from what?

Liamb
Obsidian | Level 7

I want to check if age (+/- 3 years), gender and dg_date (+/- 1 month) are the same for case and matched control

 

Data step corrected

data match;
  input Id $  case_control $ match_caseid $ age gender $ dg_date : ddmmyy10. ;
  format dg_date ddmmyy10.;
datalines;
A1 Case    A1 20 F 20/08/2023
A2 Control A1 23 F 21/07/2023
A3 Control A1 22 F 22/09/2023
B1 Case    B1 30 M 1/06/2023
B2 Control B1 35 M 24/08/2023
B3 Control B1 33 M 12/06/2023
C1 Case    C1 40 M 26/05/2023
C2 Control C1 38 M 30/04/2023
D1 Case    D1 47 M 28/07/2023
D2 Control D1 41 M 29/06/2023
D3 Control D1 50 F 30/07/2023
D4 Control D1 63 M 31/07/2023
;
run;
mkeintz
PROC Star

And what does  (+/- 1 month) for dg_date mean.  Does it mean calendar months two month apart? 

 

I.e.

         are   01jun2023  and 30apr2023 two months apart?

 

        or would it have to be some other criterion?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Liamb
Obsidian | Level 7
(+/- 1 month) is the difference acceptable between dg_date for case and
dg_date for control, for example if case dg_date=08Aug2023 and control
dg_date must be between 09Jul2023 and 09Sep2023.
The maching criteria are satisfied if case and control have same age,
gender and dg_date
mkeintz
PROC Star

If all CASEs precede the relevant CONTROL candidates, then

 

data match;
  input Id $  case_control $ match_caseid $ age gender $ dg_date : ddmmyy10. ;
  format dg_date ddmmyy10.;
datalines;
A1 Case    A1 20 F 20/08/2023
A2 Control A1 23 F 21/07/2023
A3 Control A1 22 F 22/09/2023
B1 Case    B1 30 M 1/06/2023
B2 Control B1 35 M 24/08/2023
B3 Control B1 33 M 12/06/2023
C1 Case    C1 40 M 26/05/2023
C2 Control C1 38 M 30/04/2023
D1 Case    D1 47 M 28/07/2023
D2 Control D1 41 M 29/06/2023
D3 Control D1 50 F 30/07/2023
D4 Control D1 63 M 31/07/2023
run;


data want (drop=_:);
  set match
      match (obs=0 rename=(age=_age gender=_gender dg_date=_dg_date));

  if _n_=1 then do;
    declare hash h();
      h.definekey('id');
      h.definedata('_age','_gender','_dg_date');
      h.definedone();
  end;
    
  if case_control='Case' then do;
    _age=age;
    _gender=gender;
    _dg_date=dg_date;
    h.add();
  end;
  else do;
    _rc=h.find(key:match_caseid);
    if _rc^=0 then Match='Match_Caseid Not Found';
    else if gender=_gender and abs(age-_age)<=3
        and dg_date >  intnx('month',_dg_date,-1,'sameday') 
        and dg_date <= intnx('month',_dg_date,+1,'sameday')
        then Match='Yes';
    else Match='No ';
  end;
run;

Edit: minor change made to the "else if .... then Match='Yes'" statement to incorporate revised interpretation of date ranges.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Liamb
Obsidian | Level 7

Thanks, it work

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1062 views
  • 1 like
  • 3 in conversation