DATA Step, Macro, Functions and more

Fuzzy Matching with Dates

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 76
Accepted Solution

Fuzzy Matching with Dates

[ Edited ]

Hello,

 

I would like to fuzzy match dates that are within 5 days of each other, one data set has records that are much earlier. I was thinking I could subtract each date from have1 by each date from have2 to find the best match. 

 

have1

data have1;
    INFILE DATALINES DLM='#';
    input usubjid $ date mmddyy10. data1_flag;
	FORMAT date mmddyy10.;
    datalines;
Z26-C64 #2/16/2017 #1
Z26-C64 #3/25/2017 #1
Z26-C64 #6/02/2017 #1
;
run;

 


have2


data have2;
    INFILE DATALINES DLM='#';
    input usubjid $ date mmddyy10. data12flag;
	FORMAT date mmddyy10.;
    datalines;
Z26-C64 #2/02/2017 #1
Z26-C64 #2/15/2017 #1
Z26-C64 #3/25/2017 #1
;
run;


want

usubjid date data1_flag data2_flag Date_have2 Date_have1 Date_Diff
Z26-C64 2/2/2017 0 1 2/02/2017 . .
Z26-C64 2/15/2017 1 1 2/15/2017 2/16/2017 1
Z26-C64 3/25/2017 1 1 3/25/2017 3/25/2017 0
Z26-C64 6/2/2017 1 0 . 6/2/2017 .

 

 

EDIT: I believe I need to clarify that I am trying to do this for each unique_id. The case above is for a single unique_id (defined as usubjid).


Accepted Solutions
Solution
‎09-07-2017 12:57 PM
Super Contributor
Posts: 298

Re: Fuzzy Matching with Dates

Posted in reply to daszlosek

I have added another subject and worked with BY Group. Both the datasets are to be sorted for this code to work.

If this does not work for you, better place some data with 2 or more subjects. Show the required output too.

data have1;
    INFILE DATALINES DLM='#';
    input usubjid $ date mmddyy10. data1_flag;
   FORMAT date mmddyy10.;
    datalines;
Z26-C64 #2/16/2017 #1
Z26-C64 #3/25/2017 #1
Z26-C64 #6/02/2017 #1
Z28-C65 #6/06/2017 #2
Z28-C65 #7/25/2017 #2
;
run;
 


data have2;
    INFILE DATALINES DLM='#';
    input usubjid $ date mmddyy10. data12flag;
   FORMAT date mmddyy10.;
    datalines;
Z26-C64 #2/02/2017 #1
Z26-C64 #2/15/2017 #1
Z26-C64 #3/25/2017 #1
Z28-C65 #6/02/2017 #2
Z28-C65 #7/02/2017 #2
Z28-C65 #7/22/2017 #2
;
run;

proc sort data = have1;
by usubjid;
run;

proc sort data = have2;
by usubjid;
run;

data need;
   set have1 (rename = (usubjid = id)) end = eof;
      do p = 1 to num;
         set have2 (rename = (date=date2)) nobs = num point = p;
         if usubjid = id then do;
            d = abs(date - date2);
            put id = usubjid = date = date2 = d =;

         if d < 5 then output;
         end;
         else continue;
      end;
      put '===========================';
   if eof then stop;
run;

View solution in original post


All Replies
Trusted Advisor
Posts: 2,116

Re: Fuzzy Matching with Dates

Posted in reply to daszlosek

There are a bunch of SAS macros that have been published to do this.  I did a google search

matching macro sas

and got a bundh.  This one looks like it will do the trick

http://support.sas.com/resources/papers/proceedings10/061-2010.pdf

 

Doc Muhlbaier

Duke

Frequent Contributor
Posts: 76

Re: Fuzzy Matching with Dates

Hello Doc_Duke,

 

This macro is for matching cases to control and does this by individual usubjid's. I would need to match the same usubjid in each dataset and then fuzzy match the dates (each row) to the closest matching date within 5 days.

 

I will google search for matching macros, thank you for your help!

 

 

Super Contributor
Posts: 298

Re: Fuzzy Matching with Dates

Posted in reply to daszlosek

Each record of HAVE1 is compared with each record of HAVE2. Are you interested any difference of 5 days, then uncomment

the statement. Presently, the LOG gives the individual difference.

Hope this is acceptable.

data need;
   set have1 end = eof;
      do p = 1 to num;
         set have2 (rename = (date=date2)) nobs = num point = p;

         d = abs(date - date2);
         put date = date2 = d =;

/*          if d < 5 then output; */
      end;
   if eof then stop;
run;

The LOG:

date=02/16/2017 date2=02/02/2017 d=14
 date=02/16/2017 date2=02/15/2017 d=1
 date=02/16/2017 date2=03/25/2017 d=37
 date=03/25/2017 date2=02/02/2017 d=51
 date=03/25/2017 date2=02/15/2017 d=38
 date=03/25/2017 date2=03/25/2017 d=0
 date=06/02/2017 date2=02/02/2017 d=120
 date=06/02/2017 date2=02/15/2017 d=107
 date=06/02/2017 date2=03/25/2017 d=69
Frequent Contributor
Posts: 76

Re: Fuzzy Matching with Dates

Hello Datasp,

This is close to what I am looking for, I am trying to tweak it so it does by for matching unique ids between the two datasets (unique_Id = usubjid). If you have any hints that would be helpful!

Best
Solution
‎09-07-2017 12:57 PM
Super Contributor
Posts: 298

Re: Fuzzy Matching with Dates

Posted in reply to daszlosek

I have added another subject and worked with BY Group. Both the datasets are to be sorted for this code to work.

If this does not work for you, better place some data with 2 or more subjects. Show the required output too.

data have1;
    INFILE DATALINES DLM='#';
    input usubjid $ date mmddyy10. data1_flag;
   FORMAT date mmddyy10.;
    datalines;
Z26-C64 #2/16/2017 #1
Z26-C64 #3/25/2017 #1
Z26-C64 #6/02/2017 #1
Z28-C65 #6/06/2017 #2
Z28-C65 #7/25/2017 #2
;
run;
 


data have2;
    INFILE DATALINES DLM='#';
    input usubjid $ date mmddyy10. data12flag;
   FORMAT date mmddyy10.;
    datalines;
Z26-C64 #2/02/2017 #1
Z26-C64 #2/15/2017 #1
Z26-C64 #3/25/2017 #1
Z28-C65 #6/02/2017 #2
Z28-C65 #7/02/2017 #2
Z28-C65 #7/22/2017 #2
;
run;

proc sort data = have1;
by usubjid;
run;

proc sort data = have2;
by usubjid;
run;

data need;
   set have1 (rename = (usubjid = id)) end = eof;
      do p = 1 to num;
         set have2 (rename = (date=date2)) nobs = num point = p;
         if usubjid = id then do;
            d = abs(date - date2);
            put id = usubjid = date = date2 = d =;

         if d < 5 then output;
         end;
         else continue;
      end;
      put '===========================';
   if eof then stop;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 97 views
  • 0 likes
  • 3 in conversation