Hi,
I have two data sources (one and two), Simple and straight forward/; id,year,month and day of a given visit.
What I would like, is to merge the files by id AND match the dates (month,day,year) as close as possible,
If it's a perfect match, that's easy. If it's within few days, I want to keep the closest date. SO if there are dates 2 and three days apart from another date I want to keep the one record that is 2 days apart.
At the end, I want everything from both sources.
I tried PROC SQL, full join,create a diff_Date, then using a having clause, I tried data step merges...
If you think you can help, I'd appreciate it,
Thank you!!!!!
data one;
input id day month year;
datalines;
1 2 12 2003
1 18 12 2003
1 14 4 2004
1 7 5 2004
1 26 5 2004
1 3 6 2004
1 18 6 2004
1 23 6 2004
1 9 7 2004
1 23 7 2004
1 18 8 2004
1 20 8 2004
1 25 8 2004
1 20 9 2004
1 20 10 2004
1 29 10 2004
1 3 11 2004
1 28 11 2004
1 28 11 2004
1 11 12 2004
1 11 12 2004
1 24 12 2004
1 19 1 2005
1 9 2 2005
1 23 2 2005
1 5 5 2005
1 17 5 2005
1 18 5 2005
1 20 6 2005
1 7 7 2005
1 13 7 2005
1 16 8 2005
1 1 9 2005
1 1 9 2005
1 12 9 2005
1 16 9 2005
1 12 10 2005
1 9 11 2005
1 30 11 2005
1 30 11 2005
;
data two;
input id day month year;
datalines;
1 14 4 2004
1 7 5 2004
1 21 5 2004
1 18 6 2004
1 9 7 2004
1 23 7 2004
1 17 8 2004
1 18 10 2004
1 28 11 2004
1 29 11 2004
1 30 11 2004
1 1 12 2004
1 2 12 2004
1 3 12 2004
1 4 12 2004
1 12 12 2004
1 13 12 2004
1 14 12 2004
1 15 12 2004
;
data want;
infile datalines missover;
input id day month year day_2 month_2 year_2;
datalines;
1 2 12 2003
1 18 12 2003
1 14 4 2004 14 4 2004
1 7 5 2004 7 5 2004
1 26 5 2004 21 5 2004
1 3 6 2004
1 18 6 2004 18 6 2004
1 23 6 2004
1 9 7 2004 9 7 2004
1 23 7 2004 23 7 2004
1 18 8 2004 17 8 2004
1 20 8 2004
1 25 8 2004
1 20 9 2004
1 20 10 2004 18 10 2004
1 29 10 2004
1 3 11 2004
1 28 11 2004 28 11 2004
1 28 11 2004 29 11 2004
1 30 11 2004
1 1 12 2004
1 2 12 2004
1 3 12 2004
1 4 12 2004
1 11 12 2004 12 12 2004
1 11 12 2004 13 12 2004
1 14 12 2004
1 15 12 2004
1 24 12 2004
1 19 1 2005
1 9 2 2005
1 23 2 2005
1 5 5 2005
1 17 5 2005
1 18 5 2005
1 20 6 2005
1 7 7 2005
1 13 7 2005
1 16 8 2005
1 1 9 2005
1 1 9 2005
1 12 9 2005
1 16 9 2005
1 12 10 2005
1 9 11 2005
1 30 11 2005
1 30 11 2005
;
What is the time gap beyond which there is no match?
It varies, from ID to ID, as this is a followup over ~10 years...and the visits could be weekly,monthly...anyly.
Hi there
Please try the followinfg codes and then with the output you can play with.
data one;
input id day month year;
datalines;
1 2 12 2003
1 18 12 2003
1 14 4 2004
1 7 5 2004
1 26 5 2004
1 3 6 2004
1 18 6 2004
1 23 6 2004
1 9 7 2004
1 23 7 2004
1 18 8 2004
1 20 8 2004
1 25 8 2004
1 20 9 2004
1 20 10 2004
1 29 10 2004
1 3 11 2004
1 28 11 2004
1 28 11 2004
1 11 12 2004
1 11 12 2004
1 24 12 2004
1 19 1 2005
1 9 2 2005
1 23 2 2005
1 5 5 2005
1 17 5 2005
1 18 5 2005
1 20 6 2005
1 7 7 2005
1 13 7 2005
1 16 8 2005
1 1 9 2005
1 1 9 2005
1 12 9 2005
1 16 9 2005
1 12 10 2005
1 9 11 2005
1 30 11 2005
1 30 11 2005
;
run;
data one;
set one;
date1=input(catx('/',day,month,year), ddmmyy10.);
format date1 ddmmyy10.;
run;
data two;
input id day2 month year;
datalines;
1 14 4 2004
1 7 5 2004
1 21 5 2004
1 18 6 2004
1 9 7 2004
1 23 7 2004
1 17 8 2004
1 18 10 2004
1 28 11 2004
1 29 11 2004
1 30 11 2004
1 1 12 2004
1 2 12 2004
1 3 12 2004
1 4 12 2004
1 12 12 2004
1 13 12 2004
1 14 12 2004
1 15 12 2004
;
run;
data two;
set two;
date2=input(catx('/',day2,month,year), ddmmyy10.);
format date2 ddmmyy10.;
run;
proc sort data=one;
by id year month;
run;
proc sort data=two;
by id year month;
run;
data all;
merge one(in=a) two(in=b);
by id year month;
if a or b;
run;
data want;
set all;
format match $20.;
if day2 ne . then do;
'diff in days'n=abs(day-day2);
end;
if 0 lt 'diff in days'n le 5 then match='Close Match';
else if 'diff in days'n=0 then match='Perfect Match';
else if 'diff in days'n gt 5 then match='No Match';
if day2 = . then match='No match';
run;
Output:
id | day | month | year | date1 | day2 | date2 | match | diff in days |
1 | 2 | 12 | 2003 | 02/12/2003 | No match | |||
1 | 18 | 12 | 2003 | 18/12/2003 | No match | |||
1 | 14 | 4 | 2004 | 14/04/2004 | 14 | 14/04/2004 | Perfect Match | 0 |
1 | 7 | 5 | 2004 | 07/05/2004 | 7 | 07/05/2004 | Perfect Match | 0 |
1 | 26 | 5 | 2004 | 26/05/2004 | 21 | 21/05/2004 | Close Match | 5 |
1 | 3 | 6 | 2004 | 03/06/2004 | 18 | 18/06/2004 | No Match | 15 |
1 | 18 | 6 | 2004 | 18/06/2004 | 18 | 18/06/2004 | Perfect Match | 0 |
1 | 23 | 6 | 2004 | 23/06/2004 | 18 | 18/06/2004 | Close Match | 5 |
1 | 9 | 7 | 2004 | 09/07/2004 | 9 | 09/07/2004 | Perfect Match | 0 |
1 | 23 | 7 | 2004 | 23/07/2004 | 23 | 23/07/2004 | Perfect Match | 0 |
1 | 18 | 8 | 2004 | 18/08/2004 | 17 | 17/08/2004 | Close Match | 1 |
1 | 20 | 8 | 2004 | 20/08/2004 | 17 | 17/08/2004 | Close Match | 3 |
1 | 25 | 8 | 2004 | 25/08/2004 | 17 | 17/08/2004 | No Match | 8 |
1 | 20 | 9 | 2004 | 20/09/2004 | No match | |||
1 | 20 | 10 | 2004 | 20/10/2004 | 18 | 18/10/2004 | Close Match | 2 |
1 | 29 | 10 | 2004 | 29/10/2004 | 18 | 18/10/2004 | No Match | 11 |
1 | 3 | 11 | 2004 | 03/11/2004 | 28 | 28/11/2004 | No Match | 25 |
1 | 28 | 11 | 2004 | 28/11/2004 | 29 | 29/11/2004 | Close Match | 1 |
1 | 28 | 11 | 2004 | 28/11/2004 | 30 | 30/11/2004 | Close Match | 2 |
1 | 11 | 12 | 2004 | 11/12/2004 | 1 | 01/12/2004 | No Match | 10 |
1 | 11 | 12 | 2004 | 11/12/2004 | 2 | 02/12/2004 | No Match | 9 |
1 | 24 | 12 | 2004 | 24/12/2004 | 3 | 03/12/2004 | No Match | 21 |
1 | 24 | 12 | 2004 | 24/12/2004 | 4 | 04/12/2004 | No Match | 20 |
1 | 24 | 12 | 2004 | 24/12/2004 | 12 | 12/12/2004 | No Match | 12 |
1 | 24 | 12 | 2004 | 24/12/2004 | 13 | 13/12/2004 | No Match | 11 |
1 | 24 | 12 | 2004 | 24/12/2004 | 14 | 14/12/2004 | No Match | 10 |
1 | 24 | 12 | 2004 | 24/12/2004 | 15 | 15/12/2004 | No Match | 9 |
1 | 19 | 1 | 2005 | 19/01/2005 | No match | |||
1 | 9 | 2 | 2005 | 09/02/2005 | No match | |||
1 | 23 | 2 | 2005 | 23/02/2005 | No match | |||
1 | 5 | 5 | 2005 | 05/05/2005 | No match | |||
1 | 17 | 5 | 2005 | 17/05/2005 | No match | |||
1 | 18 | 5 | 2005 | 18/05/2005 | No match | |||
1 | 20 | 6 | 2005 | 20/06/2005 | No match | |||
1 | 7 | 7 | 2005 | 07/07/2005 | No match | |||
1 | 13 | 7 | 2005 | 13/07/2005 | No match | |||
1 | 16 | 8 | 2005 | 16/08/2005 | No match | |||
1 | 1 | 9 | 2005 | 01/09/2005 | No match | |||
1 | 1 | 9 | 2005 | 01/09/2005 | No match | |||
1 | 12 | 9 | 2005 | 12/09/2005 | No match | |||
1 | 16 | 9 | 2005 | 16/09/2005 | No match | |||
1 | 12 | 10 | 2005 | 12/10/2005 | No match | |||
1 | 9 | 11 | 2005 | 09/11/2005 | No match | |||
1 | 30 | 11 | 2005 | 30/11/2005 | No match | |||
1 | 30 | 11 | 2005 | 30/11/2005 | No match |
Thank you, MIT.
I will certainly try your code.
Thank you.
Anca.
However, my senior supervisor told me there is no real need for me to combine these data sets, and may just use them individually.
So, I thank you.
Sincerely.
Anca.
data one;
input id day month year;
datalines;
1 2 12 2003
1 18 12 2003
1 14 4 2004
1 7 5 2004
1 26 5 2004
1 3 6 2004
1 18 6 2004
1 23 6 2004
1 9 7 2004
1 23 7 2004
1 18 8 2004
1 20 8 2004
1 25 8 2004
1 20 9 2004
1 20 10 2004
1 29 10 2004
1 3 11 2004
1 28 11 2004
1 28 11 2004
1 11 12 2004
1 11 12 2004
1 24 12 2004
1 19 1 2005
1 9 2 2005
1 23 2 2005
1 5 5 2005
1 17 5 2005
1 18 5 2005
1 20 6 2005
1 7 7 2005
1 13 7 2005
1 16 8 2005
1 1 9 2005
1 1 9 2005
1 12 9 2005
1 16 9 2005
1 12 10 2005
1 9 11 2005
1 30 11 2005
1 30 11 2005
;
data two;
input id day month year;
datalines;
1 14 4 2004
1 7 5 2004
1 21 5 2004
1 18 6 2004
1 9 7 2004
1 23 7 2004
1 17 8 2004
1 18 10 2004
1 28 11 2004
1 29 11 2004
1 30 11 2004
1 1 12 2004
1 2 12 2004
1 3 12 2004
1 4 12 2004
1 12 12 2004
1 13 12 2004
1 14 12 2004
1 15 12 2004
;
run;
data one_1;
set one;
newvar=input(catx('/',day,month,year),ddmmyy10.);
run;
data two_1;
set two;
newvar1=input(catx('/',day,month,year),ddmmyy10.);
run;
proc sort one_1;by id year month; run;
proc sort two_1;by id year month; run;
data merge1;
merge one_1(in=a) two_1(in=b);
by id year month;
if a or b;
run;
data new;
set merge1;
daysdifference=put(abs(intck('days',newvar,newvar1)),10.);
run;
data want;
set new;
format match $20.;
if 0 lt daysdifference le 5 then match='Close Match';
else if daysdifference=0 then match='Perfect Match';
else if daysdifference gt 5 then match='No Match';
if newvar1 = . then match='No match';
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.