Hi there,
I would like to merge two data sets when the date of the first data set is equal or the closet date to the second date set date.
example
data diagonsed;
input id diag_date mmddyy10. disease $;
format diag_date mmddyy10.;
datalines;
1 05/23/10 C
1 05/26/10 C
1 01/01/11 B
;
run;
data disharge;
input id dis_date mmddyy10. disease $ hospital$;
format dis_date mmddyy10.;
datalines;
1 05/21/10 C W
1 05/24/10 C N
1 05/25/10 C G
1 05/27/10 A R
1 05/29/10 C R
1 05/28/10 B W
1 09/15/10 D S
1 12/29/10 C F
1 12/31/10 A F
1 01/02/11 B S
1 01/05/11 B N
;
run;
/*i want to know the name of the hospital for those who were diagnosed for specific disease as in only diagnosed data set*/
/* The diagnosed date should be less or equal to the discharge date (minimum value=dis_date- diag_date) for the same type of disease in the first data set
*/
/* The correct answer should be as below:
id diag_date dis_date disease hospital
1 05/23/10 05/24/10 C N
1 05/26/10 05/29/10 C R
1 01/01/11 01/02/11 B S
for the same type of disease in both data sets, difference_Date=min( dis_date - diag_date ) and it should equal zero or grater than zero.
Could you please give me any suggestion.
Thanks
How about this one ?
data diagnosed;
input id diag_date mmddyy10. disease $;
format diag_date mmddyy10.;
datalines;
1 05/23/10 C
1 05/26/10 C
1 01/01/11 B
;
data discharge;
input id dis_date mmddyy10. disease $ hospital$;
format dis_date mmddyy10.;
datalines;
1 05/21/10 C W
1 05/24/10 C N
1 05/25/10 C G
1 05/27/10 A R
1 05/29/10 C R
1 05/28/10 B W
1 09/15/10 D S
1 12/29/10 C F
1 12/31/10 A F
1 01/02/11 B S
1 01/05/11 B N
;
proc sql;
select a.*,b.dis_date,b.hospital
from diagnosed as a ,discharge as b
where a.id=b.id and b.dis_date-a.diag_date>0
group by a.id,a.diag_date,a.disease
having b.dis_date-a.diag_date=min(b.dis_date-a.diag_date);
quit;
Search on here. You’ll find plenty of examples.
You can join on nearest using SQL easier than you can via a data step.
If your data is not large, you can do a cross join and filter on anything with the date > date2. Then use a data step to pick the lowest/best match.
Thank you Reeza.
My data sets is too large and I would like to merge the two data set if the type of disease for both data sets are the same and the date should be the nearest date.
How about like this:
data diagnosed;
input id diag_date mmddyy10. disease $;
format diag_date mmddyy10.;
datalines;
1 05/23/10 C
1 05/26/10 C
1 01/01/11 B
;
data discharge;
input id dis_date mmddyy10. disease $ hospital$;
format dis_date mmddyy10.;
datalines;
1 05/21/10 C W
1 05/24/10 C N
1 05/25/10 C G
1 05/27/10 A R
1 05/29/10 C R
1 05/28/10 B W
1 09/15/10 D S
1 12/29/10 C F
1 12/31/10 A F
1 01/02/11 B S
1 01/05/11 B N
;
proc sort data=diagnosed; by id disease diag_date; run;
proc sort data=discharge; by id disease dis_date; run;
data all;
retain diag_date;
merge
diagnosed(in=ina rename=diag_date=date)
discharge(in=inb rename=dis_date=date);
by id disease date;
if first.disease then call missing(diag_date);
if ina then diag_date = date;
if inb and not missing(diag_date) then do;
dis_date=date;
output;
call missing(diag_date);
end;
format diag_date dis_date mmddyy10.;
drop date;
run;
proc print noobs; run;
I think this is a case where "set ... by ..." is simpler than "merge ... by ...",
data diagonsed;
input id diag_date mmddyy10. disease $;
format diag_date mmddyy10.;
datalines;
1 05/23/10 C
1 05/26/10 C
1 01/01/11 B
run;
data disharge;
input id dis_date mmddyy10. disease $ hospital$;
format dis_date mmddyy10.;
datalines;
1 05/21/10 C W
1 05/24/10 C N
1 05/25/10 C G
1 05/27/10 A R
1 05/29/10 C R
1 05/28/10 B W
1 09/15/10 D S
1 12/29/10 C F
1 12/31/10 A F
1 01/02/11 B S
1 01/05/11 B N
run;
proc sort data=diagonsed; by id disease diag_date;run;
proc sort data=disharge; by id disease dis_date;run;
data want (rename=(date=dis_date));
set diagonsed (in=indiag rename=(diag_date=date))
disharge (in=indis rename=(dis_date=date));
by id disease date;
diag_date=lag(date);
format diag_date mmddyy10.;
if indis=1 and lag(indiag)=1 and first.disease=0;
run;
The subsetting IF at the bottom does most of the work. It only keeps cases in which a discharge record immediately follows a diagnosis record. Adding the "and first.disease=0" condition ensures that the consecutive records are for the same id/disease.
While the "diag_date=lag(date)" sometimes will return a discharge date, all such unwanted cases are filtered out in the subsequent subsetting if.
How about this one ?
data diagnosed;
input id diag_date mmddyy10. disease $;
format diag_date mmddyy10.;
datalines;
1 05/23/10 C
1 05/26/10 C
1 01/01/11 B
;
data discharge;
input id dis_date mmddyy10. disease $ hospital$;
format dis_date mmddyy10.;
datalines;
1 05/21/10 C W
1 05/24/10 C N
1 05/25/10 C G
1 05/27/10 A R
1 05/29/10 C R
1 05/28/10 B W
1 09/15/10 D S
1 12/29/10 C F
1 12/31/10 A F
1 01/02/11 B S
1 01/05/11 B N
;
proc sql;
select a.*,b.dis_date,b.hospital
from diagnosed as a ,discharge as b
where a.id=b.id and b.dis_date-a.diag_date>0
group by a.id,a.diag_date,a.disease
having b.dis_date-a.diag_date=min(b.dis_date-a.diag_date);
quit;
Thanks Ksharp.
It works with me and I tried it for some cases and it gave me the correct answer.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.