BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
elias_2020
Fluorite | Level 6

Hi there,

 

I would like to merge two data sets when the date of the first data set is equal or the closet date to the second date set date.

 

example

data diagonsed;
   input  id   diag_date mmddyy10.  disease $;
   format diag_date mmddyy10.;
   datalines;
1 05/23/10  C
1 05/26/10  C
1 01/01/11  B
;
run;



data disharge;
   input  id   dis_date mmddyy10.  disease $ hospital$;
   format dis_date mmddyy10.;
   datalines;
1 05/21/10  C    W
1 05/24/10  C    N
1 05/25/10  C    G
1 05/27/10  A    R
1 05/29/10  C    R
1 05/28/10  B    W
1 09/15/10  D    S
1 12/29/10  C    F
1 12/31/10  A    F
1 01/02/11  B    S
1 01/05/11  B    N
;
run;

/*i want to know the name of the hospital for those who were diagnosed for specific disease as in only diagnosed data set*/
/* The diagnosed date should be less or equal to the discharge date (minimum value=dis_date- diag_date) for the same type of disease in the first data set
*/
/* The correct answer should be as below: 
id   diag_date  dis_date  disease    hospital 
1    05/23/10   05/24/10     C             N
1    05/26/10   05/29/10     C             R
1    01/01/11   01/02/11     B              S

  for the same type of disease in both data sets, difference_Date=min( dis_date - diag_date ) and it should equal zero or grater than zero.

 

Could you please give me any suggestion.

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

How about this one ?

 

data diagnosed;
   input  id   diag_date mmddyy10.  disease $;
   format diag_date mmddyy10.;
   datalines;
1 05/23/10  C
1 05/26/10  C
1 01/01/11  B
;

data discharge;
   input  id   dis_date mmddyy10.  disease $ hospital$;
   format dis_date mmddyy10.;
   datalines;
1 05/21/10  C    W
1 05/24/10  C    N
1 05/25/10  C    G
1 05/27/10  A    R
1 05/29/10  C    R
1 05/28/10  B    W
1 09/15/10  D    S
1 12/29/10  C    F
1 12/31/10  A    F
1 01/02/11  B    S
1 01/05/11  B    N
;
proc sql;
select a.*,b.dis_date,b.hospital
 from diagnosed as a ,discharge as b
  where a.id=b.id and  b.dis_date-a.diag_date>0
   group by a.id,a.diag_date,a.disease
    having b.dis_date-a.diag_date=min(b.dis_date-a.diag_date);
quit;

View solution in original post

6 REPLIES 6
Reeza
Super User

Search on here. You’ll find plenty of examples. 

 

You can join on nearest using SQL easier than you can via a data step. 

 

If your data is not large, you can do a cross join and filter on anything with the date > date2. Then use a data step to pick the lowest/best match. 

elias_2020
Fluorite | Level 6

Thank you Reeza.

My data sets is too large and I would like to merge the two data set if the type of disease for both data sets are the same and the date should be the nearest date.

 

PGStats
Opal | Level 21

How about like this:

 

data diagnosed;
   input  id   diag_date mmddyy10.  disease $;
   format diag_date mmddyy10.;
   datalines;
1 05/23/10  C
1 05/26/10  C
1 01/01/11  B
;

data discharge;
   input  id   dis_date mmddyy10.  disease $ hospital$;
   format dis_date mmddyy10.;
   datalines;
1 05/21/10  C    W
1 05/24/10  C    N
1 05/25/10  C    G
1 05/27/10  A    R
1 05/29/10  C    R
1 05/28/10  B    W
1 09/15/10  D    S
1 12/29/10  C    F
1 12/31/10  A    F
1 01/02/11  B    S
1 01/05/11  B    N
;

proc sort data=diagnosed; by id disease diag_date; run;
proc sort data=discharge; by id disease dis_date; run;

data all;
retain diag_date;
merge 
    diagnosed(in=ina rename=diag_date=date) 
    discharge(in=inb rename=dis_date=date);
by id disease date;
if first.disease then call missing(diag_date);
if ina then diag_date = date;
if inb and not missing(diag_date) then do;
    dis_date=date;
    output;
    call missing(diag_date);
    end;
format diag_date dis_date mmddyy10.;
drop date;
run;

proc print noobs; run;
PG
mkeintz
PROC Star

I think this is a case where "set ... by ..." is simpler than "merge ... by ...",

 

data diagonsed;
  input  id   diag_date mmddyy10.  disease $;
  format diag_date mmddyy10.;
  datalines;
1 05/23/10  C
1 05/26/10  C
1 01/01/11  B
run;
data disharge;
  input  id   dis_date mmddyy10.  disease $ hospital$;
  format dis_date mmddyy10.;
  datalines;
1 05/21/10  C    W
1 05/24/10  C    N
1 05/25/10  C    G
1 05/27/10  A    R
1 05/29/10  C    R
1 05/28/10  B    W
1 09/15/10  D    S
1 12/29/10  C    F
1 12/31/10  A    F
1 01/02/11  B    S
1 01/05/11  B    N
run;

proc sort data=diagonsed; by id disease diag_date;run;
proc sort data=disharge; by id disease dis_date;run;


data want (rename=(date=dis_date));
  set diagonsed (in=indiag rename=(diag_date=date))
      disharge  (in=indis  rename=(dis_date=date));
  by id disease date;

  diag_date=lag(date); 
  format diag_date mmddyy10.;
  if indis=1 and lag(indiag)=1 and first.disease=0;
run;

 

The subsetting IF at the bottom does most of the work.  It only keeps cases in which a discharge record immediately follows a diagnosis record.  Adding the "and first.disease=0" condition ensures that the consecutive records are for the same id/disease.

 

While the "diag_date=lag(date)" sometimes will return a discharge date, all such unwanted cases are filtered out in the subsequent subsetting if.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

How about this one ?

 

data diagnosed;
   input  id   diag_date mmddyy10.  disease $;
   format diag_date mmddyy10.;
   datalines;
1 05/23/10  C
1 05/26/10  C
1 01/01/11  B
;

data discharge;
   input  id   dis_date mmddyy10.  disease $ hospital$;
   format dis_date mmddyy10.;
   datalines;
1 05/21/10  C    W
1 05/24/10  C    N
1 05/25/10  C    G
1 05/27/10  A    R
1 05/29/10  C    R
1 05/28/10  B    W
1 09/15/10  D    S
1 12/29/10  C    F
1 12/31/10  A    F
1 01/02/11  B    S
1 01/05/11  B    N
;
proc sql;
select a.*,b.dis_date,b.hospital
 from diagnosed as a ,discharge as b
  where a.id=b.id and  b.dis_date-a.diag_date>0
   group by a.id,a.diag_date,a.disease
    having b.dis_date-a.diag_date=min(b.dis_date-a.diag_date);
quit;
elias_2020
Fluorite | Level 6

Thanks Ksharp.

 

It works with me and I tried it for some cases and it gave me the correct answer.

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 969 views
  • 1 like
  • 5 in conversation