BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
elias_2020
Fluorite | Level 6

Hi there,

 

I would like to merge two data sets when the date of the first data set is equal or the closet date to the second date set date.

 

example

data diagonsed;
   input  id   diag_date mmddyy10.  disease $;
   format diag_date mmddyy10.;
   datalines;
1 05/23/10  C
1 05/26/10  C
1 01/01/11  B
;
run;



data disharge;
   input  id   dis_date mmddyy10.  disease $ hospital$;
   format dis_date mmddyy10.;
   datalines;
1 05/21/10  C    W
1 05/24/10  C    N
1 05/25/10  C    G
1 05/27/10  A    R
1 05/29/10  C    R
1 05/28/10  B    W
1 09/15/10  D    S
1 12/29/10  C    F
1 12/31/10  A    F
1 01/02/11  B    S
1 01/05/11  B    N
;
run;

/*i want to know the name of the hospital for those who were diagnosed for specific disease as in only diagnosed data set*/
/* The diagnosed date should be less or equal to the discharge date (minimum value=dis_date- diag_date) for the same type of disease in the first data set
*/
/* The correct answer should be as below: 
id   diag_date  dis_date  disease    hospital 
1    05/23/10   05/24/10     C             N
1    05/26/10   05/29/10     C             R
1    01/01/11   01/02/11     B              S

  for the same type of disease in both data sets, difference_Date=min( dis_date - diag_date ) and it should equal zero or grater than zero.

 

Could you please give me any suggestion.

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

How about this one ?

 

data diagnosed;
   input  id   diag_date mmddyy10.  disease $;
   format diag_date mmddyy10.;
   datalines;
1 05/23/10  C
1 05/26/10  C
1 01/01/11  B
;

data discharge;
   input  id   dis_date mmddyy10.  disease $ hospital$;
   format dis_date mmddyy10.;
   datalines;
1 05/21/10  C    W
1 05/24/10  C    N
1 05/25/10  C    G
1 05/27/10  A    R
1 05/29/10  C    R
1 05/28/10  B    W
1 09/15/10  D    S
1 12/29/10  C    F
1 12/31/10  A    F
1 01/02/11  B    S
1 01/05/11  B    N
;
proc sql;
select a.*,b.dis_date,b.hospital
 from diagnosed as a ,discharge as b
  where a.id=b.id and  b.dis_date-a.diag_date>0
   group by a.id,a.diag_date,a.disease
    having b.dis_date-a.diag_date=min(b.dis_date-a.diag_date);
quit;

View solution in original post

6 REPLIES 6
Reeza
Super User

Search on here. You’ll find plenty of examples. 

 

You can join on nearest using SQL easier than you can via a data step. 

 

If your data is not large, you can do a cross join and filter on anything with the date > date2. Then use a data step to pick the lowest/best match. 

elias_2020
Fluorite | Level 6

Thank you Reeza.

My data sets is too large and I would like to merge the two data set if the type of disease for both data sets are the same and the date should be the nearest date.

 

PGStats
Opal | Level 21

How about like this:

 

data diagnosed;
   input  id   diag_date mmddyy10.  disease $;
   format diag_date mmddyy10.;
   datalines;
1 05/23/10  C
1 05/26/10  C
1 01/01/11  B
;

data discharge;
   input  id   dis_date mmddyy10.  disease $ hospital$;
   format dis_date mmddyy10.;
   datalines;
1 05/21/10  C    W
1 05/24/10  C    N
1 05/25/10  C    G
1 05/27/10  A    R
1 05/29/10  C    R
1 05/28/10  B    W
1 09/15/10  D    S
1 12/29/10  C    F
1 12/31/10  A    F
1 01/02/11  B    S
1 01/05/11  B    N
;

proc sort data=diagnosed; by id disease diag_date; run;
proc sort data=discharge; by id disease dis_date; run;

data all;
retain diag_date;
merge 
    diagnosed(in=ina rename=diag_date=date) 
    discharge(in=inb rename=dis_date=date);
by id disease date;
if first.disease then call missing(diag_date);
if ina then diag_date = date;
if inb and not missing(diag_date) then do;
    dis_date=date;
    output;
    call missing(diag_date);
    end;
format diag_date dis_date mmddyy10.;
drop date;
run;

proc print noobs; run;
PG
mkeintz
PROC Star

I think this is a case where "set ... by ..." is simpler than "merge ... by ...",

 

data diagonsed;
  input  id   diag_date mmddyy10.  disease $;
  format diag_date mmddyy10.;
  datalines;
1 05/23/10  C
1 05/26/10  C
1 01/01/11  B
run;
data disharge;
  input  id   dis_date mmddyy10.  disease $ hospital$;
  format dis_date mmddyy10.;
  datalines;
1 05/21/10  C    W
1 05/24/10  C    N
1 05/25/10  C    G
1 05/27/10  A    R
1 05/29/10  C    R
1 05/28/10  B    W
1 09/15/10  D    S
1 12/29/10  C    F
1 12/31/10  A    F
1 01/02/11  B    S
1 01/05/11  B    N
run;

proc sort data=diagonsed; by id disease diag_date;run;
proc sort data=disharge; by id disease dis_date;run;


data want (rename=(date=dis_date));
  set diagonsed (in=indiag rename=(diag_date=date))
      disharge  (in=indis  rename=(dis_date=date));
  by id disease date;

  diag_date=lag(date); 
  format diag_date mmddyy10.;
  if indis=1 and lag(indiag)=1 and first.disease=0;
run;

 

The subsetting IF at the bottom does most of the work.  It only keeps cases in which a discharge record immediately follows a diagnosis record.  Adding the "and first.disease=0" condition ensures that the consecutive records are for the same id/disease.

 

While the "diag_date=lag(date)" sometimes will return a discharge date, all such unwanted cases are filtered out in the subsequent subsetting if.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

How about this one ?

 

data diagnosed;
   input  id   diag_date mmddyy10.  disease $;
   format diag_date mmddyy10.;
   datalines;
1 05/23/10  C
1 05/26/10  C
1 01/01/11  B
;

data discharge;
   input  id   dis_date mmddyy10.  disease $ hospital$;
   format dis_date mmddyy10.;
   datalines;
1 05/21/10  C    W
1 05/24/10  C    N
1 05/25/10  C    G
1 05/27/10  A    R
1 05/29/10  C    R
1 05/28/10  B    W
1 09/15/10  D    S
1 12/29/10  C    F
1 12/31/10  A    F
1 01/02/11  B    S
1 01/05/11  B    N
;
proc sql;
select a.*,b.dis_date,b.hospital
 from diagnosed as a ,discharge as b
  where a.id=b.id and  b.dis_date-a.diag_date>0
   group by a.id,a.diag_date,a.disease
    having b.dis_date-a.diag_date=min(b.dis_date-a.diag_date);
quit;
elias_2020
Fluorite | Level 6

Thanks Ksharp.

 

It works with me and I tried it for some cases and it gave me the correct answer.

 

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1647 views
  • 1 like
  • 5 in conversation