SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

merge two data sets with multiple records

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 15
Accepted Solution

merge two data sets with multiple records

[ Edited ]

Hi there,

 

I would like to merge two data sets when the date of the first data set is equal or the closet date to the second date set date.

 

example

data diagonsed;
   input  id   diag_date mmddyy10.  disease $;
   format diag_date mmddyy10.;
   datalines;
1 05/23/10  C
1 05/26/10  C
1 01/01/11  B
;
run;



data disharge;
   input  id   dis_date mmddyy10.  disease $ hospital$;
   format dis_date mmddyy10.;
   datalines;
1 05/21/10  C    W
1 05/24/10  C    N
1 05/25/10  C    G
1 05/27/10  A    R
1 05/29/10  C    R
1 05/28/10  B    W
1 09/15/10  D    S
1 12/29/10  C    F
1 12/31/10  A    F
1 01/02/11  B    S
1 01/05/11  B    N
;
run;

/*i want to know the name of the hospital for those who were diagnosed for specific disease as in only diagnosed data set*/
/* The diagnosed date should be less or equal to the discharge date (minimum value=dis_date- diag_date) for the same type of disease in the first data set
*/
/* The correct answer should be as below: 
id   diag_date  dis_date  disease    hospital 
1    05/23/10   05/24/10     C             N
1    05/26/10   05/29/10     C             R
1    01/01/11   01/02/11     B              S

  for the same type of disease in both data sets, difference_Date=min( dis_date - diag_date ) and it should equal zero or grater than zero.

 

Could you please give me any suggestion.

 

Thanks


Accepted Solutions
Solution
2 weeks ago
Super User
Posts: 10,133

Re: merge two data sets with multiple records

Posted in reply to elias_2020

How about this one ?

 

data diagnosed;
   input  id   diag_date mmddyy10.  disease $;
   format diag_date mmddyy10.;
   datalines;
1 05/23/10  C
1 05/26/10  C
1 01/01/11  B
;

data discharge;
   input  id   dis_date mmddyy10.  disease $ hospital$;
   format dis_date mmddyy10.;
   datalines;
1 05/21/10  C    W
1 05/24/10  C    N
1 05/25/10  C    G
1 05/27/10  A    R
1 05/29/10  C    R
1 05/28/10  B    W
1 09/15/10  D    S
1 12/29/10  C    F
1 12/31/10  A    F
1 01/02/11  B    S
1 01/05/11  B    N
;
proc sql;
select a.*,b.dis_date,b.hospital
 from diagnosed as a ,discharge as b
  where a.id=b.id and  b.dis_date-a.diag_date>0
   group by a.id,a.diag_date,a.disease
    having b.dis_date-a.diag_date=min(b.dis_date-a.diag_date);
quit;

View solution in original post


All Replies
Super User
Posts: 20,252

Re: merge two data sets with multiple records

Posted in reply to elias_2020

Search on here. You’ll find plenty of examples. 

 

You can join on nearest using SQL easier than you can via a data step. 

 

If your data is not large, you can do a cross join and filter on anything with the date > date2. Then use a data step to pick the lowest/best match. 

Occasional Contributor
Posts: 15

Re: merge two data sets with multiple records

[ Edited ]

Thank you Reeza.

My data sets is too large and I would like to merge the two data set if the type of disease for both data sets are the same and the date should be the nearest date.

 

Respected Advisor
Posts: 4,978

Re: merge two data sets with multiple records

Posted in reply to elias_2020

How about like this:

 

data diagnosed;
   input  id   diag_date mmddyy10.  disease $;
   format diag_date mmddyy10.;
   datalines;
1 05/23/10  C
1 05/26/10  C
1 01/01/11  B
;

data discharge;
   input  id   dis_date mmddyy10.  disease $ hospital$;
   format dis_date mmddyy10.;
   datalines;
1 05/21/10  C    W
1 05/24/10  C    N
1 05/25/10  C    G
1 05/27/10  A    R
1 05/29/10  C    R
1 05/28/10  B    W
1 09/15/10  D    S
1 12/29/10  C    F
1 12/31/10  A    F
1 01/02/11  B    S
1 01/05/11  B    N
;

proc sort data=diagnosed; by id disease diag_date; run;
proc sort data=discharge; by id disease dis_date; run;

data all;
retain diag_date;
merge 
    diagnosed(in=ina rename=diag_date=date) 
    discharge(in=inb rename=dis_date=date);
by id disease date;
if first.disease then call missing(diag_date);
if ina then diag_date = date;
if inb and not missing(diag_date) then do;
    dis_date=date;
    output;
    call missing(diag_date);
    end;
format diag_date dis_date mmddyy10.;
drop date;
run;

proc print noobs; run;
PG
Trusted Advisor
Posts: 1,072

Re: merge two data sets with multiple records

I think this is a case where "set ... by ..." is simpler than "merge ... by ...",

 

data diagonsed;
  input  id   diag_date mmddyy10.  disease $;
  format diag_date mmddyy10.;
  datalines;
1 05/23/10  C
1 05/26/10  C
1 01/01/11  B
run;
data disharge;
  input  id   dis_date mmddyy10.  disease $ hospital$;
  format dis_date mmddyy10.;
  datalines;
1 05/21/10  C    W
1 05/24/10  C    N
1 05/25/10  C    G
1 05/27/10  A    R
1 05/29/10  C    R
1 05/28/10  B    W
1 09/15/10  D    S
1 12/29/10  C    F
1 12/31/10  A    F
1 01/02/11  B    S
1 01/05/11  B    N
run;

proc sort data=diagonsed; by id disease diag_date;run;
proc sort data=disharge; by id disease dis_date;run;


data want (rename=(date=dis_date));
  set diagonsed (in=indiag rename=(diag_date=date))
      disharge  (in=indis  rename=(dis_date=date));
  by id disease date;

  diag_date=lag(date); 
  format diag_date mmddyy10.;
  if indis=1 and lag(indiag)=1 and first.disease=0;
run;

 

The subsetting IF at the bottom does most of the work.  It only keeps cases in which a discharge record immediately follows a diagnosis record.  Adding the "and first.disease=0" condition ensures that the consecutive records are for the same id/disease.

 

While the "diag_date=lag(date)" sometimes will return a discharge date, all such unwanted cases are filtered out in the subsequent subsetting if.

Solution
2 weeks ago
Super User
Posts: 10,133

Re: merge two data sets with multiple records

Posted in reply to elias_2020

How about this one ?

 

data diagnosed;
   input  id   diag_date mmddyy10.  disease $;
   format diag_date mmddyy10.;
   datalines;
1 05/23/10  C
1 05/26/10  C
1 01/01/11  B
;

data discharge;
   input  id   dis_date mmddyy10.  disease $ hospital$;
   format dis_date mmddyy10.;
   datalines;
1 05/21/10  C    W
1 05/24/10  C    N
1 05/25/10  C    G
1 05/27/10  A    R
1 05/29/10  C    R
1 05/28/10  B    W
1 09/15/10  D    S
1 12/29/10  C    F
1 12/31/10  A    F
1 01/02/11  B    S
1 01/05/11  B    N
;
proc sql;
select a.*,b.dis_date,b.hospital
 from diagnosed as a ,discharge as b
  where a.id=b.id and  b.dis_date-a.diag_date>0
   group by a.id,a.diag_date,a.disease
    having b.dis_date-a.diag_date=min(b.dis_date-a.diag_date);
quit;
Occasional Contributor
Posts: 15

Re: merge two data sets with multiple records

Thanks Ksharp.

 

It works with me and I tried it for some cases and it gave me the correct answer.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 177 views
  • 1 like
  • 5 in conversation