Solved: Merge Datasets Together

luvscandy27 · Posted 03-12-2020 06:07 PM

I have two data sets (data below) and I am trying to merge the data together using proc sql code below. When I merge the two
tables I get all of the dates from Id 3 and only the review dates for ID 1 and 2. I have tried to join the two tables using left, right, inner and full join and it seems as if I am getting the same results. What I would like to do is merge by ID and keep all of the dates for each ID.

Can someone assist me with this please?

data test1;
infile datalines delimiter = ',';
input ID START:mmddyy10. REVIEWCOMPLETE: mmddyy10. HEALTHDATE:mmddyy10. HEALTHDATEEND:mmddyy10.
END: mmddyy10. CERTIFYDATE: mmddyy10.;
format START REVIEWCOMPLETE HEALTHDATE HEALTHDATEEND END CERTIFYDATE mmddyy10. ;
datalines;
1, , , , , , ,
2, , , , , , ,
3,01/16/2019, 01/23/2019, 01/26/2019, 01/26/2019, 01/26/2019,01/26/2019,
;
Run;

data test2;
infile datalines delimiter = ',';
input ID START:mmddyy10. REVIEW:mmddyy10. REVIEWCOMPLETE: mmddyy10. HEALTHDATE:mmddyy10. HEALTHDATEEND:mmddyy10.
END: mmddyy10. CERTIFYDATE: mmddyy10.;
format START REVIEW REVIEWCOMPLETE HEALTHDATE HEALTHDATEEND END CERTIFYDATE mmddyy10. ;
datalines;
1,01/17/2020,01/29/2020,01/30/2020, , , , ,
2,01/10/2020,01/24/2020, , , , , ,
3, , , , , , , ,
;
Run;

proc sql;
create table table3 as
select * from test1 as a inner join test2 as b
on a.id = b.id;
quit;

Kurt_Bremser · Posted 03-15-2020 07:29 AM

In SQL, you need to use the coalesce() function. Here is code with both (SQL and data step) solutions, and a check step to make sure both yield the same result:

data test1;
infile datalines delimiter = ',' dsd;
input ID START:mmddyy10. REVIEWCOMPLETE: mmddyy10. HEALTHDATE:mmddyy10. HEALTHDATEEND:mmddyy10.
END: mmddyy10. CERTIFYDATE: mmddyy10.; 
format START REVIEWCOMPLETE HEALTHDATE HEALTHDATEEND END CERTIFYDATE yymmddd10. ;
datalines; 
1, , , , , , , 
2, , , , , , , 
3,01/16/2019, 01/23/2019, 01/26/2019, 01/26/2019, 01/26/2019,01/26/2019, 
;
 
data test2;
infile datalines delimiter = ',' dsd;
input ID START:mmddyy10. REVIEW:mmddyy10. REVIEWCOMPLETE: mmddyy10. HEALTHDATE:mmddyy10. HEALTHDATEEND:mmddyy10.
END: mmddyy10. CERTIFYDATE: mmddyy10.; 
format START REVIEW REVIEWCOMPLETE HEALTHDATE HEALTHDATEEND END CERTIFYDATE yymmddd10. ;
datalines; 
1,01/17/2020,01/29/2020,01/30/2020, , , , ,
2,01/10/2020,01/24/2020, , , , , ,
3, , , , , , , ,
;

proc sql;
create table want1 as
select
  t1.id,
  coalesce(t1.start,t2.start) as start format=yymmddd10.,
  t2.review,
  coalesce(t1.reviewcomplete,t2.reviewcomplete) as reviewcomplete format=yymmddd10.,
  coalesce(t1.healthdate,t2.healthdate) as healthdate format=yymmddd10.,
  coalesce(t1.healthdateend,t2.healthdateend) as healthdateend format=yymmddd10.,
  coalesce(t1.end,t2.end) as end format=yymmddd10.,
  coalesce(t1.certifydate,t2.certifydate) as certifydate format=yymmddd10.
from test1 t1 full join test2 t2
on t1.id = t2.id;
quit;

data want2;
update test1 test2;
by id;
run;

proc compare base=want1 compare=want2;
run;

As you can see, the data step solution (as it is most often in SAS) is the most simple one.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 03-12-2020 06:11 PM

Do NOT use the asterisk in sql joins, make an explicit list, and use the coalesce() function to overwrite only the missing values.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

luvscandy27 · Posted 03-12-2020 06:38 PM

When I make an explicit list I get the following error. Also, I want to keep the missing dates where missing dates should be therefore I don't think the coalesce function is what I need.

proc sql; 
	create table table3 as
	select ID, START,REVIEW, REVIEWCOMPLETE, HEALTHDATE, HEALTHDATEEND, END, CERTIFYDATE
	from test1 as a inner join test2 as b
	on a.id = b.id;
quit;

ERROR: Ambiguous reference, column ID is in more than one table.

ERROR: Ambiguous reference, column START is in more than one table.

ERROR: Ambiguous reference, column REVIEWCOMPLETE is in more than one table.

ERROR: Ambiguous reference, column HEALTHDATE is in more than one table.

ERROR: Ambiguous reference, column HEALTHDATEEND is in more than one table.

ERROR: Ambiguous reference, column END is in more than one table.

ERROR: Ambiguous reference, column CERTIFYDATE is in more than one table.

NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.

76 quit;

Reeza · Posted 03-12-2020 10:28 PM

You need to specify which variable is coming from where, especially when it's in more than one table.

select a.ID, a.Start, a.reviewComplete etc

Are you sure you want a merge and not an APPEND given the number of variables that are the same in the data set?

ChrisNZ · Posted 03-12-2020 10:55 PM

1. >Also, I want to keep the missing dates where missing dates should be therefore I don't think the coalesce function is what I need.

Coalesce does want you asked in your question.

What does missing dates should be mean?

2. A data step with an update statement might be what you want.

data TABLE3; update TABLE1 TABLE2; by ID; run;

High-Performance SAS Coding - Third Edition

Kurt_Bremser · Posted 03-15-2020 07:29 AM

In SQL, you need to use the coalesce() function. Here is code with both (SQL and data step) solutions, and a check step to make sure both yield the same result:

data test1;
infile datalines delimiter = ',' dsd;
input ID START:mmddyy10. REVIEWCOMPLETE: mmddyy10. HEALTHDATE:mmddyy10. HEALTHDATEEND:mmddyy10.
END: mmddyy10. CERTIFYDATE: mmddyy10.; 
format START REVIEWCOMPLETE HEALTHDATE HEALTHDATEEND END CERTIFYDATE yymmddd10. ;
datalines; 
1, , , , , , , 
2, , , , , , , 
3,01/16/2019, 01/23/2019, 01/26/2019, 01/26/2019, 01/26/2019,01/26/2019, 
;
 
data test2;
infile datalines delimiter = ',' dsd;
input ID START:mmddyy10. REVIEW:mmddyy10. REVIEWCOMPLETE: mmddyy10. HEALTHDATE:mmddyy10. HEALTHDATEEND:mmddyy10.
END: mmddyy10. CERTIFYDATE: mmddyy10.; 
format START REVIEW REVIEWCOMPLETE HEALTHDATE HEALTHDATEEND END CERTIFYDATE yymmddd10. ;
datalines; 
1,01/17/2020,01/29/2020,01/30/2020, , , , ,
2,01/10/2020,01/24/2020, , , , , ,
3, , , , , , , ,
;

proc sql;
create table want1 as
select
  t1.id,
  coalesce(t1.start,t2.start) as start format=yymmddd10.,
  t2.review,
  coalesce(t1.reviewcomplete,t2.reviewcomplete) as reviewcomplete format=yymmddd10.,
  coalesce(t1.healthdate,t2.healthdate) as healthdate format=yymmddd10.,
  coalesce(t1.healthdateend,t2.healthdateend) as healthdateend format=yymmddd10.,
  coalesce(t1.end,t2.end) as end format=yymmddd10.,
  coalesce(t1.certifydate,t2.certifydate) as certifydate format=yymmddd10.
from test1 t1 full join test2 t2
on t1.id = t2.id;
quit;

data want2;
update test1 test2;
by id;
run;

proc compare base=want1 compare=want2;
run;

As you can see, the data step solution (as it is most often in SAS) is the most simple one.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

luvscandy27 · Posted 03-16-2020 11:05 AM

Thank you!

Merge Datasets Together

Re: Merge Datasets Together

Re: Merge Datasets Together

Re: Merge Datasets Together

Re: Merge Datasets Together

Re: Merge Datasets Together

Re: Merge Datasets Together

Re: Merge Datasets Together

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!