BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
luvscandy27
Quartz | Level 8
 
I have two data sets (data below) and I am trying to merge the data together using proc sql code below. When I merge the two
tables I get all of the dates from Id 3 and only the review dates for ID 1 and 2. I have tried to join the two tables using left, right, inner and full join and it seems as if I am getting the same results. What I would like to do is merge by ID and keep all of the dates for each ID.

Can someone assist me with this please?
 

data test1;
infile datalines delimiter = ',';
input ID START:mmddyy10. REVIEWCOMPLETE: mmddyy10. HEALTHDATE:mmddyy10. HEALTHDATEEND:mmddyy10.
END: mmddyy10. CERTIFYDATE: mmddyy10.;
format START REVIEWCOMPLETE HEALTHDATE HEALTHDATEEND END CERTIFYDATE mmddyy10. ;
datalines;
1, , , , , , ,
2, , , , , , ,
3,01/16/2019, 01/23/2019, 01/26/2019, 01/26/2019, 01/26/2019,01/26/2019,
;
Run;

 


data test2;
infile datalines delimiter = ',';
input ID START:mmddyy10. REVIEW:mmddyy10. REVIEWCOMPLETE: mmddyy10. HEALTHDATE:mmddyy10. HEALTHDATEEND:mmddyy10.
END: mmddyy10. CERTIFYDATE: mmddyy10.;
format START REVIEW REVIEWCOMPLETE HEALTHDATE HEALTHDATEEND END CERTIFYDATE mmddyy10. ;
datalines;
1,01/17/2020,01/29/2020,01/30/2020, , , , ,
2,01/10/2020,01/24/2020, , , , , ,
3, , , , , , , ,
;
Run;


proc sql;
create table table3 as
select * from test1 as a inner join test2 as b
on a.id = b.id;
quit;

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

In SQL, you need to use the coalesce() function. Here is code with both (SQL and data step) solutions, and a check step to make sure both yield the same result:

data test1;
infile datalines delimiter = ',' dsd;
input ID START:mmddyy10. REVIEWCOMPLETE: mmddyy10. HEALTHDATE:mmddyy10. HEALTHDATEEND:mmddyy10.
END: mmddyy10. CERTIFYDATE: mmddyy10.; 
format START REVIEWCOMPLETE HEALTHDATE HEALTHDATEEND END CERTIFYDATE yymmddd10. ;
datalines; 
1, , , , , , , 
2, , , , , , , 
3,01/16/2019, 01/23/2019, 01/26/2019, 01/26/2019, 01/26/2019,01/26/2019, 
;
 
data test2;
infile datalines delimiter = ',' dsd;
input ID START:mmddyy10. REVIEW:mmddyy10. REVIEWCOMPLETE: mmddyy10. HEALTHDATE:mmddyy10. HEALTHDATEEND:mmddyy10.
END: mmddyy10. CERTIFYDATE: mmddyy10.; 
format START REVIEW REVIEWCOMPLETE HEALTHDATE HEALTHDATEEND END CERTIFYDATE yymmddd10. ;
datalines; 
1,01/17/2020,01/29/2020,01/30/2020, , , , ,
2,01/10/2020,01/24/2020, , , , , ,
3, , , , , , , ,
;

proc sql;
create table want1 as
select
  t1.id,
  coalesce(t1.start,t2.start) as start format=yymmddd10.,
  t2.review,
  coalesce(t1.reviewcomplete,t2.reviewcomplete) as reviewcomplete format=yymmddd10.,
  coalesce(t1.healthdate,t2.healthdate) as healthdate format=yymmddd10.,
  coalesce(t1.healthdateend,t2.healthdateend) as healthdateend format=yymmddd10.,
  coalesce(t1.end,t2.end) as end format=yymmddd10.,
  coalesce(t1.certifydate,t2.certifydate) as certifydate format=yymmddd10.
from test1 t1 full join test2 t2
on t1.id = t2.id;
quit;

data want2;
update test1 test2;
by id;
run;

proc compare base=want1 compare=want2;
run;

As you can see, the data step solution (as it is most often in SAS) is the most simple one.

View solution in original post

6 REPLIES 6
luvscandy27
Quartz | Level 8

When I make an explicit list I get the following error. Also, I want to keep the missing dates where missing dates should be therefore I don't think the coalesce function is what I need.

 

proc sql; 
create table table3 as
select ID, START,REVIEW, REVIEWCOMPLETE, HEALTHDATE, HEALTHDATEEND, END, CERTIFYDATE
from test1 as a inner join test2 as b
on a.id = b.id;
quit;

ERROR: Ambiguous reference, column ID is in more than one table.
ERROR: Ambiguous reference, column START is in more than one table.
ERROR: Ambiguous reference, column REVIEWCOMPLETE is in more than one table.
ERROR: Ambiguous reference, column HEALTHDATE is in more than one table.
ERROR: Ambiguous reference, column HEALTHDATEEND is in more than one table.
ERROR: Ambiguous reference, column END is in more than one table.
ERROR: Ambiguous reference, column CERTIFYDATE is in more than one table.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
76 quit;
 
 
Reeza
Super User
You need to specify which variable is coming from where, especially when it's in more than one table.

select a.ID, a.Start, a.reviewComplete etc

Are you sure you want a merge and not an APPEND given the number of variables that are the same in the data set?
ChrisNZ
Tourmaline | Level 20

1. >Also, I want to keep the missing dates where missing dates should be therefore I don't think the coalesce function is what I need.

Coalesce does want you asked in your question. 

What does   missing dates should be    mean?

 

2. A data step with an update statement might be what you want.

data TABLE3; update TABLE1 TABLE2; by ID; run;

 

 

Kurt_Bremser
Super User

In SQL, you need to use the coalesce() function. Here is code with both (SQL and data step) solutions, and a check step to make sure both yield the same result:

data test1;
infile datalines delimiter = ',' dsd;
input ID START:mmddyy10. REVIEWCOMPLETE: mmddyy10. HEALTHDATE:mmddyy10. HEALTHDATEEND:mmddyy10.
END: mmddyy10. CERTIFYDATE: mmddyy10.; 
format START REVIEWCOMPLETE HEALTHDATE HEALTHDATEEND END CERTIFYDATE yymmddd10. ;
datalines; 
1, , , , , , , 
2, , , , , , , 
3,01/16/2019, 01/23/2019, 01/26/2019, 01/26/2019, 01/26/2019,01/26/2019, 
;
 
data test2;
infile datalines delimiter = ',' dsd;
input ID START:mmddyy10. REVIEW:mmddyy10. REVIEWCOMPLETE: mmddyy10. HEALTHDATE:mmddyy10. HEALTHDATEEND:mmddyy10.
END: mmddyy10. CERTIFYDATE: mmddyy10.; 
format START REVIEW REVIEWCOMPLETE HEALTHDATE HEALTHDATEEND END CERTIFYDATE yymmddd10. ;
datalines; 
1,01/17/2020,01/29/2020,01/30/2020, , , , ,
2,01/10/2020,01/24/2020, , , , , ,
3, , , , , , , ,
;

proc sql;
create table want1 as
select
  t1.id,
  coalesce(t1.start,t2.start) as start format=yymmddd10.,
  t2.review,
  coalesce(t1.reviewcomplete,t2.reviewcomplete) as reviewcomplete format=yymmddd10.,
  coalesce(t1.healthdate,t2.healthdate) as healthdate format=yymmddd10.,
  coalesce(t1.healthdateend,t2.healthdateend) as healthdateend format=yymmddd10.,
  coalesce(t1.end,t2.end) as end format=yymmddd10.,
  coalesce(t1.certifydate,t2.certifydate) as certifydate format=yymmddd10.
from test1 t1 full join test2 t2
on t1.id = t2.id;
quit;

data want2;
update test1 test2;
by id;
run;

proc compare base=want1 compare=want2;
run;

As you can see, the data step solution (as it is most often in SAS) is the most simple one.

luvscandy27
Quartz | Level 8
Thank you!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 613 views
  • 4 likes
  • 4 in conversation