I have ten data sets and all of them has customer_id variable in common. I know if we have three data sets we can do something like this
proc sql;
create table example as
select a.ID, b.*, c.* from
temp a full join temp2 b
on a.id = b.id
full join temp3 c
on a.id = c.id;
quit;
But for ten or more dataset it will get really complex. So is there any other efficient way to do this ?
17, 18, 19, 20 & 22 is missing just in the idno column, because you are omitting the i.id and j.id in your coalesce argument.
SQL doen't necessarily output in sorted order, even on the join column.
Add an ORDER BY clause to have in the order you want.
Try a data step merge instead?
Despite having to sort every data set, merging is providing the desired result for me. Below is the code I tried to compare results between proc sql and data step merging.
/*created 10 data sets*/
data temp1;
input id x1 x2;
cards;
1 25 37
2 35 .
3 44 97
;
run;
data temp2;
input id var1 var2;
cards;
2 65 37
3 . 47
5 34 97
;
run;
data temp3;
input id xx1 xx2;
cards;
3 55 37
5 25 47
4 . 97
;
run;
data temp4;
input id var3 var4;
cards;
6 25 37
7 35 47
9 . .
;
run;
data temp5;
input id var5 var6;
cards;
5 65 37
6 85 47
9 34 97
;
run;
data temp6;
input id xx3 xx4;
cards;
11 55 37
13 25 47
14 64 97
;
run;
data temp7;
input id xx5 xx6;
cards;
11 25 .
15 35 47
16 44 97
;
run;
data temp8;
input id x3 x4;
cards;
12 65 .
15 85 47
16 . 97
;
run;
data temp9;
input id x5 x6;
cards;
17 55 37
18 25 47
19 64 97
;
run;
data temp10;
input id x7 x8;
cards;
17 . .
20 25 37
22 24 97
;
run;
/*not including the proc sort here */
data new;
merge temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10;
by id;
run;
proc print data = new;
title 'merged 10 data sets';
run;
The above code produces below output, which seems like what I want.
But the PROC SQL output looks messed up. I ran the same code for joining 3 tables and then the output was fine. But for 10 tables the output looks very weird. I am just experimenting with proc sql and want to know how to produce similar output like the data step merge.
title "proc sql full join";
proc sql;
select coalesce(a.id, b.id, c.id, d.id, e.id, f.id, g.id, h.id) as idno, a.*, b.*, c.*, d.*, e.*, f.*, g.*, h.* , i.*, j.*
from temp1 a Full Join temp2 b on a.id = b.id
Full Join temp3 c on b.id = c.id
Full Join temp4 d on c.id = d.id
Full Join temp5 e on d.id = e.id
Full Join temp6 f on e.id = f.id
Full Join temp7 g on f.id = g.id
Full Join temp8 h on g.id = h.id
Full Join temp9 i on h.id = i.id
Full Join temp10 j on i.id = j.id;
quit;
and the output is below.The idno column is not sorted in ascending order and it is also omitting some of the id numbers from 17,18,19,20,22 . Also idno 5 is repeated twice. Can you please tell me what's wrong here ?
17, 18, 19, 20 & 22 is missing just in the idno column, because you are omitting the i.id and j.id in your coalesce argument.
SQL doen't necessarily output in sorted order, even on the join column.
Add an ORDER BY clause to have in the order you want.
Thanks a lot. Made those changes (adding i.id and j.id in the coalesce and adding order by) and the output looks much better. Only one question, why idno 5 is coming up twice ?
Please define "efficient"? It sounds like you mean less complex. And yes, it might be easier with a data step merge as @Reeza suggests. Bare in mind you might need to pre-sort the data sets - all need to be sorted on the BY variable which is not a requirement for SQL joins).
@LinusHyou are right. I meant less complex.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.