How to combine ten data sets using PROC SQL efficiently

Accepted Solution Solved
Reply
Contributor
Posts: 38
Accepted Solution

How to combine ten data sets using PROC SQL efficiently

I have ten data sets and all of them has customer_id variable in common. I know if we have three data sets we can do something like this

 

proc sql;
create table example as
select a.ID, b.*, c.* from
temp a full join temp2 b
on a.id = b.id
full join temp3 c
on a.id = c.id;
quit;

But for ten or more dataset it will get really complex. So is there any other efficient way to do this ?


Accepted Solutions
Solution
‎10-17-2017 05:41 PM
Super User
Posts: 5,849

Re: How to combine ten data sets using PROC SQL efficiently

17, 18, 19, 20 & 22 is missing just in the idno column, because you are omitting the i.id and j.id in your coalesce argument.

SQL doen't necessarily output in sorted order, even on the join column.

Add an ORDER BY clause to have in the order you want.

Data never sleeps

View solution in original post


All Replies
Super User
Posts: 23,224

Re: How to combine ten data sets using PROC SQL efficiently

Try a data step merge instead?

Contributor
Posts: 38

Re: How to combine ten data sets using PROC SQL efficiently

[ Edited ]

Despite having to sort every data set, merging is providing the desired result for me. Below is the code I tried to compare results between proc sql and data step merging.

 

/*created 10 data sets*/

data temp1;
input id x1 x2;
cards;
1 25 37
2 35 .
3 44 97
;
run;

data temp2;
input id var1 var2;
cards;
2 65 37
3 .  47
5 34 97
;
run;

data temp3;
input id xx1 xx2;
cards;
3 55 37
5 25 47
4 .  97
;
run;

data temp4;
input id var3 var4;
cards;
6 25 37
7 35 47
9 .  .
;
run;

data temp5;
input id var5 var6;
cards;
5 65 37
6 85 47
9 34 97
;
run;

data temp6;
input id xx3 xx4;
cards;
11 55 37
13 25 47
14 64 97
;
run;

data temp7;
input id xx5 xx6;
cards;
11 25 .
15 35 47
16 44 97
;
run;

data temp8;
input id x3 x4;
cards;
12 65 .
15 85 47
16 .  97
;
run;

data temp9;
input id x5 x6;
cards;
17 55 37
18 25 47
19 64 97
;
run;

data temp10;
input id x7 x8;
cards;
17 .  .
20 25 37
22 24 97
;
run;

/*not including the proc sort here */

data new;
merge temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10;
by id;
run;

proc print data = new;
title 'merged 10 data sets';
run;

The above code produces below output, which seems like what I want.

Screen Shot 2017-10-17 at 1.44.43 PM.png

But the PROC SQL output looks messed up. I ran the same code for joining 3 tables and then the output was fine. But for 10 tables the output looks very weird. I am just experimenting with proc sql and want to know how to produce similar output like the data step merge.

 

title "proc sql full join";
proc sql;
select coalesce(a.id, b.id, c.id, d.id, e.id, f.id, g.id, h.id) as idno, a.*, b.*, c.*, d.*, e.*, f.*, g.*, h.* , i.*, j.*
from temp1 a Full Join temp2 b on a.id = b.id
Full Join temp3 c on b.id = c.id
Full Join temp4 d on c.id = d.id
Full Join temp5 e on d.id = e.id
Full Join temp6 f on e.id = f.id
Full Join temp7 g on f.id = g.id
Full Join temp8 h on g.id = h.id
Full Join temp9 i on h.id = i.id
Full Join temp10 j on i.id = j.id;
quit;

and the output is below.The idno column is not sorted in ascending order and it is also omitting some of the id numbers from 17,18,19,20,22 . Also idno 5 is repeated twice. Can you please tell me what's wrong here ?

 

Screen Shot 2017-10-17 at 1.51.33 PM.png

Solution
‎10-17-2017 05:41 PM
Super User
Posts: 5,849

Re: How to combine ten data sets using PROC SQL efficiently

17, 18, 19, 20 & 22 is missing just in the idno column, because you are omitting the i.id and j.id in your coalesce argument.

SQL doen't necessarily output in sorted order, even on the join column.

Add an ORDER BY clause to have in the order you want.

Data never sleeps
Contributor
Posts: 38

Re: How to combine ten data sets using PROC SQL efficiently

Thanks a lot. Made those changes (adding i.id and j.id in the coalesce and adding order by) and the output looks much better. Only one question, why idno 5 is coming up twice ?

 

Screen Shot 2017-10-17 at 2.29.27 PM.png

Super User
Posts: 5,849

Re: How to combine ten data sets using PROC SQL efficiently

Please define "efficient"? It sounds like you mean less complex. And yes, it might be easier with a data step merge as @Reeza suggests. Bare in mind you might need to pre-sort the data sets - all need to be sorted on the BY variable which is not a requirement for SQL joins).

Data never sleeps
Contributor
Posts: 38

Re: How to combine ten data sets using PROC SQL efficiently

@LinusHyou are right. I meant less complex.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 142 views
  • 0 likes
  • 3 in conversation