BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shihabur
Obsidian | Level 7

I have ten data sets and all of them has customer_id variable in common. I know if we have three data sets we can do something like this

 

proc sql;
create table example as
select a.ID, b.*, c.* from
temp a full join temp2 b
on a.id = b.id
full join temp3 c
on a.id = c.id;
quit;

But for ten or more dataset it will get really complex. So is there any other efficient way to do this ?

1 ACCEPTED SOLUTION

Accepted Solutions
LinusH
Tourmaline | Level 20

17, 18, 19, 20 & 22 is missing just in the idno column, because you are omitting the i.id and j.id in your coalesce argument.

SQL doen't necessarily output in sorted order, even on the join column.

Add an ORDER BY clause to have in the order you want.

Data never sleeps

View solution in original post

6 REPLIES 6
Reeza
Super User

Try a data step merge instead?

shihabur
Obsidian | Level 7

Despite having to sort every data set, merging is providing the desired result for me. Below is the code I tried to compare results between proc sql and data step merging.

 

/*created 10 data sets*/

data temp1;
input id x1 x2;
cards;
1 25 37
2 35 .
3 44 97
;
run;

data temp2;
input id var1 var2;
cards;
2 65 37
3 .  47
5 34 97
;
run;

data temp3;
input id xx1 xx2;
cards;
3 55 37
5 25 47
4 .  97
;
run;

data temp4;
input id var3 var4;
cards;
6 25 37
7 35 47
9 .  .
;
run;

data temp5;
input id var5 var6;
cards;
5 65 37
6 85 47
9 34 97
;
run;

data temp6;
input id xx3 xx4;
cards;
11 55 37
13 25 47
14 64 97
;
run;

data temp7;
input id xx5 xx6;
cards;
11 25 .
15 35 47
16 44 97
;
run;

data temp8;
input id x3 x4;
cards;
12 65 .
15 85 47
16 .  97
;
run;

data temp9;
input id x5 x6;
cards;
17 55 37
18 25 47
19 64 97
;
run;

data temp10;
input id x7 x8;
cards;
17 .  .
20 25 37
22 24 97
;
run;

/*not including the proc sort here */

data new;
merge temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10;
by id;
run;

proc print data = new;
title 'merged 10 data sets';
run;

The above code produces below output, which seems like what I want.

Screen Shot 2017-10-17 at 1.44.43 PM.png

But the PROC SQL output looks messed up. I ran the same code for joining 3 tables and then the output was fine. But for 10 tables the output looks very weird. I am just experimenting with proc sql and want to know how to produce similar output like the data step merge.

 

title "proc sql full join";
proc sql;
select coalesce(a.id, b.id, c.id, d.id, e.id, f.id, g.id, h.id) as idno, a.*, b.*, c.*, d.*, e.*, f.*, g.*, h.* , i.*, j.*
from temp1 a Full Join temp2 b on a.id = b.id
Full Join temp3 c on b.id = c.id
Full Join temp4 d on c.id = d.id
Full Join temp5 e on d.id = e.id
Full Join temp6 f on e.id = f.id
Full Join temp7 g on f.id = g.id
Full Join temp8 h on g.id = h.id
Full Join temp9 i on h.id = i.id
Full Join temp10 j on i.id = j.id;
quit;

and the output is below.The idno column is not sorted in ascending order and it is also omitting some of the id numbers from 17,18,19,20,22 . Also idno 5 is repeated twice. Can you please tell me what's wrong here ?

 

Screen Shot 2017-10-17 at 1.51.33 PM.png

LinusH
Tourmaline | Level 20

17, 18, 19, 20 & 22 is missing just in the idno column, because you are omitting the i.id and j.id in your coalesce argument.

SQL doen't necessarily output in sorted order, even on the join column.

Add an ORDER BY clause to have in the order you want.

Data never sleeps
shihabur
Obsidian | Level 7

Thanks a lot. Made those changes (adding i.id and j.id in the coalesce and adding order by) and the output looks much better. Only one question, why idno 5 is coming up twice ?

 

Screen Shot 2017-10-17 at 2.29.27 PM.png

LinusH
Tourmaline | Level 20

Please define "efficient"? It sounds like you mean less complex. And yes, it might be easier with a data step merge as @Reeza suggests. Bare in mind you might need to pre-sort the data sets - all need to be sorted on the BY variable which is not a requirement for SQL joins).

Data never sleeps
shihabur
Obsidian | Level 7

@LinusHyou are right. I meant less complex.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 694 views
  • 0 likes
  • 3 in conversation