Solved: proc sql join multiple datasets

zihdonv19 · Posted 04-23-2024 03:29 PM

I have several datasets that I want to use proc sql to join. but here to simplify, I use 3 datasets as example:

data dat1;
input id x1;
cards;
1 1
2 0
3 0
;
run;

data dat2;
input id x2;
cards;
2 1 
3 1 
5 0 
;
run;

data dat3;
input id x3;
cards;
3 0 
4 1 
6 0 
;
run;

I want to merge them based one id, with one id per row, the output should be something like this:

id	x1	x2	x3
1	1	.	.
2	0	1	.
3	0	1	0
4	.	.	1
5	.	0	.
6	.	.	0

Tom · Posted 04-23-2024 04:08 PM

Why would you use SQL for that? It is trivial with normal SAS code.

data want;
  merge dat1-dat3;
  by id;
run;

Result:

Obs    id    x1    x2    x3

 1      1     1     .     .
 2      2     0     1     .
 3      3     0     1     0
 4      4     .     .     1
 5      5     .     0     .
 6      6     .     .     0

If you did need to do it in PROC SQL code then the easiest is to use NATURAL joins.

proc sql;
create table want as
select * 
from dat2 
natural full join dat3
natural full join dat1 
order by id
;
quit;

(I referenced the datasets in that strange order so that X1 to X3 get added to the dataset in a nicer order so I could use * as the variable list.)

If you don't use NATURAL joins you will have to use the COALESCE() function to get the resulting ID variable and spell out the join criteria.

proc sql;
create table want as
select coalesce(a.id,b.id,c.id) as id,x1,x2,x3 
from dat1 a
full join dat2 b on a.id=b.id
full join dat3 c on a.id=c.id or b.id=c.id 
order by 1
;
quit;

View solution in original post

SASJedi · Posted 04-23-2024 03:58 PM

Your data is already sorted by ID, so this will do the trick:

data want;
	merge dat1 dat2 dat3;
	by id;
run;

Check out my Jedi SAS Tricks for SAS Users

ballardw · Posted 04-23-2024 04:07 PM

A very important question to ask is are the values of ID duplicated in any of the sets? If so, how do you want the result to appear?

If your data is sorted (as implied by the example data sets) by ID and none of the ID are duplicated within a single set then a Data step merge is going to be way easier:

data want;
   merge dat1 dat2 dat3;
   by id;
run;

The question about duplicates is very important because SQL joins are likely to result in multiple output rows.

An example

data dat1;
input id x1;
cards;
1 1
2 0
2 1
;
run;

data dat2;
input id x2;
cards;
2 1 
2 3 
5 0 
;
run;

proc sql;
   create table example as
   select a.id,a.x1,b.x2
   from dat1 as a
        left join 
        dat2 as b
        on a.id=b.id
   ;
quit;

Note that the two values from Id=2 are combined with both Id=2 in the other set resulting in 4 id=2 observations (and yes, this example does get the Id=5 from set 2 as that is another headache in SQL)

Tom · Posted 04-23-2024 04:08 PM

Why would you use SQL for that? It is trivial with normal SAS code.

data want;
  merge dat1-dat3;
  by id;
run;

Result:

Obs    id    x1    x2    x3

 1      1     1     .     .
 2      2     0     1     .
 3      3     0     1     0
 4      4     .     .     1
 5      5     .     0     .
 6      6     .     .     0

If you did need to do it in PROC SQL code then the easiest is to use NATURAL joins.

proc sql;
create table want as
select * 
from dat2 
natural full join dat3
natural full join dat1 
order by id
;
quit;

(I referenced the datasets in that strange order so that X1 to X3 get added to the dataset in a nicer order so I could use * as the variable list.)

If you don't use NATURAL joins you will have to use the COALESCE() function to get the resulting ID variable and spell out the join criteria.

proc sql;
create table want as
select coalesce(a.id,b.id,c.id) as id,x1,x2,x3 
from dat1 a
full join dat2 b on a.id=b.id
full join dat3 c on a.id=c.id or b.id=c.id 
order by 1
;
quit;

proc sql join multiple datasets

Re: proc sql join multiple datasets

Re: proc sql join multiple datasets

Re: proc sql join multiple datasets

Re: proc sql join multiple datasets

proc sql join multiple datasets

Re: proc sql join multiple datasets

Re: proc sql join multiple datasets

Re: proc sql join multiple datasets

Re: proc sql join multiple datasets

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!