I have several datasets that I want to use proc sql to join. but here to simplify, I use 3 datasets as example:
data dat1;
input id x1;
cards;
1 1
2 0
3 0
;
run;
data dat2;
input id x2;
cards;
2 1
3 1
5 0
;
run;
data dat3;
input id x3;
cards;
3 0
4 1
6 0
;
run;
I want to merge them based one id, with one id per row, the output should be something like this:
id | x1 | x2 | x3 |
1 | 1 | . | . |
2 | 0 | 1 | . |
3 | 0 | 1 | 0 |
4 | . | . | 1 |
5 | . | 0 | . |
6 | . | . | 0 |
Why would you use SQL for that? It is trivial with normal SAS code.
data want;
merge dat1-dat3;
by id;
run;
Result:
Obs id x1 x2 x3 1 1 1 . . 2 2 0 1 . 3 3 0 1 0 4 4 . . 1 5 5 . 0 . 6 6 . . 0
If you did need to do it in PROC SQL code then the easiest is to use NATURAL joins.
proc sql;
create table want as
select *
from dat2
natural full join dat3
natural full join dat1
order by id
;
quit;
(I referenced the datasets in that strange order so that X1 to X3 get added to the dataset in a nicer order so I could use * as the variable list.)
If you don't use NATURAL joins you will have to use the COALESCE() function to get the resulting ID variable and spell out the join criteria.
proc sql;
create table want as
select coalesce(a.id,b.id,c.id) as id,x1,x2,x3
from dat1 a
full join dat2 b on a.id=b.id
full join dat3 c on a.id=c.id or b.id=c.id
order by 1
;
quit;
Your data is already sorted by ID, so this will do the trick:
data want;
merge dat1 dat2 dat3;
by id;
run;
A very important question to ask is are the values of ID duplicated in any of the sets? If so, how do you want the result to appear?
If your data is sorted (as implied by the example data sets) by ID and none of the ID are duplicated within a single set then a Data step merge is going to be way easier:
data want; merge dat1 dat2 dat3; by id; run;
The question about duplicates is very important because SQL joins are likely to result in multiple output rows.
An example
data dat1; input id x1; cards; 1 1 2 0 2 1 ; run; data dat2; input id x2; cards; 2 1 2 3 5 0 ; run; proc sql; create table example as select a.id,a.x1,b.x2 from dat1 as a left join dat2 as b on a.id=b.id ; quit;
Note that the two values from Id=2 are combined with both Id=2 in the other set resulting in 4 id=2 observations (and yes, this example does get the Id=5 from set 2 as that is another headache in SQL)
Why would you use SQL for that? It is trivial with normal SAS code.
data want;
merge dat1-dat3;
by id;
run;
Result:
Obs id x1 x2 x3 1 1 1 . . 2 2 0 1 . 3 3 0 1 0 4 4 . . 1 5 5 . 0 . 6 6 . . 0
If you did need to do it in PROC SQL code then the easiest is to use NATURAL joins.
proc sql;
create table want as
select *
from dat2
natural full join dat3
natural full join dat1
order by id
;
quit;
(I referenced the datasets in that strange order so that X1 to X3 get added to the dataset in a nicer order so I could use * as the variable list.)
If you don't use NATURAL joins you will have to use the COALESCE() function to get the resulting ID variable and spell out the join criteria.
proc sql;
create table want as
select coalesce(a.id,b.id,c.id) as id,x1,x2,x3
from dat1 a
full join dat2 b on a.id=b.id
full join dat3 c on a.id=c.id or b.id=c.id
order by 1
;
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.