Hi,
I find that in proc sql the "full join" of 2 sets depends on the order. I did the following experiement:
data a;
do i=1 to 2;
x="a";
output;
end;
data b;
do i=2 to 3;
y="b";
output;
end;
proc sql;
create table a_full_join_b as
select * from a full join b
on a.i=b.i;
run;
proc sql;
create table b_full_join_a as
select * from b full join a
on a.i=b.i;
run;
The results are as blow:
a_full_join_b:
1 | 1 | a | |
2 | 2 | a | b |
3 | . | b |
b_full_join_a:
1 | . | a | |
2 | 2 | b | a |
3 | 3 | b |
What I want is:
1 | 1 | a | |
2 | 2 | a | b |
3 | 3 | b |
How I get it with proc sql?
I know that in this particular case we can use merge. but my real problem is many to many. merge does not work well.
Thanks a lot.
Qinghe
Would update or modify work as desired?
You say you have many to many so without that example it's impossible to provide suggestions. Based on your current post this answers your question, but I suspect it wouldn't work on your full data set. Please ensure your test data will cover your actual data.
@qinghe wrote:
Hi,
I find that in proc sql the "full join" of 2 sets depends on the order. I did the following experiement:
data a;
do i=1 to 2;
x="a";
output;
end;
data b;
do i=2 to 3;
y="b";
output;
end;
proc sql;
create table a_full_join_b as
select * from a full join b
on a.i=b.i;
run;
proc sql;
create table b_full_join_a as
select * from b full join a
on a.i=b.i;
run;
The results are as blow:
a_full_join_b:
1 1 a 2 2 a b 3 . b
b_full_join_a:
1 . a 2 2 b a 3 3 b
What I want is:
1 1 a 2 2 a b 3 3 b
How I get it with proc sql?
I know that in this particular case we can use merge. but my real problem is many to many. merge does not work well.
Thanks a lot.
Qinghe
data have;
update a b;
by i;
run;
Results:
Obs i x y
1 1 a
2 2 a b
3 3 b
You will need to be more explicit about what to SELECT. I can't test it now, but I think this should do it:
create table combined as select coalesce(a.id, b.id) as id, x, y from /* either way should work */
@Astounding solution works. Note that you end a Data Step with a RUN; and a PROC SQL with a QUIT;
proc sql;
create table a_full_join_b as
select coalesce(a.i, b.i) as I, x, y
from b
full join a
on a.i=b.i;
quit;
Perhaps you can take advantage of the NATURAL joins? As long as all of the common variables form the keys for the join it should do what you want.
data a;
input i x $;
cards;
1 a
2 a
;
data b ;
input i y $;
cards;
2 b
3 b
;
proc sql ;
create table want as
select *
from b natural full join a
;
quit;
Obs i x y 1 1 a 2 2 a b 3 3 b
When you run your code, you get a warning from the SQL that i is already present in the output dataset.
24 proc sql; 25 create table a_full_join_b as 26 select * from a full join b 27 on a.i=b.i; WARNING: Variable i existiert bereits in Datei WORK.A_FULL_JOIN_B. NOTE: Table WORK.A_FULL_JOIN_B created, with 3 rows and 3 columns. 28 run; NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
(Don't mind the partly German messages. The localization of SAS can be funny at times.)
This is a consequence of your indiscriminate use of the asterisk in the select, and it also points to your problem:
There can only be one "i" variable in the select, so only the first one encountered is taken; therefore the order in which the datasets are joined becomes important.
Follow Maxim 25 (Have a clean log) and get rid of the warning, and use the proper SQL function to eliminate your problem:
data a;
do i = 1 to 2;
x = "a";
output;
end;
run;
data b;
do i = 2 to 3;
y = "b";
output;
end;
run;
proc sql;
create table a_full_join_b as
select
coalesce(a.i,b.i) as i,
x,
y
from a full join b
on a.i = b.i
;
quit;
proc print data=a_full_join_b;
run;
proc sql;
create table b_full_join_a as
select
coalesce(a.i,b.i) as i,
x,
y
from b full join a
on a.i = b.i
;
quit;
proc print data=b_full_join_a;
run;
Result:
Beob. i x y 1 1 a 2 2 a b 3 3 b Beob. i x y 1 1 a 2 2 a b 3 3 b
As you can see, the outputs are a) complete and b) identical. And there are no extraneous NOTEs and no WARNINGs in the log.
Note that I also used the correct quit; statement to end the proc SQLs.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.