full join proc sql

Reply
Occasional Contributor
Posts: 8

full join proc sql

Hi,

 

I find that in proc sql the "full join" of 2 sets depends on the order. I did the following experiement:

 

data a;

do i=1 to 2;

x="a";

output;

end;

data b;

do i=2 to 3;

y="b";

output;

end;

proc sql;

create table a_full_join_b as

select * from a full join b

on a.i=b.i;

run;

proc sql;

create table b_full_join_a as

select * from b full join a

on a.i=b.i;

run;

 

The results are as blow:

 

a_full_join_b:

 

11a 
22ab
3. b

 

b_full_join_a:

 

1. a
22ba
33b 

 

What I want is:

11a 
22ab
33 b

 

How I get it with proc sql?

 

I know that in this particular case we can use merge. but my real problem is many to many. merge does not work well.

 

Thanks a lot.

 

Qinghe

 

 

 

 

 

Super User
Posts: 19,878

Re: full join proc sql

Would update or modify work as desired?

You say you have many to many so without that example it's impossible to provide suggestions. Based on your current post this answers your question, but I suspect it wouldn't work on your full data set. Please ensure your test data will cover your actual data.

 


qinghe wrote:

Hi,

 

I find that in proc sql the "full join" of 2 sets depends on the order. I did the following experiement:

 

data a;

do i=1 to 2;

x="a";

output;

end;

data b;

do i=2 to 3;

y="b";

output;

end;

proc sql;

create table a_full_join_b as

select * from a full join b

on a.i=b.i;

run;

proc sql;

create table b_full_join_a as

select * from b full join a

on a.i=b.i;

run;

 

The results are as blow:

 

a_full_join_b:

 

1 1 a  
2 2 a b
3 .   b

 

b_full_join_a:

 

1 .   a
2 2 b a
3 3 b  

 

What I want is:

1 1 a  
2 2 a b
3 3   b

 

How I get it with proc sql?

 

I know that in this particular case we can use merge. but my real problem is many to many. merge does not work well.

 

Thanks a lot.

 

Qinghe

 

 

 

 

 


 

data have;
update a b;
by i;
run;

Results:

Obs i x y
1 1 a
2 2 a b
3 3    b

Super User
Posts: 5,518

Re: full join proc sql

You will need to be more explicit about what to SELECT.  I can't test it now, but I think this should do it:

 

create table combined as select coalesce(a.id, b.id) as id, x, y from   /* either way should work */

Super User
Posts: 19,878

Re: full join proc sql

Posted in reply to Astounding

@Astounding solution works. Note that you end a Data Step with a RUN; and a PROC SQL with a QUIT;

 

proc sql;
create table a_full_join_b as
select coalesce(a.i, b.i) as I, x, y 
from b
full join a on a.i=b.i; quit;
Super User
Super User
Posts: 7,083

Re: full join proc sql

Perhaps you can take advantage of the NATURAL joins?  As long as all of the common variables form the keys for the join it should do what you want.

 

data a;
 input i x $;
cards;
1 a
2 a
;
data b ;
 input i y $;
cards;
2 b
3 b
;
proc sql ;
  create table want as
  select *
  from b natural full join a
  ;
quit;
Obs    i    x    y

 1     1    a
 2     2    a    b
 3     3         b
Super User
Posts: 7,868

Re: full join proc sql

When you run your code, you get a warning from the SQL that i is already present in the output dataset.

24         proc sql;
25         create table a_full_join_b as
26         select * from a full join b
27         on a.i=b.i;
WARNING: Variable i existiert bereits in Datei WORK.A_FULL_JOIN_B.
NOTE: Table WORK.A_FULL_JOIN_B created, with 3 rows and 3 columns.

28         run;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.

(Don't mind the partly German messages. The localization of SAS can be funny at times.)

This is a consequence of your indiscriminate use of the asterisk in the select, and it also points to your problem:

There can only be one "i" variable in the select, so only the first one encountered is taken; therefore the order in which the datasets are joined becomes important.

 

Follow Maxim 25 (Have a clean log) and get rid of the warning, and use the proper SQL function to eliminate your problem:

data a;
do i = 1 to 2;
  x = "a";
  output;
end;
run;

data b;
do i = 2 to 3;
  y = "b";
  output;
end;
run;

proc sql;
create table a_full_join_b as
select
  coalesce(a.i,b.i) as i,
  x,
  y
from a full join b
on a.i = b.i
;
quit;

proc print data=a_full_join_b;
run;

proc sql;
create table b_full_join_a as
select
  coalesce(a.i,b.i) as i,
  x,
  y
from b full join a
on a.i = b.i
;
quit;

proc print data=b_full_join_a;
run;

Result:

Beob.    i    x    y

  1      1    a     
  2      2    a    b
  3      3         b
                    

Beob.    i    x    y

  1      1    a     
  2      2    a    b
  3      3         b

As you can see, the outputs are a) complete and b) identical. And there are no extraneous NOTEs and no WARNINGs in the log.

 

Note that I also used the correct quit; statement to end the proc SQLs.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 5 replies
  • 312 views
  • 5 likes
  • 5 in conversation