BookmarkSubscribeRSS Feed
qinghe
Calcite | Level 5

Hi,

 

I find that in proc sql the "full join" of 2 sets depends on the order. I did the following experiement:

 

data a;

do i=1 to 2;

x="a";

output;

end;

data b;

do i=2 to 3;

y="b";

output;

end;

proc sql;

create table a_full_join_b as

select * from a full join b

on a.i=b.i;

run;

proc sql;

create table b_full_join_a as

select * from b full join a

on a.i=b.i;

run;

 

The results are as blow:

 

a_full_join_b:

 

11a 
22ab
3. b

 

b_full_join_a:

 

1. a
22ba
33b 

 

What I want is:

11a 
22ab
33 b

 

How I get it with proc sql?

 

I know that in this particular case we can use merge. but my real problem is many to many. merge does not work well.

 

Thanks a lot.

 

Qinghe

 

 

 

 

 

5 REPLIES 5
Reeza
Super User

Would update or modify work as desired?

You say you have many to many so without that example it's impossible to provide suggestions. Based on your current post this answers your question, but I suspect it wouldn't work on your full data set. Please ensure your test data will cover your actual data.

 


@qinghe wrote:

Hi,

 

I find that in proc sql the "full join" of 2 sets depends on the order. I did the following experiement:

 

data a;

do i=1 to 2;

x="a";

output;

end;

data b;

do i=2 to 3;

y="b";

output;

end;

proc sql;

create table a_full_join_b as

select * from a full join b

on a.i=b.i;

run;

proc sql;

create table b_full_join_a as

select * from b full join a

on a.i=b.i;

run;

 

The results are as blow:

 

a_full_join_b:

 

1 1 a  
2 2 a b
3 .   b

 

b_full_join_a:

 

1 .   a
2 2 b a
3 3 b  

 

What I want is:

1 1 a  
2 2 a b
3 3   b

 

How I get it with proc sql?

 

I know that in this particular case we can use merge. but my real problem is many to many. merge does not work well.

 

Thanks a lot.

 

Qinghe

 

 

 

 

 


 

data have;
update a b;
by i;
run;

Results:

Obs i x y
1 1 a
2 2 a b
3 3    b

Astounding
PROC Star

You will need to be more explicit about what to SELECT.  I can't test it now, but I think this should do it:

 

create table combined as select coalesce(a.id, b.id) as id, x, y from   /* either way should work */

Reeza
Super User

@Astounding solution works. Note that you end a Data Step with a RUN; and a PROC SQL with a QUIT;

 

proc sql;
create table a_full_join_b as
select coalesce(a.i, b.i) as I, x, y 
from b
full join a on a.i=b.i; quit;
Tom
Super User Tom
Super User

Perhaps you can take advantage of the NATURAL joins?  As long as all of the common variables form the keys for the join it should do what you want.

 

data a;
 input i x $;
cards;
1 a
2 a
;
data b ;
 input i y $;
cards;
2 b
3 b
;
proc sql ;
  create table want as
  select *
  from b natural full join a
  ;
quit;
Obs    i    x    y

 1     1    a
 2     2    a    b
 3     3         b
Kurt_Bremser
Super User

When you run your code, you get a warning from the SQL that i is already present in the output dataset.

24         proc sql;
25         create table a_full_join_b as
26         select * from a full join b
27         on a.i=b.i;
WARNING: Variable i existiert bereits in Datei WORK.A_FULL_JOIN_B.
NOTE: Table WORK.A_FULL_JOIN_B created, with 3 rows and 3 columns.

28         run;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.

(Don't mind the partly German messages. The localization of SAS can be funny at times.)

This is a consequence of your indiscriminate use of the asterisk in the select, and it also points to your problem:

There can only be one "i" variable in the select, so only the first one encountered is taken; therefore the order in which the datasets are joined becomes important.

 

Follow Maxim 25 (Have a clean log) and get rid of the warning, and use the proper SQL function to eliminate your problem:

data a;
do i = 1 to 2;
  x = "a";
  output;
end;
run;

data b;
do i = 2 to 3;
  y = "b";
  output;
end;
run;

proc sql;
create table a_full_join_b as
select
  coalesce(a.i,b.i) as i,
  x,
  y
from a full join b
on a.i = b.i
;
quit;

proc print data=a_full_join_b;
run;

proc sql;
create table b_full_join_a as
select
  coalesce(a.i,b.i) as i,
  x,
  y
from b full join a
on a.i = b.i
;
quit;

proc print data=b_full_join_a;
run;

Result:

Beob.    i    x    y

  1      1    a     
  2      2    a    b
  3      3         b
                    

Beob.    i    x    y

  1      1    a     
  2      2    a    b
  3      3         b

As you can see, the outputs are a) complete and b) identical. And there are no extraneous NOTEs and no WARNINGs in the log.

 

Note that I also used the correct quit; statement to end the proc SQLs.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2325 views
  • 5 likes
  • 5 in conversation