BookmarkSubscribeRSS Feed
qinghe
Calcite | Level 5

Hi,

 

I find that in proc sql the "full join" of 2 sets depends on the order. I did the following experiement:

 

data a;

do i=1 to 2;

x="a";

output;

end;

data b;

do i=2 to 3;

y="b";

output;

end;

proc sql;

create table a_full_join_b as

select * from a full join b

on a.i=b.i;

run;

proc sql;

create table b_full_join_a as

select * from b full join a

on a.i=b.i;

run;

 

The results are as blow:

 

a_full_join_b:

 

11a 
22ab
3. b

 

b_full_join_a:

 

1. a
22ba
33b 

 

What I want is:

11a 
22ab
33 b

 

How I get it with proc sql?

 

I know that in this particular case we can use merge. but my real problem is many to many. merge does not work well.

 

Thanks a lot.

 

Qinghe

 

 

 

 

 

5 REPLIES 5
Reeza
Super User

Would update or modify work as desired?

You say you have many to many so without that example it's impossible to provide suggestions. Based on your current post this answers your question, but I suspect it wouldn't work on your full data set. Please ensure your test data will cover your actual data.

 


@qinghe wrote:

Hi,

 

I find that in proc sql the "full join" of 2 sets depends on the order. I did the following experiement:

 

data a;

do i=1 to 2;

x="a";

output;

end;

data b;

do i=2 to 3;

y="b";

output;

end;

proc sql;

create table a_full_join_b as

select * from a full join b

on a.i=b.i;

run;

proc sql;

create table b_full_join_a as

select * from b full join a

on a.i=b.i;

run;

 

The results are as blow:

 

a_full_join_b:

 

1 1 a  
2 2 a b
3 .   b

 

b_full_join_a:

 

1 .   a
2 2 b a
3 3 b  

 

What I want is:

1 1 a  
2 2 a b
3 3   b

 

How I get it with proc sql?

 

I know that in this particular case we can use merge. but my real problem is many to many. merge does not work well.

 

Thanks a lot.

 

Qinghe

 

 

 

 

 


 

data have;
update a b;
by i;
run;

Results:

Obs i x y
1 1 a
2 2 a b
3 3    b

Astounding
PROC Star

You will need to be more explicit about what to SELECT.  I can't test it now, but I think this should do it:

 

create table combined as select coalesce(a.id, b.id) as id, x, y from   /* either way should work */

Reeza
Super User

@Astounding solution works. Note that you end a Data Step with a RUN; and a PROC SQL with a QUIT;

 

proc sql;
create table a_full_join_b as
select coalesce(a.i, b.i) as I, x, y 
from b
full join a on a.i=b.i; quit;
Tom
Super User Tom
Super User

Perhaps you can take advantage of the NATURAL joins?  As long as all of the common variables form the keys for the join it should do what you want.

 

data a;
 input i x $;
cards;
1 a
2 a
;
data b ;
 input i y $;
cards;
2 b
3 b
;
proc sql ;
  create table want as
  select *
  from b natural full join a
  ;
quit;
Obs    i    x    y

 1     1    a
 2     2    a    b
 3     3         b
Kurt_Bremser
Super User

When you run your code, you get a warning from the SQL that i is already present in the output dataset.

24         proc sql;
25         create table a_full_join_b as
26         select * from a full join b
27         on a.i=b.i;
WARNING: Variable i existiert bereits in Datei WORK.A_FULL_JOIN_B.
NOTE: Table WORK.A_FULL_JOIN_B created, with 3 rows and 3 columns.

28         run;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.

(Don't mind the partly German messages. The localization of SAS can be funny at times.)

This is a consequence of your indiscriminate use of the asterisk in the select, and it also points to your problem:

There can only be one "i" variable in the select, so only the first one encountered is taken; therefore the order in which the datasets are joined becomes important.

 

Follow Maxim 25 (Have a clean log) and get rid of the warning, and use the proper SQL function to eliminate your problem:

data a;
do i = 1 to 2;
  x = "a";
  output;
end;
run;

data b;
do i = 2 to 3;
  y = "b";
  output;
end;
run;

proc sql;
create table a_full_join_b as
select
  coalesce(a.i,b.i) as i,
  x,
  y
from a full join b
on a.i = b.i
;
quit;

proc print data=a_full_join_b;
run;

proc sql;
create table b_full_join_a as
select
  coalesce(a.i,b.i) as i,
  x,
  y
from b full join a
on a.i = b.i
;
quit;

proc print data=b_full_join_a;
run;

Result:

Beob.    i    x    y

  1      1    a     
  2      2    a    b
  3      3         b
                    

Beob.    i    x    y

  1      1    a     
  2      2    a    b
  3      3         b

As you can see, the outputs are a) complete and b) identical. And there are no extraneous NOTEs and no WARNINGs in the log.

 

Note that I also used the correct quit; statement to end the proc SQLs.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2304 views
  • 5 likes
  • 5 in conversation