BookmarkSubscribeRSS Feed
Pooja98
Fluorite | Level 6

My question is that: I want to merge 3 tables in proc sql

 

1) The first code is:

 

proc sql noprint;
Create table want_1 as
select a.*, b.x
from Have_1 as a
left join
Have_2 as b on
a.column- var = b.column- var;
quit;


proc sql noprint;
Create table want_2 as
select a.*, b.y
from want_1 as a
left join
Have_3 as b on
a.column - var = b.column - var;
quit;

 

I'm merging first two datasets, then with this output I merged with the third one

 

2) The second code is: (where I merged three dataset in a single query)

 

proc sql;

create table Combo as

select a.*, b.x, c.y

from Have_1 as a, Have_2 as b, Have_3 as c

where a.column-var= b.column-var and b.column-var = c.column-var;

quit;

 

when i run 2 codes i got the outputs, but the only difference is i got variation in the observation count.

 

The first code shows more observation compared to the second one.

can anyone check anything wrong in my both the codes??

 

why the variation has occurred?

which one is correct?

 

 

TIA

 

2 REPLIES 2
Kurt_Bremser
Super User

Please post the complete logs of both codes, using this button:

Bildschirmfoto 2020-04-07 um 08.32.59.jpg

 

In which dataset is variable var contained?

 

It will help GREATLY if you post examples for the three datasets in usable form (data steps with datalines) to illustrate the issue. Use the "little running man" button (right next to the one indicated above) to post the codes.

ballardw
Super User

When you do a Left Join or Right Join on values ALL of the records from the Left or Right data set are included and the additional variables if requested from the second set appear and where the ON is not matched you get missing values for the added variables.. The order of the data sets matters.

Example:

data one;
  input x y;
datalines;
1 2
2 4
3 6
;
data two;
  input x z;
datalines;
1 44
2 55
;

proc sql;
   create table example as
   select a.x, a.y, b.z
   from one as a
        left join
        two as b
        on a.x = b.x
   ;
quit;

run the code and you will see that there are 3 records in the Example data set because set One has 3 and is the base for the Left Join. The value of z is missing for one record because there was not a matching x in set Two.

Basic Left/Right join behavior: ALL the records, unless an additional filter such as WHERE is applied, from the base set appear in the result.

 

 

 

When you do a cartesian join, which this is:

Have_1 as a, Have_2 as b, Have_3 as c

Every record in A is crossed with every record in B, then the result is crossed with every record in C.

 

Your "where" is not the same as ON because it is applied after all of the above combinations are made and reduces to only the "all match" on the common varible.

sas-innovate-white.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.

 

Early bird rate extended! Save $200 when you sign up by March 31.

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 706 views
  • 1 like
  • 3 in conversation