SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Merge vs SQL Left Join

Reply
Regular Contributor
Posts: 183

Merge vs SQL Left Join

I have two data sets.

ID Name Height

1 A 1

3 B 2

5 C 2

7 D 2

9 E 2

ID Name Height

2 A 2

4 B 3

5 C 4

7 D 5

I am merging them using MERGE and SQL Left Join. MERGE returns correct matching value for same values in the common variable (Primary Key) , while SQL left join does not update values.

Data A;

Input ID Name$ Height;

cards;

1 A 1

3 B 2

5 C 2

7 D 2

9 E 2

;

run;

Data B;

Input ID Name$ Height;

cards;

2 A 2

4 B 3

5 C 4

7 D 5

;

run;

PROC SQL;

Create table dummy as

Select * from A as file1 left join B as file2

On file1.ID = file2.ID;

Quit;

PROC SQL returns :

ID Name Height

1 A 1

3 B 2

5 C 2

7 D 2

9 E 2

Data dummy;

merge A (IN=A) B (IN=B);

by ID;

If A;

run;

MERGE returns :

ID Name Height

1 A 1

3 B 2

5 C 4

7 D 5

9 E 2

Why PROC SQL LEFT JOIN does not update common values?

Respected Advisor
Posts: 4,173

Re: Merge vs SQL Left Join

In your SQL you're joining 2 tables with common variable names (other than the one you join on). If you execute such code you will see below warnings in your code:

WARNING: Variable ID already exists on file WORK.DUMMY.

WARNING: Variable Name already exists on file WORK.DUMMY.

WARNING: Variable Height already exists on file WORK.DUMMY.

With the SAS SQL flavor a "select *" will take the variable from the first table and ignore the one from the second table (that's what the warning tells you) - other SQL flavors (eg. Oracle) would throw an error.

With a SAS Merge the last listed same named variable will overwrite the same named variable from a table listed earlier in the merge statement. That's why in this case you're getting the values from table 'B'.

Trusted Advisor
Posts: 1,228

Re: Merge vs SQL Left Join

There is difference in data step and sql processing. Data merging works sequentially checks each observation of each data set to match these by BY value to write it to combined dataset and same name variables in the first data set are overwritten with the liked named variable in subsequent data sets . SQL processing is based on cartesian product where processing moves horizontally and same name variables coming from different data sets can not be overwritten.

Super User
Posts: 10,020

Re: Merge vs SQL Left Join

If you like SQL, try this one :

Data A;
Input ID Name$ Height;
cards;
1 A 1
3 B 2
5 C 2
7 D 2
9 E 2
;
run;
 
Data B;
Input ID Name$ Height;
cards;
2 A 2
4 B 3
5 C 4
7 D 5
;
run;

PROC SQL;
Create table dummy as
Select file1.ID,file1.Name,coalesce(file2.Height,file1.Height) as Height from A as file1 left join B as file2
On file1.ID = file2.ID and file1.name = file2.name;
Quit;

Xia Keshan

Frequent Contributor
Posts: 92

Re: Merge vs SQL Left Join

Using PROC SQL, how do I overwrite the height value without creating a new variable and then dropping the existing and renaming the new to existing?

PROC SQL;

Create table dummy as

Select *

     , case when NAME = 'A' then 5 else 6 end as height

from A as file1 left join B as file2 On file1.ID = file2.ID;

Quit;

Super User
Super User
Posts: 7,942

Re: Merge vs SQL Left Join

Hi,

You should really create a new topic rather than replying to an old post.  For your question, the reason you are getting a new variable at the end is because you are using the * notation.  This selects all variables.  If you write SQL correctly, then you would specify each variable in the select, where you want it to appear, its attributes, and what data to collect, e.g.

proc sql;

     create table WANT as

     select     A.FIRST_VARIABLE as FIRST_VARIABLE,

                     B.NEW_VARIABLE as SECOND_VARIABLE,

                     case  when A.TEST_VAR=1 then 'Found'

                                else   'Not Found' end as THIRD_VARIABLE length=20,

                     A.LAST_VAR as FOURTH_VARIABLE

     from        HAVE A

     left join   HAVE2 B

     on           A.VAR=B.VAR;

quit;

Super User
Posts: 10,020

Re: Merge vs SQL Left Join

Yeah. Start a new session. and Post some sample data and the output you want , that would be better to explain your question.

Community Manager
Posts: 564

Re: Merge vs SQL Left Join

Hi 

 

 

Ask a Question
Discussion stats
  • 7 replies
  • 4720 views
  • 2 likes
  • 7 in conversation