Solved: DATA Step MERGE vs PROC SQL JOIN

cosmid · Posted 12-10-2023 05:45 PM

Hi,

The book on SAS said that when only some of the values of the BY variable match, a PROC SQL full join will produce the same result as a DATA step match-merge. Here is the example from the book:

data THREE;
  input X A $;
datalines;
X   A
1   a1
1   a2
2   b1
2   b2
4   d

data FOUR:
  input X B $;
datalines;
X   B
2   x1
2   x2
3   y
5   v

data merged;
  merge three four;
  by x;
run;

RESULT from Data step Merge:
X   A     B
1   a1    
1   a2
2   b1   x1
2   b2   x2
3        y
4   d
5        v

proc sql;
  select coalesce(three.x, four.x) as X, a, b
  from three full join four
  on three.x = four.x;
quit;

RESULT from proc sql join:
X   A   B
1   a2  
1   a1
2   b1   x1
2   b1   x2
2   b2   x1
2   b2   x2
3        y
4   d
5        v

Clearly, the proc sql has 2 more rows than the data step merge. Is there something wrong with the code in the proc sql or the book itself was misleading by saying that the proc sql would produce the same result?

Thanks

Tom · Posted 12-10-2023 07:36 PM

What book? What exactly did it say?

If two or more contributing tables have two or more observations for the same BY group values then PROC SQL FULL JOIN will produce more observations. The SQL join will produce N*M observations for such a by group. The data step merge will produce MAX(N,M) observations.

They are only similar when only one of the contributing datasets has more than one observation in a BY group

View solution in original post

Tom · Posted 12-10-2023 07:36 PM

What book? What exactly did it say?

If two or more contributing tables have two or more observations for the same BY group values then PROC SQL FULL JOIN will produce more observations. The SQL join will produce N*M observations for such a by group. The data step merge will produce MAX(N,M) observations.

They are only similar when only one of the contributing datasets has more than one observation in a BY group

mkeintz · Posted 12-10-2023 07:37 PM

"The book on SAS"?

Which of the many (over a hundred?) books on SAS that cover both match-merge and PROC SQL are you referring to?

Citation please, with the actual contents.

PS: There are certain instances in which this claim is true: Namely if, no BY value requires a many-to-many join.

Editted note: although order may very well be different.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

cosmid · Posted 12-17-2023 04:20 PM

The book is SAS Certified Professional Prep Guide.

I went back and read the section again and realized I made a mistake. It said:
When the COALESCE function is added to the preceding PROC SQL full outer join, the DATA step match-merge (with PROC PRINT step) and the PROC SQL full outer join combine rows in the same way.

So, I mistakenly thought the output will be the same and now I realized they only COMBINE them in the same way.

DATA Step MERGE vs PROC SQL JOIN

Re: DATA Step MERGE vs PROC SQL JOIN

Re: DATA Step MERGE vs PROC SQL JOIN

Re: DATA Step MERGE vs PROC SQL JOIN

Re: DATA Step MERGE vs PROC SQL JOIN

DATA Step MERGE vs PROC SQL JOIN

Re: DATA Step MERGE vs PROC SQL JOIN

Re: DATA Step MERGE vs PROC SQL JOIN

Re: DATA Step MERGE vs PROC SQL JOIN

Re: DATA Step MERGE vs PROC SQL JOIN

Registration is open