BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
cosmid
Lapis Lazuli | Level 10

Hi,

 

The book on SAS said that when only some of the values of the BY variable match, a PROC SQL full join will produce the same result as a DATA step match-merge. Here is the example from the book:

 

data THREE;
  input X A $;
datalines;
X   A
1   a1
1   a2
2   b1
2   b2
4   d

data FOUR:
  input X B $;
datalines;
X   B
2   x1
2   x2
3   y
5   v

data merged;
  merge three four;
  by x;
run;

RESULT from Data step Merge:
X   A     B
1   a1    
1   a2
2   b1   x1
2   b2   x2
3        y
4   d
5        v

proc sql;
  select coalesce(three.x, four.x) as X, a, b
  from three full join four
  on three.x = four.x;
quit;

RESULT from proc sql join:
X   A   B
1   a2  
1   a1
2   b1   x1
2   b1   x2
2   b2   x1
2   b2   x2
3        y
4   d
5        v

Clearly, the proc sql has 2 more rows than the data step merge. Is there something wrong with the code in the proc sql or the book itself was misleading by saying that the proc sql would produce the same result? 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

What book? What exactly did it say?

 

If two or more contributing tables have two or more observations for the same BY group values then PROC SQL FULL JOIN will produce more observations. The SQL join will produce N*M observations for such a by group.  The data step merge will produce MAX(N,M) observations. 

 

They are only similar when only one of the contributing datasets has more than one observation in a BY group

View solution in original post

3 REPLIES 3
Tom
Super User Tom
Super User

What book? What exactly did it say?

 

If two or more contributing tables have two or more observations for the same BY group values then PROC SQL FULL JOIN will produce more observations. The SQL join will produce N*M observations for such a by group.  The data step merge will produce MAX(N,M) observations. 

 

They are only similar when only one of the contributing datasets has more than one observation in a BY group

mkeintz
PROC Star

"The book on SAS"?

 

Which of the many (over a hundred?) books on SAS that cover both match-merge and PROC SQL are you referring to?

 

Citation please, with the actual contents.

 

PS:  There are certain instances in which this claim is true:  Namely if, no BY value requires a many-to-many join.

Editted note: although order may very well be different.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
cosmid
Lapis Lazuli | Level 10
The book is SAS Certified Professional Prep Guide.

I went back and read the section again and realized I made a mistake. It said:
When the COALESCE function is added to the preceding PROC SQL full outer join, the DATA step match-merge (with PROC PRINT step) and the PROC SQL full outer join combine rows in the same way.

So, I mistakenly thought the output will be the same and now I realized they only COMBINE them in the same way.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 898 views
  • 2 likes
  • 3 in conversation