BookmarkSubscribeRSS Feed
HeatherNewton
Quartz | Level 8
data long;
merge long pd_score;
by segment date;
run;

how do I know what kind of join is this? assume join?

6 REPLIES 6
HeatherNewton
Quartz | Level 8

it is full join?

HeatherNewton
Quartz | Level 8

does it depends on columns?

let say long has 10 columns (including segment, date and pd)

pd_score only 3 where two are used for joining (segment, date, pd)

 

now segment and date are used for joining

 

what can we say what kind of join this is?

full join? left join?

LinusH
Tourmaline | Level 20

In other words than @Kurt_Bremser - it is similar to full join.

The biggest difference between data step and SQL is that the data step goes row-by-row for all data sets in the merge.

This means it doesn't create a Cartesian product in case of a M-M relationship between the data sets. This will make the data step  merge return fewer rows compared to a SQL join.

Data never sleeps
Kurt_Bremser
Super User

First of all, it's not a JOIN, it's a MERGE.

The differences between a data step MERGE and a SQL JOIN are significant.

Maxim 3: Know Your Data.

What is the relationship with regard of the BY variables? 1:n, n:1, m:n, or n:n?

This will determine the number of observations in the result.

Are there any variables in both datasets besides the BY variables?

If yes, the sequence of reading the observations ("left" or "right") into the PDV will determine which values make it into the result.

Also apply Maxim 4.

ballardw
Super User

@HeatherNewton wrote:
data long;
merge long pd_score;
by segment date;
run;

how do I know what kind of join is this? assume join?


Are values of the segment date combinations duplicated in either data set? Then almost certainly not a "join" of any type as the MERGE behave quite differently when By variables are duplicated in both sets.

Are there variables in the data sets that have common names other than the BY variables? Then almost certainly not a join as only one value is kept and would be from the right-most data set on the Merge statement which is quite different than a Join where you either must specify  which version to keep OR if using the * with both source tables would likely keep the first defined alias values.

 

The behavior of MERGE when BY variables do not match is quite different than a right, left or just about any "Join on" condition.

 

Merge may also behave a bit differently than SQL when variables are of different lengths with common names.

 

So IF you are trying to translate a Merge to an SQL join then example data and expected result should be provided. And the example data better be pretty descriptive of your actual data because a small example that doesn't include duplicates but your real problem does will likely not behave as expected.

Reeza
Super User
I'll direct you back to your previous thread. The answers has not changed.

https://communities.sas.com/t5/SAS-Programming/SAS-merge-SQL-equivalent/td-p/798286

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 878 views
  • 1 like
  • 5 in conversation