WHAT KIND OF JOIN

HeatherNewton · Posted 06-12-2023 02:16 AM

data long;
merge long pd_score;
by segment date;
run;

how do I know what kind of join is this? assume join?

HeatherNewton · Posted 06-12-2023 02:22 AM

it is full join?

HeatherNewton · Posted 06-12-2023 02:32 AM

does it depends on columns?

let say long has 10 columns (including segment, date and pd)

pd_score only 3 where two are used for joining (segment, date, pd)

now segment and date are used for joining

what can we say what kind of join this is?

full join? left join?

LinusH · Posted 06-12-2023 07:13 AM

In other words than @Kurt_Bremser - it is similar to full join.

The biggest difference between data step and SQL is that the data step goes row-by-row for all data sets in the merge.

This means it doesn't create a Cartesian product in case of a M-M relationship between the data sets. This will make the data step merge return fewer rows compared to a SQL join.

Data never sleeps

Kurt_Bremser · Posted 06-12-2023 03:17 AM

First of all, it's not a JOIN, it's a MERGE.

The differences between a data step MERGE and a SQL JOIN are significant.

Maxim 3: Know Your Data.

What is the relationship with regard of the BY variables? 1:n, n:1, m:n, or n:n?

This will determine the number of observations in the result.

Are there any variables in both datasets besides the BY variables?

If yes, the sequence of reading the observations ("left" or "right") into the PDV will determine which values make it into the result.

Also apply Maxim 4.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 06-12-2023 01:40 PM

@HeatherNewton wrote:
data long;
merge long pd_score;
by segment date;
run;
how do I know what kind of join is this? assume join?

Are values of the segment date combinations duplicated in either data set? Then almost certainly not a "join" of any type as the MERGE behave quite differently when By variables are duplicated in both sets.

Are there variables in the data sets that have common names other than the BY variables? Then almost certainly not a join as only one value is kept and would be from the right-most data set on the Merge statement which is quite different than a Join where you either must specify which version to keep OR if using the * with both source tables would likely keep the first defined alias values.

The behavior of MERGE when BY variables do not match is quite different than a right, left or just about any "Join on" condition.

Merge may also behave a bit differently than SQL when variables are of different lengths with common names.

So IF you are trying to translate a Merge to an SQL join then example data and expected result should be provided. And the example data better be pretty descriptive of your actual data because a small example that doesn't include duplicates but your real problem does will likely not behave as expected.

Reeza · Posted 06-12-2023 01:55 PM

I'll direct you back to your previous thread. The answers has not changed.

https://communities.sas.com/t5/SAS-Programming/SAS-merge-SQL-equivalent/td-p/798286

WHAT KIND OF JOIN

Re: WHAT KIND OF JOIN

Re: WHAT KIND OF JOIN

Re: WHAT KIND OF JOIN

Re: WHAT KIND OF JOIN

Re: WHAT KIND OF JOIN

Re: WHAT KIND OF JOIN

SAS Innovate 2025: Call for Content

Classroom Training Available!