data long; merge long pd_score; by segment date; run;
how do I know what kind of join is this? assume join?
it is full join?
does it depends on columns?
let say long has 10 columns (including segment, date and pd)
pd_score only 3 where two are used for joining (segment, date, pd)
now segment and date are used for joining
what can we say what kind of join this is?
full join? left join?
In other words than @Kurt_Bremser - it is similar to full join.
The biggest difference between data step and SQL is that the data step goes row-by-row for all data sets in the merge.
This means it doesn't create a Cartesian product in case of a M-M relationship between the data sets. This will make the data step merge return fewer rows compared to a SQL join.
First of all, it's not a JOIN, it's a MERGE.
The differences between a data step MERGE and a SQL JOIN are significant.
Maxim 3: Know Your Data.
What is the relationship with regard of the BY variables? 1:n, n:1, m:n, or n:n?
This will determine the number of observations in the result.
Are there any variables in both datasets besides the BY variables?
If yes, the sequence of reading the observations ("left" or "right") into the PDV will determine which values make it into the result.
Also apply Maxim 4.
@HeatherNewton wrote:
data long; merge long pd_score; by segment date; run;how do I know what kind of join is this? assume join?
Are values of the segment date combinations duplicated in either data set? Then almost certainly not a "join" of any type as the MERGE behave quite differently when By variables are duplicated in both sets.
Are there variables in the data sets that have common names other than the BY variables? Then almost certainly not a join as only one value is kept and would be from the right-most data set on the Merge statement which is quite different than a Join where you either must specify which version to keep OR if using the * with both source tables would likely keep the first defined alias values.
The behavior of MERGE when BY variables do not match is quite different than a right, left or just about any "Join on" condition.
Merge may also behave a bit differently than SQL when variables are of different lengths with common names.
So IF you are trying to translate a Merge to an SQL join then example data and expected result should be provided. And the example data better be pretty descriptive of your actual data because a small example that doesn't include duplicates but your real problem does will likely not behave as expected.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.