what is SAS merge in sql language, is it full outer join?
It isn't.
Closest is a FULL JOIN but you can use the flags created by the IN= dataset option to do LEFT, RIGHT or INNER also.
But the behavior in Many to Many situations is different.
In an SQL join every observation in the left table is matched to every observation in the right table. So a group of 3 joined to a group of 2 results in 6 observations.
In a data step MERGE the observations are matched in the order they are retrieved. So a group of 3 merged with a group of 2 results in 3 observations. The values of unique fields from the smaller group's last observation are retained (really just not replaced) for the rest of the observations in the group.
can you provide a simple example?
You can make your own. But here is a simple example.
data left;
input id left $;
cards;
1 A
1 B
1 C
2 D
3 E
;
data right ;
input id right $;
cards;
1 Z
1 Y
3 X
3 W
4 V
;
data merged ;
merge left right ;
by id;
run;
proc sql;
create table joined as
select *
from left
full join right
on left.id=right.id
;
quit;
proc print data=merged;
title 'merged';
run;
proc print data=joined;
title 'joined';
run;
And I forgot that SQL joins are too stupid to realize that your key variables should be same. When you try to name two variables with the same name in an SQL SELECT list of variables only the first variable with that name will make it into the dataset. So the last observation in the JOINED dataset is missing the ID value since it did not exist in the LEFT dataset.
You could get around this by using NATURAL join, but then you lose direct control of the criteria of the join, which one of the nice features of an SQL join.
That's why real programmers (those who can type in bootstrap code in hex from the console of a mainframe, joke intended) never use the asterisk in joins, but an exhaustive list of variables.
😉
In your code, this would be
proc sql;
create table joined as
select
coalesce(t1.id,t2.id) as id,
t1.left,
t2.right
from left t1
full join right t2
on t1.id=t2.id
;
quit;
which would also solve the missing issue for observations contained only in the second dataset.
do you mean if it is not many to many, can be interpreted by left, right or inner join
if it is many to many then it cannot be replicated by any sql join and specific programming is required?
@HeatherNewton wrote:
if it is many to many then it cannot be replicated by any sql join and specific programming is required?
Yes. Very often a many to many MERGE in a data step is tolerated because the relevant variables in one dataset sr'tay constant for a group. In this case, you should deduplicate first in SQL before doing the main join.
data samp_acct_card; merge samp_acct_card(in=a) relation (keep=relationship_no in =b); by relationship_no; if a; if b and samp_ex=" and IND_CENTRAL_LIMIT in ('Y','1') then samp_ex="DEFAULT_CENTRAL_LMT'; run;
when I see the line 'if a', I'd like to think this is a left join, but with the next line 'if b....'
I dont know what kind of join this is, please assist. thanks.
STOP thinking of data step MERGEs in SQL terms. As long as you keep this illusion, you'll never get anywhere.
A data step processes observations in one or more datasets sequentially. SQL, OTOH, works with sets of records.
The first IF is a Subsetting IF (PLEASE study the documentation!), while the second is a "normal" IF which executes a statement depending on a condition.
Various forms of merges are covered here in the documentation.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lrcon/n1tgk0uanvisvon1r26lc036k0w7.htm
Note there is no 'proc merge', @Tom is assuming you're referring to a data step merge but you've also mentioned a SQL merge so it's unclear to me.
@HeatherNewton wrote:
what is SAS merge in sql language, is it full outer join?
Merge also treats same named variables from both sets quite a bit differently than SQL joins do. As in, you only get one variable by a given name in the data vector so the values from one set will replace the other.
It's neither, because a MERGE in a DATA step behaves completely different in many-to-many joins.
And how the MERGE behaves in one-to-many, many-to-one or one-to-one joins is controlled by subsetting IF's using the IN= variables, so even there there is no true answer.
For such questions it's often worth to search if the SAS documentation provides some information.
I found the following via Google search with keywords: sas help center 9.4 sql merge compare
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lepg/p1v0kcf40q6x7mn1pu5hl4s9ux48.htm
...and after searching one more minute also these links:
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lepg/p1phdzzlrc1wi8n1bpigstwt851o.htm
I suggest you read all this information in detail as once fully understood it will help you to solve a myriad of cases.
Neither, it's more like a cursor equivalent as it processes row by row.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.