Re: proc merge

HeatherNewton · Posted 02-23-2022 08:23 PM

what is SAS merge in sql language, is it full outer join?

Tom · Posted 02-23-2022 08:47 PM

It isn't.

Closest is a FULL JOIN but you can use the flags created by the IN= dataset option to do LEFT, RIGHT or INNER also.

But the behavior in Many to Many situations is different.

In an SQL join every observation in the left table is matched to every observation in the right table. So a group of 3 joined to a group of 2 results in 6 observations.

In a data step MERGE the observations are matched in the order they are retrieved. So a group of 3 merged with a group of 2 results in 3 observations. The values of unique fields from the smaller group's last observation are retained (really just not replaced) for the rest of the observations in the group.

HeatherNewton · Posted 02-23-2022 09:02 PM

can you provide a simple example?

Tom · Posted 02-23-2022 09:11 PM

You can make your own. But here is a simple example.

data left;
 input id left $;
cards;
1 A
1 B
1 C
2 D
3 E
;

data right ;
  input id right $;
cards;
1 Z
1 Y
3 X
3 W
4 V
;

data merged ;
  merge left right ;
  by id;
run;

proc sql;
create table joined as
select * 
from left
full join right
 on left.id=right.id
;
quit;

proc print data=merged;
  title 'merged';
run;
proc print data=joined;
  title 'joined';
run;

And I forgot that SQL joins are too stupid to realize that your key variables should be same. When you try to name two variables with the same name in an SQL SELECT list of variables only the first variable with that name will make it into the dataset. So the last observation in the JOINED dataset is missing the ID value since it did not exist in the LEFT dataset.

You could get around this by using NATURAL join, but then you lose direct control of the criteria of the join, which one of the nice features of an SQL join.

Kurt_Bremser · Posted 02-24-2022 04:06 AM

That's why real programmers (those who can type in bootstrap code in hex from the console of a mainframe, joke intended) never use the asterisk in joins, but an exhaustive list of variables.

😉

In your code, this would be

proc sql;
create table joined as
select
  coalesce(t1.id,t2.id) as id,
  t1.left,
  t2.right 
from left t1
full join right t2
 on t1.id=t2.id
;
quit;

which would also solve the missing issue for observations contained only in the second dataset.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

HeatherNewton · Posted 10-22-2022 09:56 AM

do you mean if it is not many to many, can be interpreted by left, right or inner join

if it is many to many then it cannot be replicated by any sql join and specific programming is required?

Kurt_Bremser · Posted 10-22-2022 10:28 AM

@HeatherNewton wrote:

if it is many to many then it cannot be replicated by any sql join and specific programming is required?

Yes. Very often a many to many MERGE in a data step is tolerated because the relevant variables in one dataset sr'tay constant for a group. In this case, you should deduplicate first in SQL before doing the main join.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

HeatherNewton · Posted 10-22-2022 11:06 AM

data samp_acct_card;

merge samp_acct_card(in=a) relation (keep=relationship_no in =b);

by relationship_no;

if a;

if b and samp_ex=" and IND_CENTRAL_LIMIT in ('Y','1') then samp_ex="DEFAULT_CENTRAL_LMT';

run;

when I see the line 'if a', I'd like to think this is a left join, but with the next line 'if b....'

I dont know what kind of join this is, please assist. thanks.

Kurt_Bremser · Posted 10-22-2022 12:02 PM

STOP thinking of data step MERGEs in SQL terms. As long as you keep this illusion, you'll never get anywhere.

A data step processes observations in one or more datasets sequentially. SQL, OTOH, works with sets of records.

The first IF is a Subsetting IF (PLEASE study the documentation!), while the second is a "normal" IF which executes a statement depending on a condition.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Reeza · Posted 02-23-2022 09:16 PM

Various forms of merges are covered here in the documentation.

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lrcon/n1tgk0uanvisvon1r26lc036k0w7.htm

Note there is no 'proc merge', @Tom is assuming you're referring to a data step merge but you've also mentioned a SQL merge so it's unclear to me.

@HeatherNewton wrote:

what is SAS merge in sql language, is it full outer join?

ballardw · Posted 02-23-2022 09:30 PM

Merge also treats same named variables from both sets quite a bit differently than SQL joins do. As in, you only get one variable by a given name in the data vector so the values from one set will replace the other.

Kurt_Bremser · Posted 02-24-2022 04:10 AM

It's neither, because a MERGE in a DATA step behaves completely different in many-to-many joins.

And how the MERGE behaves in one-to-many, many-to-one or one-to-one joins is controlled by subsetting IF's using the IN= variables, so even there there is no true answer.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX