Equivalent in Sql

SASPhile · Posted 08-31-2011 08:03 AM

How can the following be done in sql?

data merged_data;

merge address_A (in = A) address_B (in = B);
by streetnum streetname zip;
if A and not B then match = "A only";
if B and not A then match = "B only";
if A and B then match = "match";
run;

art297 · Posted 08-31-2011 08:29 AM

There are a number of papers that provide answers to your three questions. E.g., take a look at:

http://www.ats.ucla.edu/stat/sas/library/nesug99/bt150.pdf

SASPhile · Posted 08-31-2011 10:23 AM

Thanks for the reply.But it does not talk about the logic that is used in the if statements.

DLing · Posted 08-31-2011 10:40 AM

page 5 talks about "left join" (in A), "right join" (in B), and "full join" (in A or B). It doesn't explicitly create the "match" variables though.

art297 · Posted 08-31-2011 12:34 PM

Using the same example datasets used in the paper, does the following provide a sufficient example? I am not a sql whiz, thus there may be better ways of doing the 3 types of merges:

DATA STATES;

LENGTH STATE $ 2 ZIP $ 5;

INPUT ZIP $ STATE $;

CARDS;

08000 NJ

10000 NY

19000 PA

;

RUN;

DATA CITYS;

LENGTH ZIP $ 5 CITY $ 15;

INPUT ZIP $ CITY $;

CARDS;

10000 NEWYORK

19000 PHILADELPHIA

90000 LOSANGELES

;

RUN;

PROC SQL;

SELECT A.ZIP, A.STATE, B.CITY

FROM STATES A, CITYS B

where A.ZIP=B.ZIP

;

quit;

PROC SQL;

SELECT A.ZIP, A.STATE, B.CITY

FROM STATES A left join CITYS B

on A.ZIP=B.ZIP

having missing(b.zip)

;

quit;

PROC SQL;

SELECT A.ZIP, A.STATE, B.CITY

FROM STATES A right join CITYS B

on A.ZIP=B.ZIP

having missing(a.zip)

;

quit;

MikeZdeb · Posted 08-31-2011 12:35 PM

Hi ... first, in your data step, I make some changes ...

data merged_data;

merge address_A (in = A) address_B (in = B);

by streetnum streetname zip;

if A and B then match = "match ";

else

if A then match = "A only";

else match = "B only";

run;

As for your question, you can add similar logic to SQL with a CASE statement. Here's

something like your MERGE that uses a SELECT statement since that looks a bit like

a CASE statement in SQL. Then there's an SQL equivalent.

data one;

input id age @@;

datalines;

1 10 2 20 3 30

;

run;

data two;

input id zip @@;

datalines;

1 12202 3 23456 4 99901

;

run;

data merg;

merge one (in=a) two(in=b);

by id;

select;

when (a and b) match = "match";

when (a) match = "aonly";

otherwise match = "bonly";

end;

run;

proc sql;

create table sqel as

select coalesce (one.id, two.id) as id, age, zip,

case when (one.id and two.id) then "match"

when (one.id) then "aonly"

else "bonly"

end as match

from one full join two

on one.id eq two.id;

quit;

title "MERGE";

proc print data=merg noobs;

run;

title "SQL";

proc print data=sqel noobs;

run;

MERGE

id age zip match

1 10 12202 match

2 20 . aonly

3 30 23456 match

4 . 99901 bonly

SQL

id age zip match

1 10 12202 match

2 20 . aonly

3 30 23456 match

4 . 99901 bonly

ps from Howard Schreier's great book ... PROC SQL by Example: Using SQL within SAS

https://support.sas.com/pubscat/bookdetails.jsp?pc=60500

http://howles.com/sqlbook/

Howles · Posted 08-31-2011 09:25 PM

A couple of other techniques:

1. NATURAL joins

2. inline views to add flag columns to each side of the join

PS: Thanks for the plug!

MikeZdeb wrote:
Hi ... first, in your data step, I make some changes ...
data merged_data;
merge address_A (in = A) address_B (in = B);
by streetnum streetname zip;
if A and B then match = "match ";
else
if A then match = "A only";
else      match = "B only";
run;
As for your question, you can add similar logic to SQL with a CASE statement.  Here's
something like your MERGE that uses a SELECT statement since that looks a bit like
a CASE statement in SQL.  Then there's an SQL equivalent.
data one;
input id age @@;
datalines;
1 10 2 20 3 30
;
run;
data two;
input id zip @@;
datalines;
1 12202 3 23456 4 99901
;
run;
 data merg;
merge one (in=a) two(in=b);
by id;
select;
   when (a and b) match = "match";
   when (a)       match = "aonly";
   otherwise      match = "bonly";
end;
run;
proc sql;
create table sqel as
select coalesce (one.id, two.id) as id, age, zip,
       case when (one.id and two.id) then "match"
            when (one.id) then "aonly"
            else "bonly"
       end as match
from one full join two
on one.id eq two.id;
quit;
title "MERGE";
proc print data=merg noobs;
run;
title "SQL";
proc print data=sqel noobs;
run;
MERGE
id    age     zip     match
 1     10    12202    match
 2     20        .    aonly
 3     30    23456    match
 4      .    99901    bonly
SQL
id    age     zip     match
 1     10    12202    match
 2     20        .    aonly
 3     30    23456    match
 4      .    99901    bonly
 ps  from Howard Schreier's great book ... PROC SQL by Example: Using SQL within SAS
https://support.sas.com/pubscat/bookdetails.jsp?pc=60500
http://howles.com/sqlbook/

Hobbes · Posted 08-31-2011 12:55 PM

You should use a full join and use and abuse the "coalesce" function, specially, for any common variable. A solution for your problem would be something like:

proc sql;

SELECT Coalesce(a.streetnum, b.streetnum) AS streetnum,

Coalesce(a.streetname, b.streetname) AS streetname,

Coalesce(a.zip, b.zip) AS zip,

CASE

WHEN NOT( a.streetnum IS NULL

AND a.streetname IS NULL

AND a.zip IS NULL )

AND NOT( b.streetnum IS NULL

AND b.streetname IS NULL

AND b.zip IS NULL ) THEN "match"

WHEN NOT( a.streetnum IS NULL

AND a.streetname IS NULL

AND a.zip IS NULL )

AND( b.streetnum IS NULL

AND b.streetname IS NULL

AND b.zip IS NULL ) THEN "A only"

WHEN( a.streetnum IS NULL

AND a.streetname IS NULL

AND a.zip IS NULL )

AND NOT( b.streetnum IS NULL

AND b.streetname IS NULL

AND b.zip IS NULL ) THEN "B only"

ELSE "match"

END AS match

FROM address_a AS a

FULL JOIN address_b AS b

ON ( a.streetnum = b.streetnum

AND a.streetname = b.streetname

AND a.zip = b.zip );

quit;

Tom · Posted 08-31-2011 03:33 PM

There are two main differences in the MERGE and the SQL FULL JOIN.

One is the way to check which dataset has contributed data. The CASE example above is most similar to the IF/THEN logic in your DATA step.

The other is what it does when there are variables with the same name in both datasets. For this you will need to use the COALESCE function. Remember to use for both your BY variables and any other other variables that occur in both datasets.

Peter_C · Posted 08-31-2011 03:36 PM

short answer is YES, but since you have a clearly understandable data step, WHY?

art297 · Posted 08-31-2011 03:42 PM

Peter,

It never hurts to know and, moreso, if one has to meet any of the three conditions in a many-to-many merge situation, it will be a lot easier to understand than the gyrations one would have to use to get the right result out of a datastep merge. No?

Peter_C · Posted 08-31-2011 04:13 PM

Art

you are right of course. . . I was hoping the op might explain as Hobbs has, but fearing it was just some more homework!

I am surprised at the quantity of sql needed and that something like this is not sufficient

select

case

when keyA=keyB then 'match'

when keyB is null then 'A file'

else 'B file'

end length=6 as match

from afile as a

full join bfile as b

;

Hobbes · Posted 08-31-2011 04:24 PM

And this is just half of the work because for each common variable you would need to write a new statement, something like,

Coalesce(b.common_var, a.common_var) AS common_var

Hobbes · Posted 08-31-2011 03:52 PM

As an example,

I had to use this solution many times with SAS Data Integration Studio 3.4 because there wasn't a built in "Merge Tranformation" therefore the most common way of merging datasets was using the SQL Join transformation.