Quartz | Level 8

## Equivalent in Sql

How can the following be done in sql?

data merged_data;

by streetnum streetname zip;
if A and not B then match = "A only";
if B and not A then match = "B only";
if A and B then match = "match";
run;

13 REPLIES 13
Opal | Level 21

## Equivalent in Sql

There are a number of papers that provide answers to your three questions.  E.g., take a look at:

http://www.ats.ucla.edu/stat/sas/library/nesug99/bt150.pdf

Quartz | Level 8

## Equivalent in Sql

Thanks for the reply.But it does not talk about the logic that is used in the if statements.

Obsidian | Level 7

## Equivalent in Sql

page 5 talks about "left join" (in A), "right join" (in B), and "full join" (in A or B).  It doesn't explicitly create the "match" variables though.

Opal | Level 21

## Equivalent in Sql

Using the same example datasets used in the paper, does the following provide a sufficient example?  I am not a sql whiz, thus there may be better ways of doing the 3 types of merges:

DATA STATES;

LENGTH STATE \$ 2 ZIP \$ 5;

INPUT ZIP \$ STATE \$;

CARDS;

08000 NJ

10000 NY

19000 PA

;

RUN;

DATA CITYS;

LENGTH ZIP \$ 5 CITY \$ 15;

INPUT ZIP \$ CITY \$;

CARDS;

10000 NEWYORK

90000 LOSANGELES

;

RUN;

PROC SQL;

SELECT A.ZIP, A.STATE, B.CITY

FROM STATES A, CITYS B

where A.ZIP=B.ZIP

;

quit;

PROC SQL;

SELECT A.ZIP, A.STATE, B.CITY

FROM STATES A left join CITYS B

on A.ZIP=B.ZIP

having missing(b.zip)

;

quit;

PROC SQL;

SELECT A.ZIP, A.STATE, B.CITY

FROM STATES A right join CITYS B

on A.ZIP=B.ZIP

having missing(a.zip)

;

quit;

Rhodochrosite | Level 12

## Re: Equivalent in Sql

Hi ... first, in your data step, I make some changes ...

data merged_data;

by streetnum streetname zip;

if A and B then match = "match ";

else

if A then match = "A only";

else      match = "B only";

run;

As for your question, you can add similar logic to SQL with a CASE statement.  Here's

something like your MERGE that uses a SELECT statement since that looks a bit like

a CASE statement in SQL.  Then there's an SQL equivalent.

data one;

input id age @@;

datalines;

1 10 2 20 3 30

;

run;

data two;

input id zip @@;

datalines;

1 12202 3 23456 4 99901

;

run;

data merg;

merge one (in=a) two(in=b);

by id;

select;

when (a and b) match = "match";

when (a)       match = "aonly";

otherwise      match = "bonly";

end;

run;

proc sql;

create table sqel as

select coalesce (one.id, two.id) as id, age, zip,

case when (one.id and two.id) then "match"

when (one.id) then "aonly"

else "bonly"

end as match

from one full join two

on one.id eq two.id;

quit;

title "MERGE";

proc print data=merg noobs;

run;

title "SQL";

proc print data=sqel noobs;

run;

MERGE

id    age     zip     match

1     10    12202    match

2     20        .    aonly

3     30    23456    match

4      .    99901    bonly

SQL

id    age     zip     match

1     10    12202    match

2     20        .    aonly

3     30    23456    match

4      .    99901    bonly

ps  from Howard Schreier's great book ... PROC SQL by Example: Using SQL within SAS

https://support.sas.com/pubscat/bookdetails.jsp?pc=60500

http://howles.com/sqlbook/

Quartz | Level 8

## Equivalent in Sql

A couple of other techniques:

1. NATURAL joins

2. inline views to add flag columns to each side of the join

PS: Thanks for the plug!

`MikeZdeb wrote:Hi ... first, in your data step, I make some changes ...data merged_data;merge address_A (in = A) address_B (in = B);by streetnum streetname zip;if A and B then match = "match ";elseif A then match = "A only";else      match = "B only";run;As for your question, you can add similar logic to SQL with a CASE statement.  Here'ssomething like your MERGE that uses a SELECT statement since that looks a bit likea CASE statement in SQL.  Then there's an SQL equivalent.data one;input id age @@;datalines;1 10 2 20 3 30;run;data two;input id zip @@;datalines;1 12202 3 23456 4 99901;run; data merg;merge one (in=a) two(in=b);by id;select;   when (a and b) match = "match";   when (a)       match = "aonly";   otherwise      match = "bonly";end;run;proc sql;create table sqel asselect coalesce (one.id, two.id) as id, age, zip,       case when (one.id and two.id) then "match"            when (one.id) then "aonly"            else "bonly"       end as matchfrom one full join twoon one.id eq two.id;quit;title "MERGE";proc print data=merg noobs;run;title "SQL";proc print data=sqel noobs;run;MERGEid    age     zip     match 1     10    12202    match 2     20        .    aonly 3     30    23456    match 4      .    99901    bonlySQLid    age     zip     match 1     10    12202    match 2     20        .    aonly 3     30    23456    match 4      .    99901    bonly ps  from Howard Schreier's great book ... PROC SQL by Example: Using SQL within SAShttps://support.sas.com/pubscat/bookdetails.jsp?pc=60500http://howles.com/sqlbook/`
Calcite | Level 5

## Equivalent in Sql

You should use a full join and use and abuse the "coalesce" function, specially, for any common variable. A solution for your problem would be something like:

proc sql;

SELECT Coalesce(a.streetnum, b.streetnum)   AS streetnum,

Coalesce(a.streetname, b.streetname) AS streetname,

Coalesce(a.zip, b.zip)               AS zip,

CASE

WHEN NOT( a.streetnum IS NULL

AND a.streetname IS NULL

AND a.zip IS NULL )

AND NOT( b.streetnum IS NULL

AND b.streetname IS NULL

AND b.zip IS NULL ) THEN "match"

WHEN NOT( a.streetnum IS NULL

AND a.streetname IS NULL

AND a.zip IS NULL )

AND( b.streetnum IS NULL

AND b.streetname IS NULL

AND b.zip IS NULL ) THEN "A only"

WHEN( a.streetnum IS NULL

AND a.streetname IS NULL

AND a.zip IS NULL )

AND NOT( b.streetnum IS NULL

AND b.streetname IS NULL

AND b.zip IS NULL ) THEN "B only"

ELSE "match"

END                                  AS match

ON ( a.streetnum = b.streetnum

AND a.streetname = b.streetname

AND a.zip = b.zip );

quit;

Super User

## Re: Equivalent in Sql

There are two main differences in the MERGE and the SQL FULL JOIN.

One is the way to check which dataset has contributed data.  The CASE example above is most similar to the IF/THEN logic in your DATA step.

The other is what it does when there are variables with the same name in both datasets.  For this you will need to use the COALESCE function.  Remember to use for both your BY variables and any other other variables that occur in both datasets.

Rhodochrosite | Level 12

## Equivalent in Sql

short answer is YES, but since you have a clearly understandable data step, WHY?

Opal | Level 21

## Equivalent in Sql

Peter,

It never hurts to know and, moreso, if one has to meet any of the three conditions in a many-to-many merge situation, it will be a lot easier to understand than the gyrations one would have to use to get the right result out of a datastep merge.  No?

Rhodochrosite | Level 12

## Equivalent in Sql

Art

you are right of course. . . I was hoping the op might explain as Hobbs has, but fearing it was just some more homework!

I am surprised at the quantity of sql needed and that something like this is not sufficient

select

case

when keyA=keyB then 'match'

when keyB is null then 'A file'

else 'B file'

end length=6 as match

from afile as a

full join bfile as b

;

Calcite | Level 5

## Equivalent in Sql

And this is just half of the work because for each common variable you would need to write a new statement, something like,

Coalesce(b.common_var, a.common_var) AS common_var

Calcite | Level 5

## Equivalent in Sql

As an example,

I had to use this solution many times with SAS Data Integration Studio 3.4 because there wasn't a built in "Merge Tranformation" therefore the most common way of merging datasets was using the SQL Join transformation.

Discussion stats
• 13 replies
• 1773 views
• 0 likes
• 8 in conversation