DATA Step, Macro, Functions and more

Equivalent in Sql

Reply
Super Contributor
Posts: 647

Equivalent in Sql

How can the following be done in sql?

data merged_data;

merge address_A (in = A) address_B (in = B);
  by streetnum streetname zip;
  if A and not B then match = "A only";
  if B and not A then match = "B only";
  if A and B then match = "match";
run;

PROC Star
Posts: 7,363

Equivalent in Sql

There are a number of papers that provide answers to your three questions.  E.g., take a look at:

http://www.ats.ucla.edu/stat/sas/library/nesug99/bt150.pdf

Super Contributor
Posts: 647

Equivalent in Sql

Thanks for the reply.But it does not talk about the logic that is used in the if statements.

Frequent Contributor
Posts: 104

Equivalent in Sql

page 5 talks about "left join" (in A), "right join" (in B), and "full join" (in A or B).  It doesn't explicitly create the "match" variables though.

PROC Star
Posts: 7,363

Equivalent in Sql

Using the same example datasets used in the paper, does the following provide a sufficient example?  I am not a sql whiz, thus there may be better ways of doing the 3 types of merges:

DATA STATES;

  LENGTH STATE $ 2 ZIP $ 5;

  INPUT ZIP $ STATE $;

  CARDS;

08000 NJ

10000 NY

19000 PA

;

RUN;

DATA CITYS;

  LENGTH ZIP $ 5 CITY $ 15;

  INPUT ZIP $ CITY $;

  CARDS;

10000 NEWYORK

19000 PHILADELPHIA

90000 LOSANGELES

;

RUN;

PROC SQL;

  SELECT A.ZIP, A.STATE, B.CITY

    FROM STATES A, CITYS B

      where A.ZIP=B.ZIP

  ;

quit;

PROC SQL;

  SELECT A.ZIP, A.STATE, B.CITY

    FROM STATES A left join CITYS B

      on A.ZIP=B.ZIP

        having missing(b.zip)

;

quit;

PROC SQL;

  SELECT A.ZIP, A.STATE, B.CITY

    FROM STATES A right join CITYS B

      on A.ZIP=B.ZIP

        having missing(a.zip)

;

quit;

Valued Guide
Posts: 765

Re: Equivalent in Sql

Hi ... first, in your data step, I make some changes ...

data merged_data;

merge address_A (in = A) address_B (in = B);

by streetnum streetname zip;

if A and B then match = "match ";

else

if A then match = "A only";

else      match = "B only";

run;

As for your question, you can add similar logic to SQL with a CASE statement.  Here's

something like your MERGE that uses a SELECT statement since that looks a bit like

a CASE statement in SQL.  Then there's an SQL equivalent.

data one;

input id age @@;

datalines;

1 10 2 20 3 30

;

run;

data two;

input id zip @@;

datalines;

1 12202 3 23456 4 99901

;

run;

data merg;

merge one (in=a) two(in=b);

by id;

select;

   when (a and b) match = "match";

   when (a)       match = "aonly";

   otherwise      match = "bonly";

end;

run;

proc sql;

create table sqel as

select coalesce (one.id, two.id) as id, age, zip,

       case when (one.id and two.id) then "match"

            when (one.id) then "aonly"

            else "bonly"

       end as match

from one full join two

on one.id eq two.id;

quit;

title "MERGE";

proc print data=merg noobs;

run;

title "SQL";

proc print data=sqel noobs;

run;

MERGE

id    age     zip     match

1     10    12202    match

2     20        .    aonly

3     30    23456    match

4      .    99901    bonly

SQL

id    age     zip     match

1     10    12202    match

2     20        .    aonly

3     30    23456    match

4      .    99901    bonly

ps  from Howard Schreier's great book ... PROC SQL by Example: Using SQL within SAS

https://support.sas.com/pubscat/bookdetails.jsp?pc=60500

http://howles.com/sqlbook/

Regular Contributor
Posts: 184

Equivalent in Sql

A couple of other techniques:

1. NATURAL joins

2. inline views to add flag columns to each side of the join

PS: Thanks for the plug!

MikeZdeb wrote:

Hi ... first, in your data step, I make some changes ...

data merged_data;

merge address_A (in = A) address_B (in = B);

by streetnum streetname zip;

if A and B then match = "match ";

else

if A then match = "A only";

else      match = "B only";

run;

As for your question, you can add similar logic to SQL with a CASE statement.  Here's

something like your MERGE that uses a SELECT statement since that looks a bit like

a CASE statement in SQL.  Then there's an SQL equivalent.

data one;

input id age @@;

datalines;

1 10 2 20 3 30

;

run;

data two;

input id zip @@;

datalines;

1 12202 3 23456 4 99901

;

run;

data merg;

merge one (in=a) two(in=b);

by id;

select;

   when (a and b) match = "match";

   when (a)       match = "aonly";

   otherwise      match = "bonly";

end;

run;

proc sql;

create table sqel as

select coalesce (one.id, two.id) as id, age, zip,

       case when (one.id and two.id) then "match"

            when (one.id) then "aonly"

            else "bonly"

       end as match

from one full join two

on one.id eq two.id;

quit;

title "MERGE";

proc print data=merg noobs;

run;

title "SQL";

proc print data=sqel noobs;

run;

MERGE

id    age     zip     match

1     10    12202    match

2     20        .    aonly

3     30    23456    match

4      .    99901    bonly

SQL

id    age     zip     match

1     10    12202    match

2     20        .    aonly

3     30    23456    match

4      .    99901    bonly

ps  from Howard Schreier's great book ... PROC SQL by Example: Using SQL within SAS

https://support.sas.com/pubscat/bookdetails.jsp?pc=60500

http://howles.com/sqlbook/

Occasional Contributor
Posts: 16

Equivalent in Sql

You should use a full join and use and abuse the "coalesce" function, specially, for any common variable. A solution for your problem would be something like:

proc sql;

      SELECT Coalesce(a.streetnum, b.streetnum)   AS streetnum,

             Coalesce(a.streetname, b.streetname) AS streetname,

             Coalesce(a.zip, b.zip)               AS zip,

             CASE

               WHEN NOT( a.streetnum IS NULL

                         AND a.streetname IS NULL

                         AND a.zip IS NULL )

                    AND NOT( b.streetnum IS NULL

                             AND b.streetname IS NULL

                             AND b.zip IS NULL ) THEN "match"

               WHEN NOT( a.streetnum IS NULL

                         AND a.streetname IS NULL

                         AND a.zip IS NULL )

                    AND( b.streetnum IS NULL

                          AND b.streetname IS NULL

                          AND b.zip IS NULL ) THEN "A only"

               WHEN( a.streetnum IS NULL

                      AND a.streetname IS NULL

                      AND a.zip IS NULL )

                    AND NOT( b.streetnum IS NULL

                             AND b.streetname IS NULL

                             AND b.zip IS NULL ) THEN "B only"

               ELSE "match"

             END                                  AS match

      FROM   address_a AS a

             FULL JOIN address_b AS b

               ON ( a.streetnum = b.streetnum

                    AND a.streetname = b.streetname

                    AND a.zip = b.zip );

quit;


Super User
Super User
Posts: 6,502

Re: Equivalent in Sql

There are two main differences in the MERGE and the SQL FULL JOIN.

One is the way to check which dataset has contributed data.  The CASE example above is most similar to the IF/THEN logic in your DATA step.

The other is what it does when there are variables with the same name in both datasets.  For this you will need to use the COALESCE function.  Remember to use for both your BY variables and any other other variables that occur in both datasets.

Valued Guide
Posts: 2,175

Equivalent in Sql

short answer is YES, but since you have a clearly understandable data step, WHY?

PROC Star
Posts: 7,363

Equivalent in Sql

Peter,

It never hurts to know and, moreso, if one has to meet any of the three conditions in a many-to-many merge situation, it will be a lot easier to understand than the gyrations one would have to use to get the right result out of a datastep merge.  No?

Valued Guide
Posts: 2,175

Equivalent in Sql

Art

you are right of course. . . I was hoping the op might explain as Hobbs has, but fearing it was just some more homework!

I am surprised at the quantity of sql needed and that something like this is not sufficient

select

case

when keyA=keyB then 'match'

when keyB is null then 'A file'

else 'B file'

end length=6 as match

from afile as a

full join bfile as b

;

Occasional Contributor
Posts: 16

Equivalent in Sql

And this is just half of the work because for each common variable you would need to write a new statement, something like,

Coalesce(b.common_var, a.common_var) AS common_var

Occasional Contributor
Posts: 16

Equivalent in Sql

As an example,

I had to use this solution many times with SAS Data Integration Studio 3.4 because there wasn't a built in "Merge Tranformation" therefore the most common way of merging datasets was using the SQL Join transformation.

Ask a Question
Discussion stats
  • 13 replies
  • 324 views
  • 0 likes
  • 8 in conversation