BookmarkSubscribeRSS Feed
SASPhile
Quartz | Level 8

How can the following be done in sql?

data merged_data;

merge address_A (in = A) address_B (in = B);
  by streetnum streetname zip;
  if A and not B then match = "A only";
  if B and not A then match = "B only";
  if A and B then match = "match";
run;

13 REPLIES 13
art297
Opal | Level 21

There are a number of papers that provide answers to your three questions.  E.g., take a look at:

http://www.ats.ucla.edu/stat/sas/library/nesug99/bt150.pdf

SASPhile
Quartz | Level 8

Thanks for the reply.But it does not talk about the logic that is used in the if statements.

DLing
Obsidian | Level 7

page 5 talks about "left join" (in A), "right join" (in B), and "full join" (in A or B).  It doesn't explicitly create the "match" variables though.

art297
Opal | Level 21

Using the same example datasets used in the paper, does the following provide a sufficient example?  I am not a sql whiz, thus there may be better ways of doing the 3 types of merges:

DATA STATES;

  LENGTH STATE $ 2 ZIP $ 5;

  INPUT ZIP $ STATE $;

  CARDS;

08000 NJ

10000 NY

19000 PA

;

RUN;

DATA CITYS;

  LENGTH ZIP $ 5 CITY $ 15;

  INPUT ZIP $ CITY $;

  CARDS;

10000 NEWYORK

19000 PHILADELPHIA

90000 LOSANGELES

;

RUN;

PROC SQL;

  SELECT A.ZIP, A.STATE, B.CITY

    FROM STATES A, CITYS B

      where A.ZIP=B.ZIP

  ;

quit;

PROC SQL;

  SELECT A.ZIP, A.STATE, B.CITY

    FROM STATES A left join CITYS B

      on A.ZIP=B.ZIP

        having missing(b.zip)

;

quit;

PROC SQL;

  SELECT A.ZIP, A.STATE, B.CITY

    FROM STATES A right join CITYS B

      on A.ZIP=B.ZIP

        having missing(a.zip)

;

quit;

MikeZdeb
Rhodochrosite | Level 12

Hi ... first, in your data step, I make some changes ...

data merged_data;

merge address_A (in = A) address_B (in = B);

by streetnum streetname zip;

if A and B then match = "match ";

else

if A then match = "A only";

else      match = "B only";

run;

As for your question, you can add similar logic to SQL with a CASE statement.  Here's

something like your MERGE that uses a SELECT statement since that looks a bit like

a CASE statement in SQL.  Then there's an SQL equivalent.

data one;

input id age @@;

datalines;

1 10 2 20 3 30

;

run;

data two;

input id zip @@;

datalines;

1 12202 3 23456 4 99901

;

run;

data merg;

merge one (in=a) two(in=b);

by id;

select;

   when (a and b) match = "match";

   when (a)       match = "aonly";

   otherwise      match = "bonly";

end;

run;

proc sql;

create table sqel as

select coalesce (one.id, two.id) as id, age, zip,

       case when (one.id and two.id) then "match"

            when (one.id) then "aonly"

            else "bonly"

       end as match

from one full join two

on one.id eq two.id;

quit;

title "MERGE";

proc print data=merg noobs;

run;

title "SQL";

proc print data=sqel noobs;

run;

MERGE

id    age     zip     match

1     10    12202    match

2     20        .    aonly

3     30    23456    match

4      .    99901    bonly

SQL

id    age     zip     match

1     10    12202    match

2     20        .    aonly

3     30    23456    match

4      .    99901    bonly

ps  from Howard Schreier's great book ... PROC SQL by Example: Using SQL within SAS

https://support.sas.com/pubscat/bookdetails.jsp?pc=60500

http://howles.com/sqlbook/

Howles
Quartz | Level 8

A couple of other techniques:

1. NATURAL joins

2. inline views to add flag columns to each side of the join

PS: Thanks for the plug!

MikeZdeb wrote:

Hi ... first, in your data step, I make some changes ...

data merged_data;

merge address_A (in = A) address_B (in = B);

by streetnum streetname zip;

if A and B then match = "match ";

else

if A then match = "A only";

else      match = "B only";

run;

As for your question, you can add similar logic to SQL with a CASE statement.  Here's

something like your MERGE that uses a SELECT statement since that looks a bit like

a CASE statement in SQL.  Then there's an SQL equivalent.

data one;

input id age @@;

datalines;

1 10 2 20 3 30

;

run;

data two;

input id zip @@;

datalines;

1 12202 3 23456 4 99901

;

run;

data merg;

merge one (in=a) two(in=b);

by id;

select;

   when (a and b) match = "match";

   when (a)       match = "aonly";

   otherwise      match = "bonly";

end;

run;

proc sql;

create table sqel as

select coalesce (one.id, two.id) as id, age, zip,

       case when (one.id and two.id) then "match"

            when (one.id) then "aonly"

            else "bonly"

       end as match

from one full join two

on one.id eq two.id;

quit;

title "MERGE";

proc print data=merg noobs;

run;

title "SQL";

proc print data=sqel noobs;

run;

MERGE

id    age     zip     match

1     10    12202    match

2     20        .    aonly

3     30    23456    match

4      .    99901    bonly

SQL

id    age     zip     match

1     10    12202    match

2     20        .    aonly

3     30    23456    match

4      .    99901    bonly

ps  from Howard Schreier's great book ... PROC SQL by Example: Using SQL within SAS

https://support.sas.com/pubscat/bookdetails.jsp?pc=60500

http://howles.com/sqlbook/

Hobbes
Calcite | Level 5

You should use a full join and use and abuse the "coalesce" function, specially, for any common variable. A solution for your problem would be something like:

proc sql;

      SELECT Coalesce(a.streetnum, b.streetnum)   AS streetnum,

             Coalesce(a.streetname, b.streetname) AS streetname,

             Coalesce(a.zip, b.zip)               AS zip,

             CASE

               WHEN NOT( a.streetnum IS NULL

                         AND a.streetname IS NULL

                         AND a.zip IS NULL )

                    AND NOT( b.streetnum IS NULL

                             AND b.streetname IS NULL

                             AND b.zip IS NULL ) THEN "match"

               WHEN NOT( a.streetnum IS NULL

                         AND a.streetname IS NULL

                         AND a.zip IS NULL )

                    AND( b.streetnum IS NULL

                          AND b.streetname IS NULL

                          AND b.zip IS NULL ) THEN "A only"

               WHEN( a.streetnum IS NULL

                      AND a.streetname IS NULL

                      AND a.zip IS NULL )

                    AND NOT( b.streetnum IS NULL

                             AND b.streetname IS NULL

                             AND b.zip IS NULL ) THEN "B only"

               ELSE "match"

             END                                  AS match

      FROM   address_a AS a

             FULL JOIN address_b AS b

               ON ( a.streetnum = b.streetnum

                    AND a.streetname = b.streetname

                    AND a.zip = b.zip );

quit;


Tom
Super User Tom
Super User

There are two main differences in the MERGE and the SQL FULL JOIN.

One is the way to check which dataset has contributed data.  The CASE example above is most similar to the IF/THEN logic in your DATA step.

The other is what it does when there are variables with the same name in both datasets.  For this you will need to use the COALESCE function.  Remember to use for both your BY variables and any other other variables that occur in both datasets.

Peter_C
Rhodochrosite | Level 12

short answer is YES, but since you have a clearly understandable data step, WHY?

art297
Opal | Level 21

Peter,

It never hurts to know and, moreso, if one has to meet any of the three conditions in a many-to-many merge situation, it will be a lot easier to understand than the gyrations one would have to use to get the right result out of a datastep merge.  No?

Peter_C
Rhodochrosite | Level 12

Art

you are right of course. . . I was hoping the op might explain as Hobbs has, but fearing it was just some more homework!

I am surprised at the quantity of sql needed and that something like this is not sufficient

select

case

when keyA=keyB then 'match'

when keyB is null then 'A file'

else 'B file'

end length=6 as match

from afile as a

full join bfile as b

;

Hobbes
Calcite | Level 5

And this is just half of the work because for each common variable you would need to write a new statement, something like,

Coalesce(b.common_var, a.common_var) AS common_var

Hobbes
Calcite | Level 5

As an example,

I had to use this solution many times with SAS Data Integration Studio 3.4 because there wasn't a built in "Merge Tranformation" therefore the most common way of merging datasets was using the SQL Join transformation.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1510 views
  • 0 likes
  • 8 in conversation