BookmarkSubscribeRSS Feed
teamlinerek
Fluorite | Level 6

Hi Everyone,

 

I am merging two data sets - A and B;

 

Data Set A

firstname   secondnameinitial    thirdname

John                          A                          Doe

John                          A                          Doe

John                          A                          Doe

James                       B                         Crow

Jim                             C                        Throw

Jonathan                  D                         Brown

 

Data Set B

Forename1    Forename2Initial    Surname         ID

John                           A                         Doe           12345

John                          A                          Doe           67890

James                       B                           Crow         15123

James                       B                          Crow          12518

Jim                            C                          Throw       30987

Jane                          D                          Scott         23484

 

I want matches with all of the information from both tables. I am merging on first name, middle initial, and last name using the following code:

 

proc sql;
create table want as
select * from have a
inner join have b
on (upper(trim(a.firstname))=upper(trim(b.Forename1))) and (upper(trim(a.thirdname))=upper(trim(b.Surname)))
and (upper(trim(a.SecondNameInitial))=upper(trim(b.Forename2Initial)));
quit;

 

My question is this:

My output is about 10,000 names. How do I identify the duplicate matches that were eliminated during the merge process due to identical names? I can manually look at additional information to determine which names should be properly matched.

 

Thank you!

 

 

 

 

2 REPLIES 2
Kurt_Bremser
Super User

You show data for datasets "A" and "B", but in your SQL you do a self join of a dataset "HAVE".

Further, SQL does not eliminate duplicate matches, it creates a cartesian join.

 

Please clarify what you really want, and post the expected result for the data you posted.

 


@teamlinerek wrote:

Hi Everyone,

 

I am merging two data sets - A and B;

 

Data Set A

firstname   secondnameinitial    thirdname

John                          A                          Doe

John                          A                          Doe

John                          A                          Doe

James                       B                         Crow

Jim                             C                        Throw

Jonathan                  D                         Brown

 

Data Set B

Forename1    Forename2Initial    Surname         ID

John                           A                         Doe           12345

John                          A                          Doe           67890

James                       B                           Crow         15123

James                       B                          Crow          12518

Jim                            C                          Throw       30987

Jane                          D                          Scott         23484

 

I want matches with all of the information from both tables. I am merging on first name, middle initial, and last name using the following code:

 

proc sql;
create table want as
select * from have a
inner join have b
on (upper(trim(a.firstname))=upper(trim(b.Forename1))) and (upper(trim(a.thirdname))=upper(trim(b.Surname)))
and (upper(trim(a.SecondNameInitial))=upper(trim(b.Forename2Initial)));
quit;

 

My question is this:

My output is about 10,000 names. How do I identify the duplicate matches that were eliminated during the merge process due to identical names? I can manually look at additional information to determine which names should be properly matched.

 

Thank you!


 

Patrick
Opal | Level 21

I believe the query you want is as below (not tested because you didn't provide SAS code to create such sample data).

proc sql;
  create table want as
  select a.*,b.* 
  from 
    (select distinct firstname,thirdname,SecondNameInitial from table_1) a
    inner join 
    table_2 b
      on 
            (upper(trim(a.firstname))         = upper(trim(b.Forename1))) 
        and (upper(trim(a.thirdname))         = upper(trim(b.Surname)))
        and (upper(trim(a.SecondNameInitial)) = upper(trim(b.Forename2Initial)))
  ;
quit;

 

If you don't first dedup your table 1 then the relationship between the tables is many:many which is almost never a good situation.

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 909 views
  • 1 like
  • 3 in conversation