BookmarkSubscribeRSS Feed
teamlinerek
Fluorite | Level 6

Hi Everyone,

 

I am merging two data sets - A and B;

 

Data Set A

firstname   secondnameinitial    thirdname

John                          A                          Doe

John                          A                          Doe

John                          A                          Doe

James                       B                         Crow

Jim                             C                        Throw

Jonathan                  D                         Brown

 

Data Set B

Forename1    Forename2Initial    Surname         ID

John                           A                         Doe           12345

John                          A                          Doe           67890

James                       B                           Crow         15123

James                       B                          Crow          12518

Jim                            C                          Throw       30987

Jane                          D                          Scott         23484

 

I want matches with all of the information from both tables. I am merging on first name, middle initial, and last name using the following code:

 

proc sql;
create table want as
select * from have a
inner join have b
on (upper(trim(a.firstname))=upper(trim(b.Forename1))) and (upper(trim(a.thirdname))=upper(trim(b.Surname)))
and (upper(trim(a.SecondNameInitial))=upper(trim(b.Forename2Initial)));
quit;

 

My question is this:

My output is about 10,000 names. How do I identify the duplicate matches that were eliminated during the merge process due to identical names? I can manually look at additional information to determine which names should be properly matched.

 

Thank you!

 

 

 

 

2 REPLIES 2
Kurt_Bremser
Super User

You show data for datasets "A" and "B", but in your SQL you do a self join of a dataset "HAVE".

Further, SQL does not eliminate duplicate matches, it creates a cartesian join.

 

Please clarify what you really want, and post the expected result for the data you posted.

 


@teamlinerek wrote:

Hi Everyone,

 

I am merging two data sets - A and B;

 

Data Set A

firstname   secondnameinitial    thirdname

John                          A                          Doe

John                          A                          Doe

John                          A                          Doe

James                       B                         Crow

Jim                             C                        Throw

Jonathan                  D                         Brown

 

Data Set B

Forename1    Forename2Initial    Surname         ID

John                           A                         Doe           12345

John                          A                          Doe           67890

James                       B                           Crow         15123

James                       B                          Crow          12518

Jim                            C                          Throw       30987

Jane                          D                          Scott         23484

 

I want matches with all of the information from both tables. I am merging on first name, middle initial, and last name using the following code:

 

proc sql;
create table want as
select * from have a
inner join have b
on (upper(trim(a.firstname))=upper(trim(b.Forename1))) and (upper(trim(a.thirdname))=upper(trim(b.Surname)))
and (upper(trim(a.SecondNameInitial))=upper(trim(b.Forename2Initial)));
quit;

 

My question is this:

My output is about 10,000 names. How do I identify the duplicate matches that were eliminated during the merge process due to identical names? I can manually look at additional information to determine which names should be properly matched.

 

Thank you!


 

Patrick
Opal | Level 21

I believe the query you want is as below (not tested because you didn't provide SAS code to create such sample data).

proc sql;
  create table want as
  select a.*,b.* 
  from 
    (select distinct firstname,thirdname,SecondNameInitial from table_1) a
    inner join 
    table_2 b
      on 
            (upper(trim(a.firstname))         = upper(trim(b.Forename1))) 
        and (upper(trim(a.thirdname))         = upper(trim(b.Surname)))
        and (upper(trim(a.SecondNameInitial)) = upper(trim(b.Forename2Initial)))
  ;
quit;

 

If you don't first dedup your table 1 then the relationship between the tables is many:many which is almost never a good situation.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 699 views
  • 1 like
  • 3 in conversation