BookmarkSubscribeRSS Feed
teamlinerek
Fluorite | Level 6

Hi Everyone,

 

I am merging two data sets - A and B;

 

Data Set A

firstname   secondnameinitial    thirdname

John                          A                          Doe

John                          A                          Doe

John                          A                          Doe

James                       B                         Crow

Jim                             C                        Throw

Jonathan                  D                         Brown

 

Data Set B

Forename1    Forename2Initial    Surname         ID

John                           A                         Doe           12345

John                          A                          Doe           67890

James                       B                           Crow         15123

James                       B                          Crow          12518

Jim                            C                          Throw       30987

Jane                          D                          Scott         23484

 

I want matches with all of the information from both tables. I am merging on first name, middle initial, and last name using the following code:

 

proc sql;
create table want as
select * from have a
inner join have b
on (upper(trim(a.firstname))=upper(trim(b.Forename1))) and (upper(trim(a.thirdname))=upper(trim(b.Surname)))
and (upper(trim(a.SecondNameInitial))=upper(trim(b.Forename2Initial)));
quit;

 

My question is this:

My output is about 10,000 names. How do I identify the duplicate matches that were eliminated during the merge process due to identical names? I can manually look at additional information to determine which names should be properly matched.

 

Thank you!

 

 

 

 

2 REPLIES 2
Kurt_Bremser
Super User

You show data for datasets "A" and "B", but in your SQL you do a self join of a dataset "HAVE".

Further, SQL does not eliminate duplicate matches, it creates a cartesian join.

 

Please clarify what you really want, and post the expected result for the data you posted.

 


@teamlinerek wrote:

Hi Everyone,

 

I am merging two data sets - A and B;

 

Data Set A

firstname   secondnameinitial    thirdname

John                          A                          Doe

John                          A                          Doe

John                          A                          Doe

James                       B                         Crow

Jim                             C                        Throw

Jonathan                  D                         Brown

 

Data Set B

Forename1    Forename2Initial    Surname         ID

John                           A                         Doe           12345

John                          A                          Doe           67890

James                       B                           Crow         15123

James                       B                          Crow          12518

Jim                            C                          Throw       30987

Jane                          D                          Scott         23484

 

I want matches with all of the information from both tables. I am merging on first name, middle initial, and last name using the following code:

 

proc sql;
create table want as
select * from have a
inner join have b
on (upper(trim(a.firstname))=upper(trim(b.Forename1))) and (upper(trim(a.thirdname))=upper(trim(b.Surname)))
and (upper(trim(a.SecondNameInitial))=upper(trim(b.Forename2Initial)));
quit;

 

My question is this:

My output is about 10,000 names. How do I identify the duplicate matches that were eliminated during the merge process due to identical names? I can manually look at additional information to determine which names should be properly matched.

 

Thank you!


 

Patrick
Opal | Level 21

I believe the query you want is as below (not tested because you didn't provide SAS code to create such sample data).

proc sql;
  create table want as
  select a.*,b.* 
  from 
    (select distinct firstname,thirdname,SecondNameInitial from table_1) a
    inner join 
    table_2 b
      on 
            (upper(trim(a.firstname))         = upper(trim(b.Forename1))) 
        and (upper(trim(a.thirdname))         = upper(trim(b.Surname)))
        and (upper(trim(a.SecondNameInitial)) = upper(trim(b.Forename2Initial)))
  ;
quit;

 

If you don't first dedup your table 1 then the relationship between the tables is many:many which is almost never a good situation.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 534 views
  • 1 like
  • 3 in conversation