Hi Everyone,
I am merging two data sets - A and B;
Data Set A
firstname secondnameinitial thirdname
John A Doe
John A Doe
John A Doe
James B Crow
Jim C Throw
Jonathan D Brown
Data Set B
Forename1 Forename2Initial Surname ID
John A Doe 12345
John A Doe 67890
James B Crow 15123
James B Crow 12518
Jim C Throw 30987
Jane D Scott 23484
I want matches with all of the information from both tables. I am merging on first name, middle initial, and last name using the following code:
proc sql;
create table want as
select * from have a
inner join have b
on (upper(trim(a.firstname))=upper(trim(b.Forename1))) and (upper(trim(a.thirdname))=upper(trim(b.Surname)))
and (upper(trim(a.SecondNameInitial))=upper(trim(b.Forename2Initial)));
quit;
My question is this:
My output is about 10,000 names. How do I identify the duplicate matches that were eliminated during the merge process due to identical names? I can manually look at additional information to determine which names should be properly matched.
Thank you!
You show data for datasets "A" and "B", but in your SQL you do a self join of a dataset "HAVE".
Further, SQL does not eliminate duplicate matches, it creates a cartesian join.
Please clarify what you really want, and post the expected result for the data you posted.
@teamlinerek wrote:
Hi Everyone,
I am merging two data sets - A and B;
Data Set A
firstname secondnameinitial thirdname
John A Doe
John A Doe
John A Doe
James B Crow
Jim C Throw
Jonathan D Brown
Data Set B
Forename1 Forename2Initial Surname ID
John A Doe 12345
John A Doe 67890
James B Crow 15123
James B Crow 12518
Jim C Throw 30987
Jane D Scott 23484
I want matches with all of the information from both tables. I am merging on first name, middle initial, and last name using the following code:
proc sql;
create table want as
select * from have a
inner join have b
on (upper(trim(a.firstname))=upper(trim(b.Forename1))) and (upper(trim(a.thirdname))=upper(trim(b.Surname)))
and (upper(trim(a.SecondNameInitial))=upper(trim(b.Forename2Initial)));
quit;
My question is this:
My output is about 10,000 names. How do I identify the duplicate matches that were eliminated during the merge process due to identical names? I can manually look at additional information to determine which names should be properly matched.
Thank you!
I believe the query you want is as below (not tested because you didn't provide SAS code to create such sample data).
proc sql;
create table want as
select a.*,b.*
from
(select distinct firstname,thirdname,SecondNameInitial from table_1) a
inner join
table_2 b
on
(upper(trim(a.firstname)) = upper(trim(b.Forename1)))
and (upper(trim(a.thirdname)) = upper(trim(b.Surname)))
and (upper(trim(a.SecondNameInitial)) = upper(trim(b.Forename2Initial)))
;
quit;
If you don't first dedup your table 1 then the relationship between the tables is many:many which is almost never a good situation.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.