Hi Everyone,
I am merging two data sets - A and B;
Data Set A
firstname secondnameinitial thirdname
John A Doe
John A Doe
John A Doe
James B Crow
Jim C Throw
Jonathan D Brown
Data Set B
Forename1 Forename2Initial Surname ID
John A Doe 12345
John A Doe 67890
James B Crow 15123
James B Crow 12518
Jim C Throw 30987
Jane D Scott 23484
I want matches with all of the information from both tables. I am merging on first name, middle initial, and last name using the following code:
proc sql;
create table want as
select * from have a
inner join have b
on (upper(trim(a.firstname))=upper(trim(b.Forename1))) and (upper(trim(a.thirdname))=upper(trim(b.Surname)))
and (upper(trim(a.SecondNameInitial))=upper(trim(b.Forename2Initial)));
quit;
My question is this:
My output is about 10,000 names. How do I identify the duplicate matches that were eliminated during the merge process due to identical names? I can manually look at additional information to determine which names should be properly matched.
Thank you!
You show data for datasets "A" and "B", but in your SQL you do a self join of a dataset "HAVE".
Further, SQL does not eliminate duplicate matches, it creates a cartesian join.
Please clarify what you really want, and post the expected result for the data you posted.
@teamlinerek wrote:
Hi Everyone,
I am merging two data sets - A and B;
Data Set A
firstname secondnameinitial thirdname
John A Doe
John A Doe
John A Doe
James B Crow
Jim C Throw
Jonathan D Brown
Data Set B
Forename1 Forename2Initial Surname ID
John A Doe 12345
John A Doe 67890
James B Crow 15123
James B Crow 12518
Jim C Throw 30987
Jane D Scott 23484
I want matches with all of the information from both tables. I am merging on first name, middle initial, and last name using the following code:
proc sql;
create table want as
select * from have a
inner join have b
on (upper(trim(a.firstname))=upper(trim(b.Forename1))) and (upper(trim(a.thirdname))=upper(trim(b.Surname)))
and (upper(trim(a.SecondNameInitial))=upper(trim(b.Forename2Initial)));
quit;
My question is this:
My output is about 10,000 names. How do I identify the duplicate matches that were eliminated during the merge process due to identical names? I can manually look at additional information to determine which names should be properly matched.
Thank you!
I believe the query you want is as below (not tested because you didn't provide SAS code to create such sample data).
proc sql;
create table want as
select a.*,b.*
from
(select distinct firstname,thirdname,SecondNameInitial from table_1) a
inner join
table_2 b
on
(upper(trim(a.firstname)) = upper(trim(b.Forename1)))
and (upper(trim(a.thirdname)) = upper(trim(b.Surname)))
and (upper(trim(a.SecondNameInitial)) = upper(trim(b.Forename2Initial)))
;
quit;
If you don't first dedup your table 1 then the relationship between the tables is many:many which is almost never a good situation.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.