Desktop productivity for business analysts and programmers

Comparing similar names from 2 separate data tables

Reply
Occasional Contributor WM
Occasional Contributor
Posts: 7

Comparing similar names from 2 separate data tables

I have 2 excel worksheets that I've used excel add-in for fuzzy lookup. I have imported both into SAS Enterprise Guide hoping I could figure out a similar approach. I've read about CompGed but not sure it it's what I want to use and if so then getting it coded correctly for what I want it to do.

 

The information I want to compare is named slightly different between the worksheets(Insured Client Name vs  Last Name First Name). I want to  identify which names are closely matched. One thing to note is that the Last Name First Name is a concatenated value performed within the excel worksheet before I imported.

 

I'm looking for suggestions on best way to approach identifying possible matches and then idea of sample code to use. I copied some code I found within this community and inserted what I thought was correct but can't get it to work. It runs but doesn't return any output data.  The code I'm using is;

 

data Insured_Name_Match;

set work.'copy of all agencies 2017 sales'n ;

 

tmp1=soundex('Insured Client Name'n);

do i=1 to nobs;

set WORK.'query_for_dec2017 online illustr'n point=i nobs=nobs;

 

tmp2=soundex('last name first name'n);

dif=compged(tmp1,tmp2);

if dif<=100 then do;

 

possible_match='Yes';

drop i tmp1 tmp2;

output;

end;

end;

run;

 

 

I don't get any error code. here's what the project log says. Here's the error code I get;

 

22 GOPTIONS ACCESSIBLE;

23 data Insured_Name_Match;

24 set work.'copy of all agencies 2017 sales'n ;

25 tmp1=soundex('Insured Client Name'n);

26 do i=1 to nobs;

27 set WORK.'query_for_dec2017 online illustr'n point=i nobs=nobs;

28 tmp2=soundex('last name first name'n);

29 dif=compged(tmp1,tmp2);

30 if dif<=100 then do;

31 possible_match='Yes';

32 drop i tmp1 tmp2;

33 output;

34 end;

35 end;

36 run;

 

NOTE: Numeric values have been converted to character values at the places given by: (Line)Smiley SadColumn).

28:14

NOTE: Variable 'last name first name'n is uninitialized.

NOTE: There were 56521 observations read from the data set WORK.'COPY OF ALL AGENCIES 2017 SALES'n.

NOTE: The data set WORK.INSURED_NAME_MATCH has 0 observations and 28 variables.

NOTE: DATA statement used (Total process time):

real time 43.05 seconds

cpu time 41.82 seconds

 

 

Respected Advisor
Posts: 3,265

Re: Comparing similar names from 2 separate data tables

Occasional Contributor WM
Occasional Contributor
Posts: 7

Re: Comparing similar names from 2 separate data tables

Posted in reply to PaigeMiller

Thanks for your suggestion. I am comparing a list of approx. 5000 names to another list of over 20,000, so don't think 'sounds like' can be used for my situation.

Super User
Posts: 23,998

Re: Comparing similar names from 2 separate data tables


WM wrote:

Thanks for your suggestion. I am comparing a list of approx. 5000 names to another list of over 20,000, so don't think 'sounds like' can be used for my situation.


Did you try it? Did it not work? Why are you assuming it can't work?

 

5000 * 20000 is 100,000,000 comparisons though so any approach is going to take time. 

Occasional Contributor WM
Occasional Contributor
Posts: 7

Re: Comparing similar names from 2 separate data tables

Looking at code it appears I need to include actual name in sounds like code for each name I'm looking for? Am I missing something that would tell it to repeat comparison for each name on the list?

 

Program to Select Names That Sound like 'Johnson'

libname proclib 'SAS-library';
proc sql;
   title "Employees Whose Last Name Sounds Like 'Johnson'";
   select idnum, upcase(lname), fname
      from proclib.staff
where lname=*"Johnson"
      order by 2;
Super User
Posts: 23,998

Re: Comparing similar names from 2 separate data tables

The other side of the comparison is another variable from your other table.

 

where lname =* other_variable_name
Occasional Contributor WM
Occasional Contributor
Posts: 7

Re: Comparing similar names from 2 separate data tables

Here's code I'm using. I'm just learning so am not following how to identify\select name variables from two different data sets. Does this look correct?

 

proc sql;

Select 'Insured Client Name'n from work.'copy of all agencies 2017 sales'n;

select 'last_name_first_name'n from work.'query_for_dec2017 online illustr'n;

where 'Insured Client Name'n =* 'last_name_first_name'n;

run;

 

The code runs but only returns 1 column for Insured Client Name. Also the log summary shows this;

 

22 GOPTIONS ACCESSIBLE;

23 proc sql;

24 Select 'Insured Client Name'n from work.'copy of all agencies 2017 sales'n;

25 select 'last_name_first_name'n from work.'query_for_dec2017 online illustr'n;

26 where 'Insured Client Name'n =* 'last_name_first_name'n;

 

WARNING: This SAS global statement is not supported in PROC SQL. It has been ignored.

 

Super User
Posts: 23,998

Re: Comparing similar names from 2 separate data tables

No, you're not creating a table for starters and I doubt you want to work with the data from the ODS listing. 

You also need a join of some kind, even if its on itself.

 

%*sounds like comparison;
proc sql; create table want as select t1.name as original_name, t2.name as matched_name from sashelp.class as t1 cross join sashelp.class as t2 where t1.name =* t2.name; quit; %*COMPGED comparison; proc sql; create table want2 as select t1.name as original_name, t2.name as matched_name, compged(t1.name, t2.name) as distance from sashelp.class as t1 cross join sashelp.class as t2 order by t1.name, distance ; quit;
Super User
Posts: 23,998

Re: Comparing similar names from 2 separate data tables

Look up the Link King - it’s a free linkage tool based in SAS. The code is available somewhere but you need to dig a bit.
Occasional Contributor WM
Occasional Contributor
Posts: 7

Re: Comparing similar names from 2 separate data tables

To my knowledge this is not an approved download for company I work for. Will need to stick with using Compged.

Super User
Posts: 23,998

Re: Comparing similar names from 2 separate data tables

The SAS program is a text file available somewhere assuming you're referring to my reference to The Link King. 

 

 

Contributor
Posts: 29

Re: Comparing similar names from 2 separate data tables

Log said variable 'last name first name' does not exist. Check it in dataset.

Occasional Contributor WM
Occasional Contributor
Posts: 7

Re: Comparing similar names from 2 separate data tables

OK thanks. Fixed that reference.

Ask a Question
Discussion stats
  • 12 replies
  • 218 views
  • 0 likes
  • 4 in conversation