SAS Enterprise Guide

Desktop productivity for business analysts and programmers
BookmarkSubscribeRSS Feed
WM
Calcite | Level 5 WM
Calcite | Level 5

I have 2 excel worksheets that I've used excel add-in for fuzzy lookup. I have imported both into SAS Enterprise Guide hoping I could figure out a similar approach. I've read about CompGed but not sure it it's what I want to use and if so then getting it coded correctly for what I want it to do.

 

The information I want to compare is named slightly different between the worksheets(Insured Client Name vs  Last Name First Name). I want to  identify which names are closely matched. One thing to note is that the Last Name First Name is a concatenated value performed within the excel worksheet before I imported.

 

I'm looking for suggestions on best way to approach identifying possible matches and then idea of sample code to use. I copied some code I found within this community and inserted what I thought was correct but can't get it to work. It runs but doesn't return any output data.  The code I'm using is;

 

data Insured_Name_Match;

set work.'copy of all agencies 2017 sales'n ;

 

tmp1=soundex('Insured Client Name'n);

do i=1 to nobs;

set WORK.'query_for_dec2017 online illustr'n point=i nobs=nobs;

 

tmp2=soundex('last name first name'n);

dif=compged(tmp1,tmp2);

if dif<=100 then do;

 

possible_match='Yes';

drop i tmp1 tmp2;

output;

end;

end;

run;

 

 

I don't get any error code. here's what the project log says. Here's the error code I get;

 

22 GOPTIONS ACCESSIBLE;

23 data Insured_Name_Match;

24 set work.'copy of all agencies 2017 sales'n ;

25 tmp1=soundex('Insured Client Name'n);

26 do i=1 to nobs;

27 set WORK.'query_for_dec2017 online illustr'n point=i nobs=nobs;

28 tmp2=soundex('last name first name'n);

29 dif=compged(tmp1,tmp2);

30 if dif<=100 then do;

31 possible_match='Yes';

32 drop i tmp1 tmp2;

33 output;

34 end;

35 end;

36 run;

 

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).

28:14

NOTE: Variable 'last name first name'n is uninitialized.

NOTE: There were 56521 observations read from the data set WORK.'COPY OF ALL AGENCIES 2017 SALES'n.

NOTE: The data set WORK.INSURED_NAME_MATCH has 0 observations and 28 variables.

NOTE: DATA statement used (Total process time):

real time 43.05 seconds

cpu time 41.82 seconds

 

 

12 REPLIES 12
WM
Calcite | Level 5 WM
Calcite | Level 5

Thanks for your suggestion. I am comparing a list of approx. 5000 names to another list of over 20,000, so don't think 'sounds like' can be used for my situation.

Reeza
Super User

@WM wrote:

Thanks for your suggestion. I am comparing a list of approx. 5000 names to another list of over 20,000, so don't think 'sounds like' can be used for my situation.


Did you try it? Did it not work? Why are you assuming it can't work?

 

5000 * 20000 is 100,000,000 comparisons though so any approach is going to take time. 

WM
Calcite | Level 5 WM
Calcite | Level 5

Looking at code it appears I need to include actual name in sounds like code for each name I'm looking for? Am I missing something that would tell it to repeat comparison for each name on the list?

 

Program to Select Names That Sound like 'Johnson'

libname proclib 'SAS-library';
proc sql;
   title "Employees Whose Last Name Sounds Like 'Johnson'";
   select idnum, upcase(lname), fname
      from proclib.staff
where lname=*"Johnson"
      order by 2;
Reeza
Super User

The other side of the comparison is another variable from your other table.

 

where lname =* other_variable_name
WM
Calcite | Level 5 WM
Calcite | Level 5

Here's code I'm using. I'm just learning so am not following how to identify\select name variables from two different data sets. Does this look correct?

 

proc sql;

Select 'Insured Client Name'n from work.'copy of all agencies 2017 sales'n;

select 'last_name_first_name'n from work.'query_for_dec2017 online illustr'n;

where 'Insured Client Name'n =* 'last_name_first_name'n;

run;

 

The code runs but only returns 1 column for Insured Client Name. Also the log summary shows this;

 

22 GOPTIONS ACCESSIBLE;

23 proc sql;

24 Select 'Insured Client Name'n from work.'copy of all agencies 2017 sales'n;

25 select 'last_name_first_name'n from work.'query_for_dec2017 online illustr'n;

26 where 'Insured Client Name'n =* 'last_name_first_name'n;

 

WARNING: This SAS global statement is not supported in PROC SQL. It has been ignored.

 

Reeza
Super User

No, you're not creating a table for starters and I doubt you want to work with the data from the ODS listing. 

You also need a join of some kind, even if its on itself.

 

%*sounds like comparison;proc sql;
create table want as
select t1.name as original_name, t2.name as matched_name
from sashelp.class as t1
cross join sashelp.class as t2
where t1.name =* t2.name;
quit;

%*COMPGED comparison;
proc sql;
create table want2 as
select t1.name as original_name, t2.name as matched_name, compged(t1.name, t2.name) as distance
from sashelp.class as t1
cross join sashelp.class as t2
order by t1.name, distance ;
quit;
Reeza
Super User
Look up the Link King - it’s a free linkage tool based in SAS. The code is available somewhere but you need to dig a bit.
WM
Calcite | Level 5 WM
Calcite | Level 5

To my knowledge this is not an approved download for company I work for. Will need to stick with using Compged.

Reeza
Super User

The SAS program is a text file available somewhere assuming you're referring to my reference to The Link King. 

 

 

MINX
Obsidian | Level 7

Log said variable 'last name first name' does not exist. Check it in dataset.

WM
Calcite | Level 5 WM
Calcite | Level 5

OK thanks. Fixed that reference.

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 3795 views
  • 0 likes
  • 4 in conversation