BookmarkSubscribeRSS Feed
WM
Calcite | Level 5 WM
Calcite | Level 5

I have 2 excel worksheets that I've used excel add-in for fuzzy lookup. I have imported both into SAS Enterprise Guide hoping I could figure out a similar approach. I've read about CompGed but not sure it it's what I want to use and if so then getting it coded correctly for what I want it to do.

 

The information I want to compare is named slightly different between the worksheets(Insured Client Name vs  Last Name First Name). I want to  identify which names are closely matched. One thing to note is that the Last Name First Name is a concatenated value performed within the excel worksheet before I imported.

 

I'm looking for suggestions on best way to approach identifying possible matches and then idea of sample code to use. I copied some code I found within this community and inserted what I thought was correct but can't get it to work. It runs but doesn't return any output data.  The code I'm using is;

 

data Insured_Name_Match;

set work.'copy of all agencies 2017 sales'n ;

 

tmp1=soundex('Insured Client Name'n);

do i=1 to nobs;

set WORK.'query_for_dec2017 online illustr'n point=i nobs=nobs;

 

tmp2=soundex('last name first name'n);

dif=compged(tmp1,tmp2);

if dif<=100 then do;

 

possible_match='Yes';

drop i tmp1 tmp2;

output;

end;

end;

run;

 

 

I don't get any error code. here's what the project log says. Here's the error code I get;

 

22 GOPTIONS ACCESSIBLE;

23 data Insured_Name_Match;

24 set work.'copy of all agencies 2017 sales'n ;

25 tmp1=soundex('Insured Client Name'n);

26 do i=1 to nobs;

27 set WORK.'query_for_dec2017 online illustr'n point=i nobs=nobs;

28 tmp2=soundex('last name first name'n);

29 dif=compged(tmp1,tmp2);

30 if dif<=100 then do;

31 possible_match='Yes';

32 drop i tmp1 tmp2;

33 output;

34 end;

35 end;

36 run;

 

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).

28:14

NOTE: Variable 'last name first name'n is uninitialized.

NOTE: There were 56521 observations read from the data set WORK.'COPY OF ALL AGENCIES 2017 SALES'n.

NOTE: The data set WORK.INSURED_NAME_MATCH has 0 observations and 28 variables.

NOTE: DATA statement used (Total process time):

real time 43.05 seconds

cpu time 41.82 seconds

 

 

12 REPLIES 12
WM
Calcite | Level 5 WM
Calcite | Level 5

Thanks for your suggestion. I am comparing a list of approx. 5000 names to another list of over 20,000, so don't think 'sounds like' can be used for my situation.

Reeza
Super User

@WM wrote:

Thanks for your suggestion. I am comparing a list of approx. 5000 names to another list of over 20,000, so don't think 'sounds like' can be used for my situation.


Did you try it? Did it not work? Why are you assuming it can't work?

 

5000 * 20000 is 100,000,000 comparisons though so any approach is going to take time. 

WM
Calcite | Level 5 WM
Calcite | Level 5

Looking at code it appears I need to include actual name in sounds like code for each name I'm looking for? Am I missing something that would tell it to repeat comparison for each name on the list?

 

Program to Select Names That Sound like 'Johnson'

libname proclib 'SAS-library';
proc sql;
   title "Employees Whose Last Name Sounds Like 'Johnson'";
   select idnum, upcase(lname), fname
      from proclib.staff
where lname=*"Johnson"
      order by 2;
Reeza
Super User

The other side of the comparison is another variable from your other table.

 

where lname =* other_variable_name
WM
Calcite | Level 5 WM
Calcite | Level 5

Here's code I'm using. I'm just learning so am not following how to identify\select name variables from two different data sets. Does this look correct?

 

proc sql;

Select 'Insured Client Name'n from work.'copy of all agencies 2017 sales'n;

select 'last_name_first_name'n from work.'query_for_dec2017 online illustr'n;

where 'Insured Client Name'n =* 'last_name_first_name'n;

run;

 

The code runs but only returns 1 column for Insured Client Name. Also the log summary shows this;

 

22 GOPTIONS ACCESSIBLE;

23 proc sql;

24 Select 'Insured Client Name'n from work.'copy of all agencies 2017 sales'n;

25 select 'last_name_first_name'n from work.'query_for_dec2017 online illustr'n;

26 where 'Insured Client Name'n =* 'last_name_first_name'n;

 

WARNING: This SAS global statement is not supported in PROC SQL. It has been ignored.

 

Reeza
Super User

No, you're not creating a table for starters and I doubt you want to work with the data from the ODS listing. 

You also need a join of some kind, even if its on itself.

 

%*sounds like comparison;
proc sql; create table want as select t1.name as original_name, t2.name as matched_name from sashelp.class as t1 cross join sashelp.class as t2 where t1.name =* t2.name; quit; %*COMPGED comparison; proc sql; create table want2 as select t1.name as original_name, t2.name as matched_name, compged(t1.name, t2.name) as distance from sashelp.class as t1 cross join sashelp.class as t2 order by t1.name, distance ; quit;
Reeza
Super User
Look up the Link King - it’s a free linkage tool based in SAS. The code is available somewhere but you need to dig a bit.
WM
Calcite | Level 5 WM
Calcite | Level 5

To my knowledge this is not an approved download for company I work for. Will need to stick with using Compged.

Reeza
Super User

The SAS program is a text file available somewhere assuming you're referring to my reference to The Link King. 

 

 

MINX
Obsidian | Level 7

Log said variable 'last name first name' does not exist. Check it in dataset.

WM
Calcite | Level 5 WM
Calcite | Level 5

OK thanks. Fixed that reference.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 2112 views
  • 0 likes
  • 4 in conversation