- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have 2 excel worksheets that I've used excel add-in for fuzzy lookup. I have imported both into SAS Enterprise Guide hoping I could figure out a similar approach. I've read about CompGed but not sure it it's what I want to use and if so then getting it coded correctly for what I want it to do.
The information I want to compare is named slightly different between the worksheets(Insured Client Name vs Last Name First Name). I want to identify which names are closely matched. One thing to note is that the Last Name First Name is a concatenated value performed within the excel worksheet before I imported.
I'm looking for suggestions on best way to approach identifying possible matches and then idea of sample code to use. I copied some code I found within this community and inserted what I thought was correct but can't get it to work. It runs but doesn't return any output data. The code I'm using is;
data Insured_Name_Match;
set work.'copy of all agencies 2017 sales'n ;
tmp1=soundex('Insured Client Name'n);
do i=1 to nobs;
set WORK.'query_for_dec2017 online illustr'n point=i nobs=nobs;
tmp2=soundex('last name first name'n);
dif=compged(tmp1,tmp2);
if dif<=100 then do;
possible_match='Yes';
drop i tmp1 tmp2;
output;
end;
end;
run;
I don't get any error code. here's what the project log says. Here's the error code I get;
22 GOPTIONS ACCESSIBLE;
23 data Insured_Name_Match;
24 set work.'copy of all agencies 2017 sales'n ;
25 tmp1=soundex('Insured Client Name'n);
26 do i=1 to nobs;
27 set WORK.'query_for_dec2017 online illustr'n point=i nobs=nobs;
28 tmp2=soundex('last name first name'n);
29 dif=compged(tmp1,tmp2);
30 if dif<=100 then do;
31 possible_match='Yes';
32 drop i tmp1 tmp2;
33 output;
34 end;
35 end;
36 run;
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
28:14
NOTE: Variable 'last name first name'n is uninitialized.
NOTE: There were 56521 observations read from the data set WORK.'COPY OF ALL AGENCIES 2017 SALES'n.
NOTE: The data set WORK.INSURED_NAME_MATCH has 0 observations and 28 variables.
NOTE: DATA statement used (Total process time):
real time 43.05 seconds
cpu time 41.82 seconds
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In SAS, you could use the "sounds like" operator
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your suggestion. I am comparing a list of approx. 5000 names to another list of over 20,000, so don't think 'sounds like' can be used for my situation.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@WM wrote:
Thanks for your suggestion. I am comparing a list of approx. 5000 names to another list of over 20,000, so don't think 'sounds like' can be used for my situation.
Did you try it? Did it not work? Why are you assuming it can't work?
5000 * 20000 is 100,000,000 comparisons though so any approach is going to take time.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Looking at code it appears I need to include actual name in sounds like code for each name I'm looking for? Am I missing something that would tell it to repeat comparison for each name on the list?
Program to Select Names That Sound like 'Johnson'
libname proclib 'SAS-library';
proc sql; title "Employees Whose Last Name Sounds Like 'Johnson'"; select idnum, upcase(lname), fname from proclib.staff
where lname=*"Johnson" order by 2;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The other side of the comparison is another variable from your other table.
where lname =* other_variable_name
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here's code I'm using. I'm just learning so am not following how to identify\select name variables from two different data sets. Does this look correct?
proc sql;
Select 'Insured Client Name'n from work.'copy of all agencies 2017 sales'n;
select 'last_name_first_name'n from work.'query_for_dec2017 online illustr'n;
where 'Insured Client Name'n =* 'last_name_first_name'n;
run;
The code runs but only returns 1 column for Insured Client Name. Also the log summary shows this;
22 GOPTIONS ACCESSIBLE;
23 proc sql;
24 Select 'Insured Client Name'n from work.'copy of all agencies 2017 sales'n;
25 select 'last_name_first_name'n from work.'query_for_dec2017 online illustr'n;
26 where 'Insured Client Name'n =* 'last_name_first_name'n;
WARNING: This SAS global statement is not supported in PROC SQL. It has been ignored.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
No, you're not creating a table for starters and I doubt you want to work with the data from the ODS listing.
You also need a join of some kind, even if its on itself.
%*sounds like comparison;proc sql;
create table want as
select t1.name as original_name, t2.name as matched_name
from sashelp.class as t1
cross join sashelp.class as t2
where t1.name =* t2.name;
quit;
%*COMPGED comparison;
proc sql;
create table want2 as
select t1.name as original_name, t2.name as matched_name, compged(t1.name, t2.name) as distance
from sashelp.class as t1
cross join sashelp.class as t2
order by t1.name, distance ;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To my knowledge this is not an approved download for company I work for. Will need to stick with using Compged.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The SAS program is a text file available somewhere assuming you're referring to my reference to The Link King.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Log said variable 'last name first name' does not exist. Check it in dataset.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
OK thanks. Fixed that reference.