There is no commonality between your two files.
How is this supposed to help us help you?
More to the point, have you tried fuzzy matching using the recommended steps above?
Some tips:
1. Upcase everything
2. Change all & or other symbols such as @ from text
3. Ensure Co and/or LTD are standardized, i.e. LTD vs LTD. vs Limited vs Limited => This could be replaced to deal with middle names and/or initials, in the case of names
4. Do an exact match and remove those records
5. Remove all spaces and try a second round of exact matches
6. Do fuzzy matching using COMPGED/COMPLEV/SOUNDS LIKE/SPEDIS and equality of first letter
7. Do fuzzy matching using COMPGED/COMPLEV/SOUNDS LIKE
COMPGED works best in my experience
as well as @cmoore's cleaning steps (extract, upcase, compress, justify, compl) to implement point 2 above?
Cleaned_directorname = scan(directorname, 1,'()');
Cleaned_directorname = upcase(Cleaned_directorname);
Cleaned_directorname = compress(Cleaned_directorname,",,.,/,*,',","");
Cleaned_directorname = compress(Cleaned_directorname,"""");
Cleaned_directorname = strip(Cleaned_directorname);
Cleaned_directorname = compbl(Cleaned_directorname);
[Edited formatting]
Hi,
The kind of coding you could you use to fuzzy match on name can be seen below. You just need to amend your variables to fit your data tables;
/*Non-Exact Matches - Use the Algorithm to match variables - on Company Names*/
proc sql;
create table NonExact_matches_Comp as
select s1.company_name as Original_Company_Name,
s1.cleaned_company_name as Cleaned_Company_name,
s2.cleaned_company_name as connect_company_name,
s2.cleaned_group_name as connect_group_name,
s2.client_group_number,
compged(s1.cleaned_company_name,s2.cleaned_company_name,999,'LN')as compged,
spedis(s1.cleaned_company_name,s2.cleaned_company_name)as spedis
from s1
inner join
s2
on (compged(s1.cleaned_company_name,s2.cleaned_company_name,999,'LN') < 200
and spedis(s1.cleaned_company_name,s2.cleaned_company_name) < 50)
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
having calculated compged = min(compged(s1.cleaned_company_name,s2.cleaned_company_name,999,'LN'))
;
quit;
@cmoore This should be a multi-step process, going from best to worst match quality, as outlined above.
This ensures faster processing as well as better understanding of the quality of the results.
@ChrisNZ yes total agree with your logic and those are the steps that I used to write my matching programme. The code above is just a snippet of the code that I used at the very end of the process where I needed to use fuzzy logic on poor data that was left after data cleansing and removing exact matches.
Hello France,
This the final code what i used to do the matching. Just do the necessary changes.It worked for me. Hope it helps. Regard EJAA.
data s1;
set s1;
format Cleaned_exec_fullname $255.;
/*Name*/
Cleaned_exec_fullname = scan(exec_fullname, 1,'()');
Cleaned_exec_fullname = upcase(Cleaned_exec_fullname);
Cleaned_exec_fullname = compress(Cleaned_exec_fullname,",,.,/,*,',","");
Cleaned_exec_fullname = compress(Cleaned_exec_fullname,"""");
Cleaned_exec_fullname = strip(Cleaned_exec_fullname);
Cleaned_exec_fullname = compbl(Cleaned_exec_fullname);
run;
data s2;
set s2;
format Cleaned_directorname $255.;
/* Name*/
Cleaned_directorname = scan(directorname, 1,'()');
Cleaned_directorname = upcase(Cleaned_directorname);
Cleaned_directorname = compress(Cleaned_directorname,",,.,/,*,',","");
Cleaned_directorname = compress(Cleaned_directorname,"""");
Cleaned_directorname = strip(Cleaned_directorname);
Cleaned_directorname = compbl(Cleaned_directorname);
run;
proc sql noprint;
**Selecting exact matches;
create table Exact_matches_Comp as
select s1.exec_fullname as Original_Exec_Name,
s1.Cleaned_exec_fullname as Cleaned_Exec_Name,
s2.directorname as Original_Director_Name,
s2.Cleaned_directorname as Cleaned_Director_Name,
s2.degree
from s1 inner join s2
on s1.Cleaned_exec_fullname = s2.Cleaned_directorname
where s1.Cleaned_exec_fullname is not null;
quit;
proc sort data=Exact_matches_Comp nodupkey; by Cleaned_Director_Name degree; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.