Hello everyone,
I have an excel that I imported into SAS. The excel file contains information on companies that I need to separate into columns and then I want to match the observation on those company names that I have in another data file.
There are at least 2 company names in each cell and can be up to 19 company names. Ex;
cname
row1 Microsoft Inc
Google Inc
row 2 General electric
General motors
Row 3 General motors
Ford
General electric
What I want is ---->
cname1 cname2 cname3
row1 Microsoft Inc Google Inc
row2 General electric General Motors
row3 General motors Ford General electric
AND THEN what I want to do is match the observation on company names stored in another file. However, the company names in the other file wont always be exact matches per se.....
e.g., Microsoft Inc = Microsoft or Google inc = Google or Ford = Ford motors or Motorola Inc = motorola communications or du pont = ei dupont de nemours
Ive attached some screenshots so that you can get a sense of what Im working with..Thanks for any help in advance!
It depends on the delimiter between these company name. Try take '0D0A'x as delimiter.
There are a couple of functions can do fuzzy matching in SAS.
Some like SPEDIS() , COMPLEV(),COMPGEN() ...........
Fuzzy matching isn't straightforward.
For example will you be matching on the various variables you've now split out or only the first one.
What happens if you have an exact match? How close is good enough in fuzzy matching for you? It's a bit of a trial and error process that gets customized to your particular situation. Do you want it iterative, first loop looks for exact, second uses soundex, third uses compged, etc?
There's a bunch of SAS papers on the topic available here:
http://www.lexjansen.com/search/searchresults.php?q=fuzzy%20matching
Thanks for your reply.
I will be matching on both of the variables eventually, but only one at a time.
I now have two datasets, each have a firm name and a year. And I need to merge them on those two variables because I want to merge in the rest of the information that each observation contains. They are firm names, so the names are similar, but not always exact.
Something like
Data set 1 Data set 2
Firm Firm
Google Inc Google
Ford Motor Ford
Motorola holdings Motorola solutions
Sometimes they will be an exact match, sometimes they will not be. I would like a reasonable error variance, but obviously not too much where it's creating too many unnecessary matches. And yes, ideally it would be iterative. Bear in mind that I am by no means an advanced programmer, but do use sas to get around (probably often using simpler code in more steps than someone else could do in one step). I usually use a command like this to merge
proc sql;
create table adtr.a1 as select a.*, b.address1, b.address2
from adtr.stage as a left join adtr.test as b
on a.code= b.code and a.year=b.year;
quit;
I would like to fuzzy match on firm name and also on year.
Thanks for your help!
@r4321 wrote:
Sometimes they will be an exact match, sometimes they will not be. I would like a reasonable error variance, but obviously not too much where it's creating too many unnecessary matches. And yes, ideally it would be iterative.
Thanks for your help!
Define "reasonable error variance"?
See the solution from Fried Egg here:
https://communities.sas.com/t5/SAS-Procedures/Name-matching/td-p/82780
Another alternative is to clean up your names/standardize them somehow so you can do an exact merge.
Fuzzy matching is very tough task. Here is an example :
data have;
input stores $32.;
cards;
ABC STORE
APPLE COMPUTERS
MICROSOFT CORPORATION
COSTCO STORE
WALMART
SAMS CLUB
;
run;
proc sql;
select a.*,b.stores as matched_stores
from have as a,have as b
group by a.stores
having spedis(a.stores,b.stores)=min(spedis(a.stores,b.stores));
quit;
How big are these two tables ?
Suggestion is spliting the first table adtr.stage into lots of small tables ,and make a macro go through all these sub-table,and remerge them all back together .
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.