Matching Company Names

Accepted Solution Solved
Reply
Contributor
Posts: 48
Accepted Solution

Matching Company Names

Hi,

 

I have two data sources of information that I want to merge together. The common matchkey across both sources is the Client Name. However, there are different variations of spellings for the same Client. Please see below a small example of the two datasets I have and what I would like it to be transformed to;

 

 

Have1  Have2 
     
Client_NameSector Client_NameSub_Sector
3i Group PlcGlobal Financial Investors 3iPrivate Equity
3m CompanyIndustrials 3M Company (NYSE:MMM)Manufacturing industries
Ag Barr PlcConsumer and Healthcare Group A.P. Møller - Maersk A/S (CPSE:MAERSK B)Transport
MaerskEnergy, Resources and Infrastructure AB Volvo (publ) (OM:VOLV B)Automotive
Aa PlcIndustrials AbertisInfrastructure funds
ElectroluxConsumer and Healthcare Group BTTelecoms
VolvoIndustrials 3M Holding CompanyManufacturing industries
Abb LtdIndustrials AB Electrolux (publ) (OM:ELUX B)Consumer
Abbott LaboratoriesConsumer and Healthcare Group RBSBanks
Abbvie IncConsumer and Healthcare Group ABBManufacturing industries
Aberdeen Asset Management PlcFinancial Institutions Group AAM PLC (LSE:ADN)Asset managers
Aberforth Smaller CompaniesFinancial Institutions Group AbbVie IncorpHealth
Abertis Infraestructuras SaGlobal Financial Investors Abbott Laboratories (NYSE:ABT)Health
Abn Amro Group NvFinancial Institutions Group Abertis Infraestructuras, S.A. (BME:ABE)Infrastructure funds
Britsh TelecommunicationsTechnology, Media and Telecoms AG BARConsumer
Royal Bank of Scotland PLCFinancial Institutions Group AABusiness Services
The Coca Cola CompanyConsumer and Healthcare Group Coca colaConsumer

 

And the final output should look like;

 

Want  
   
Client NameSectorSub_Sector
3i Group PlcGlobal Financial InvestorsPrivate Equity
3m CompanyIndustrialsManufacturing industries
Ag Barr PlcConsumer and Healthcare GroupConsumer
MaerskEnergy, Resources and InfrastructureTransport
Aa PlcIndustrialsBusiness Services
ElectroluxConsumer and Healthcare GroupConsumer
VolvoIndustrials"Unknown"
Abb LtdIndustrialsManufacturing industries
Abbott LaboratoriesConsumer and Healthcare GroupHealth
Abbvie IncConsumer and Healthcare GroupHealth
Aberdeen Asset Management PlcFinancial Institutions GroupAsset managers
Aberforth Smaller CompaniesFinancial Institutions GroupInfrastructure funds
Abertis Infraestructuras SaGlobal Financial InvestorsInfrastructure funds
Abn Amro Group NvFinancial Institutions Group"Unknown"
Britsh TelecommunicationsTechnology, Media and TelecomsTelecoms
Royal Bank of Scotland PLCFinancial Institutions GroupBanks
The Coca Cola CompanyConsumer and Healthcare Group

Consumer

 

Is there a reliable way for me to fuzzy match across the different client names? Any help would be greatly appreciated.

 

Many thanks

 

Chris


Accepted Solutions
Solution
‎03-22-2018 10:53 AM
PROC Star
Posts: 2,375

Re: Matching Company Names

@Reeza I would add one step to reduce the size of the cartesian product

 

6. Do fuzzy matching using COMPGED/COMPLEV/SOUNDS LIKE/SPEDIS and equality of first letter

7. Do fuzzy matching using COMPGED/COMPLEV/SOUNDS LIKE

 

COMPGED works best in my experience

 

View solution in original post


All Replies
Super User
Posts: 5,890

Re: Matching Company Names

How reliable do you need to be?

The most managed way to this is to use Data Management Studio.

Data never sleeps
Contributor
Posts: 48

Re: Matching Company Names

It doesn't have to be perfect to be honest. 70% plus maybe. Just need a good way to fuzzy match the example data set then take it from there.

Super User
Posts: 5,890

Re: Matching Company Names

70% what kind of analysis can you do with that little accuracy?
Given your examples (and my limited knowledge in fuzzy logic) I think together can apply some basic rules like matching on uppercase first two words (normalized), or create an extensive mapping table (or a combination of both).
Or don't reinvent the well a pay for some capable SW.
Data never sleeps
Super User
Posts: 23,791

Re: Matching Company Names

This is a non-trivial problem because you end up doing N*M calculations which can be quite big. 

Some tips:

1. Upcase everything

2. Change all & or other symbols such as @ from text

3. Ensure Co and/or LTD are standardized, i.e. LTD vs LTD. vs Limited vs Limited

4. Do an exact match and remove those records

5. Remove all spaces and try a second round of exact matches

6. Do fuzzy matching using COMPGED/COMPLEV/SOUNDS LIKE

 

 

Solution
‎03-22-2018 10:53 AM
PROC Star
Posts: 2,375

Re: Matching Company Names

@Reeza I would add one step to reduce the size of the cartesian product

 

6. Do fuzzy matching using COMPGED/COMPLEV/SOUNDS LIKE/SPEDIS and equality of first letter

7. Do fuzzy matching using COMPGED/COMPLEV/SOUNDS LIKE

 

COMPGED works best in my experience

 

Contributor
Posts: 48

Re: Matching Company Names

Thanks for the information. I have a list of headline titles containing company names in and company lookup table. Please see below;

 

 

Have 1Have 2
Headline_TitleClient_NameFF_Sector
3I in takeover talks3i Group PlcGlobal Financial Investors
AP Moller Maersk to invest in AI technologyMaerskEnergy, Resources and Infrastructure
to form partnership with EE BT GroupBT Group PLCPower, Utilities & Renewables
in fines due to miss selling EDF EDFPower, Utilities & Renewables
Abbott labs to sell off Africa armAbbot LaboratoriesConsumer and Healthcare Group
Bank RBS post profitRBSBanks

 

I want to extract out the list of company names from the text and match them to a company name lookup table that I have with cleansed company names. I want the table to look like this;

 

Want
Headline_TitleClient_NameFF_Sector
3I in takeover talks3i Group PlcGlobal Financial Investors
AP Moller Maersk to invest in AI technologyMaerskEnergy, Resources and Infrastructure
to form partnership with EE BT GroupBT Group PLCPower, Utilities & Renewables
in fines due to miss selling EDF EDFPower, Utilities & Renewables
Abbott labs to sell off Africa armAbbot LaboratoriesConsumer and Healthcare Group
Bank RBS post profitRBSBanks

 

I have thousands of records to match to and this is just an example.

 

Would you happen to know what the best way is to do this?

 

Thanks

Chris

PROC Star
Posts: 2,375

Re: Matching Company Names

Answers were already given. What is unclear in all the explanations above? 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 413 views
  • 2 likes
  • 4 in conversation