BookmarkSubscribeRSS Feed
SeanZ
Obsidian | Level 7

Hi,

I have two data sets. I want to merge them with company names. But there are some variations in the company names. For example, mgmt vs. management, company vs. co, limited vs. ltd, and vs. &, etc. Also, there might be extra blanks in the names. Can someone tell me how to deal with this issue? And some reference I can use to do this merge? THanks.

3 REPLIES 3
DR_Majeti
Quartz | Level 8

Hi SeanZ,

there will be a supporting variable for company name in numeric (i think) if it is in this cases.  check for that variable which will have some code for each company;

if there no such thing just make it through hard coding like,  

if companyname='management' then companycode=1

in both datasets then you can merge them by this new variable created.

--

Durga.

Scott_Mitchell
Quartz | Level 8

I would suggest you undertake some analysis to identify and replace strings such as 'co' with 'company' and the like. SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition may help you replace these.

As far as merging by name is concerned you could sing a fuzzy merge using COMPGED http://www.nesug.org/proceedings/nesug07/ap/ap23.pdf.  This will require some analysis and feel for your data, but I have has success with it in the past.

Mit
Calcite | Level 5 Mit
Calcite | Level 5

You need to clean the data. But that is not easy. So there are two ways.

1. sort and summarise the data by the company name. So you should be able to see the differences and then use this result to clean the company names using tranwrd function applying to the main data.

company=tranwrd(compay,'mgmt','Management');

company=tranwrd(compay,'Mgmt','Management');

company=tranwrd(compay,'co','Comapny');

company=tranwrd(compay,'Co','Comapny');

You can use all of the above in the same data step.

after the cleaning again summarise and check and clean. You might have to summarise few times until the entire data is cleaned.

2. Depending on you data, you can also follow the above procedure but before doing that you can split the company name in two or three parts using the scan function and then summarise on those words. This way you should be able to find the different spellings and clean them.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 632 views
  • 8 likes
  • 4 in conversation