BookmarkSubscribeRSS Feed
SeanZ
Obsidian | Level 7

Hi,

I have two data sets. I want to merge them with company names. But there are some variations in the company names. For example, mgmt vs. management, company vs. co, limited vs. ltd, and vs. &, etc. Also, there might be extra blanks in the names. Can someone tell me how to deal with this issue? And some reference I can use to do this merge? THanks.

3 REPLIES 3
DR_Majeti
Quartz | Level 8

Hi SeanZ,

there will be a supporting variable for company name in numeric (i think) if it is in this cases.  check for that variable which will have some code for each company;

if there no such thing just make it through hard coding like,  

if companyname='management' then companycode=1

in both datasets then you can merge them by this new variable created.

--

Durga.

Scott_Mitchell
Quartz | Level 8

I would suggest you undertake some analysis to identify and replace strings such as 'co' with 'company' and the like. SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition may help you replace these.

As far as merging by name is concerned you could sing a fuzzy merge using COMPGED http://www.nesug.org/proceedings/nesug07/ap/ap23.pdf.  This will require some analysis and feel for your data, but I have has success with it in the past.

Mit
Calcite | Level 5 Mit
Calcite | Level 5

You need to clean the data. But that is not easy. So there are two ways.

1. sort and summarise the data by the company name. So you should be able to see the differences and then use this result to clean the company names using tranwrd function applying to the main data.

company=tranwrd(compay,'mgmt','Management');

company=tranwrd(compay,'Mgmt','Management');

company=tranwrd(compay,'co','Comapny');

company=tranwrd(compay,'Co','Comapny');

You can use all of the above in the same data step.

after the cleaning again summarise and check and clean. You might have to summarise few times until the entire data is cleaned.

2. Depending on you data, you can also follow the above procedure but before doing that you can split the company name in two or three parts using the scan function and then summarise on those words. This way you should be able to find the different spellings and clean them.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 588 views
  • 8 likes
  • 4 in conversation