SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Merging two databases with two common variables, one of common variables have inexact characters

Reply
Learner FA1
Learner
Posts: 1

Merging two databases with two common variables, one of common variables have inexact characters

Hello,

I want two merge two databases:1) KLD, 2)Compustat

They have two common variables: 1)Year and 2)Company Name. 

Because my data are panel data I have to use both common variables to merge. 

However, company name is not exactly similar in two databases (they may different in terms of  capital letters, abbreviations, spelling errors, etc.)

Could you please let me know the coding of how merging these two databases.

I am not familiar with SAS, but I have found out SAS can solve this problem. 

If somebody give me the whole codes for this purpose, I would be very thankful.

 

Kind regards,

FA1 

 

Super Contributor
Posts: 367

Re: Merging two databases with two common variables, one of common variables have inexact characters

You will have to fix all those defects to merge the datasets. Depending on the modules you have licensed, this is more or less painful.

  1. Change company name to all upcase chars.
  2. Replace abbreviations
  3. Find and fix spelling errors.

Sorry, but data-cleaning is not the most joyful task one can have.

Super User
Posts: 8,302

Re: Merging two databases with two common variables, one of common variables have inexact characters

Your first step will be to get to know your data and identify the data quality problems in detail. From there you can develop the necessary steps to rectify them.

Run proc contents first to see if you have discrepancies in terms of variable types and variable sizes.

Then run proc sort with nodupkey on your datasets (use the out= option to create new, intermediate datasets) and then compare the results, and look if you have multiple entries for the same business object.

 

Now, if you have SAS Data Quality licensed, this tool can do most of this work for you.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Trusted Advisor
Posts: 1,117

Re: Merging two databases with two common variables, one of common variables have inexact characters

KLD and Compustat also have other common variables:  CIK (Central Index Key, issued by the SEC), and stock ticker.  Now stock ticker for a given company can change over time, so you would have to be aware of synchronization issues.  And the database vendors may not have fully populated CIK. But I'd suggest getting those variables for matching purposes.

Ask a Question
Discussion stats
  • 3 replies
  • 89 views
  • 0 likes
  • 4 in conversation