01-09-2018 01:51 AM
I want two merge two databases:1) KLD, 2)Compustat
They have two common variables: 1)Year and 2)Company Name.
Because my data are panel data I have to use both common variables to merge.
However, company name is not exactly similar in two databases (they may different in terms of capital letters, abbreviations, spelling errors, etc.)
Could you please let me know the coding of how merging these two databases.
I am not familiar with SAS, but I have found out SAS can solve this problem.
If somebody give me the whole codes for this purpose, I would be very thankful.
01-09-2018 04:16 AM
You will have to fix all those defects to merge the datasets. Depending on the modules you have licensed, this is more or less painful.
Sorry, but data-cleaning is not the most joyful task one can have.
01-09-2018 04:35 AM
Your first step will be to get to know your data and identify the data quality problems in detail. From there you can develop the necessary steps to rectify them.
Run proc contents first to see if you have discrepancies in terms of variable types and variable sizes.
Then run proc sort with nodupkey on your datasets (use the out= option to create new, intermediate datasets) and then compare the results, and look if you have multiple entries for the same business object.
Now, if you have SAS Data Quality licensed, this tool can do most of this work for you.
01-17-2018 11:27 PM
Thank you so Much.
Yes, SAS Data Quality Server is the best option for me to solve this problem.
Actually there are two ways to overcome this problem 1)Scoring method, 2) Fuzzy matching using SAS Data Quality Server. I have attached a pdf that explains these two ways very well. It was very helpful for me.
01-09-2018 10:12 AM
KLD and Compustat also have other common variables: CIK (Central Index Key, issued by the SEC), and stock ticker. Now stock ticker for a given company can change over time, so you would have to be aware of synchronization issues. And the database vendors may not have fully populated CIK. But I'd suggest getting those variables for matching purposes.