BookmarkSubscribeRSS Feed
FA1
Calcite | Level 5 FA1
Calcite | Level 5

Hello,

I want two merge two databases:1) KLD, 2)Compustat

They have two common variables: 1)Year and 2)Company Name. 

Because my data are panel data I have to use both common variables to merge. 

However, company name is not exactly similar in two databases (they may different in terms of  capital letters, abbreviations, spelling errors, etc.)

Could you please let me know the coding of how merging these two databases.

I am not familiar with SAS, but I have found out SAS can solve this problem. 

If somebody give me the whole codes for this purpose, I would be very thankful.

 

Kind regards,

FA1 

 

4 REPLIES 4
andreas_lds
Jade | Level 19

You will have to fix all those defects to merge the datasets. Depending on the modules you have licensed, this is more or less painful.

  1. Change company name to all upcase chars.
  2. Replace abbreviations
  3. Find and fix spelling errors.

Sorry, but data-cleaning is not the most joyful task one can have.

Kurt_Bremser
Super User

Your first step will be to get to know your data and identify the data quality problems in detail. From there you can develop the necessary steps to rectify them.

Run proc contents first to see if you have discrepancies in terms of variable types and variable sizes.

Then run proc sort with nodupkey on your datasets (use the out= option to create new, intermediate datasets) and then compare the results, and look if you have multiple entries for the same business object.

 

Now, if you have SAS Data Quality licensed, this tool can do most of this work for you.

FA1
Calcite | Level 5 FA1
Calcite | Level 5

Dear KurBremser                    

Thank you so Much.

Yes, SAS Data Quality Server is the best option for me to solve this problem.

Actually there are two ways to overcome this problem 1)Scoring method, 2) Fuzzy matching using SAS Data Quality Server. I have attached a pdf that explains these two ways very well. It was very helpful for me. 

 

Kind regards,

FA1

mkeintz
PROC Star

KLD and Compustat also have other common variables:  CIK (Central Index Key, issued by the SEC), and stock ticker.  Now stock ticker for a given company can change over time, so you would have to be aware of synchronization issues.  And the database vendors may not have fully populated CIK. But I'd suggest getting those variables for matching purposes.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1560 views
  • 0 likes
  • 4 in conversation