BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mig7126
Obsidian | Level 7

Hello Everyone,

 

I am writing a SAS program where I get data from Compustat and CRSP in order to do an event study.  I merged my CRSP stock return data with my CRSP market index return data with no issues.  However, I am having issues merging my Compustat data with my CRSP data - for some reason, the match merge is not working, and it does not give me any error messages.  Can someone please help me?

 

Here is my SAS program - it uses remote connection to WRDS.  The issue is with the match merge at the end - in red font color.

 

/*Remote Sign-on to WRDS Server*/
%let wrds = wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=WRDS;
signon username=_prompt_;
 
rsubmit;
*Tell SAS where to look for data on WRDS;
*Define directory for Compustat North America - Monthly Update;
 
libname compm '/wrds/comp/sasdata/nam';
 
data tem;
set compm.fundq;
 if indfmt="INDL";
 if consol="C";
 if popsrc="D";
 if datafmt="STD";
 * Get Compustat data for the Dow Jones Industrial Avg. Components for 2000 to 2015;
 where 2000<=year(datadate)<=2015 and tic in ('AXP', 'AAPL', 'BA', 'CAT', 'CVX', 'CSCO',
 'KO', 'DIS', 'DD', 'XOM', 'GE', 'GS', 'HD', 'IBM', 'INTC', 'JNJ', 'JPM', 'MCD', 'MMM',
 'MRK', 'MSFT', 'NKE', 'PFE', 'PG', 'TRV', 'UNH', 'UTX', 'VZ', 'V', 'WMT');
 Format cusip $8.;
 Keep cusip tic conm datadate datafqtr rdq epsfxq;
Run;
* Download the dataset from WRDS to local library;
Proc sort data=tem;
 By cusip;
Proc download data=tem out=cmpst_raw;
Run;
endrsubmit;
 
rsubmit;
*Define directory for CRSP - Monthly Update;
libname daily '/wrds/crsp/sasdata/a_stock';
Data tem2;
 Set daily.dsf (keep = cusip date ret);
 * Get CRSP Daily stock return data for the Dow Jones Industrial Avg. components from 2000 to 2015. CUSIP
 is used since CRSP does not have ticker; 
 where 2000<=year(date)<=2015 and cusip in ('02581610','88579Y10','03783310','09702310','14912310',
 '16676410','17275R10','89417000','38141G10','92826C83','59491810','93114210','91301710','91324P10',
 '74271810','71708110','65410610','58933Y10','58013510','47816010','45920010','45814010','43707610');
 format date yymmddn8.;
Run;
Proc sort data = tem2;
 By date cusip;
 
* Download the dataset from WRDS to local library;
Proc download data=tem2 out=crsp_stk;
Run;
endrsubmit;
 
rsubmit;
libname ind '/wrds/crsp/sasdata/a_indexes';
* Get daily market index data for January 2000 to December 31, 2015;
Data tem3;
 set ind.dsic (keep=caldt vwretd ewretd);
 Where '03Jan2000'd<=caldt<='31Dec2015'd;
 Format date yymmddn8.;
 Rename caldt=date;
Run;
 
Proc sort data = tem3;
 By date;
Proc download data=tem3 out=crsp_indx;
Run;
endrsubmit;
 
Data crsp_stk_indx;
 merge crsp_stk crsp_indx;
 By date;
Run;
Data crsp_stk_indx;
 set crsp_stk_indx;
 where cusip ne " " and vwretd ne .; * Remove records that have no CUSIP or no Mkt return data;
Run;
 
* Modify Compustat quarterly earnings file to calculate unexpected earnings based on previous earnings;
Data cmpst_raw2;
 Set cmpst_raw;
 By cusip;
 If epsfxq and rdq; * Keep records where earnings and report date for quarterly earnings are not missing;
 ueps = epsfxq - lag(epsfxq); * UEPS is defined as the current earnings less prior qtr earnings;
 If ueps < 0 then evntdum=1;
 Else if ueps > 0 then evntdum=2;
 Else evntdum = 0;
 if first.cusip then ueps = 0;
 If ueps; * Keep records where UEPS does not equal to zero;
 keep cusip rdq epsfxq ueps evntdum; * Keep only CUSIP, Report date for quarterly earnings, earnings and UEPS;
Run;
* Create dates 90 days prior to event date (rdq) and 10 days after event date (rdq);
data cmpst_int;
 Format date yymmddn8.;
 set cmpst_raw2;
 do date = rdq - 90 to rdq + 10;
    output;
 end;
keep cusip date rdq evntdum;
run;
 
proc sort data=cmpst_int;
 by cusip date;
run;
 
Data crsp_stk_indx;
 Set crsp_stk_indx (keep = cusip date ret vwretd evntdum);
Proc sort data = crsp_stk_indx;
 By cusip date;
Run;
Data crsp_stk_indx;
 Retain date cusip ret vwretd;
 set crsp_stk_indx;
Run;
Data crsp_cmpst;
 Merge cmpst_int (in=a)
 crsp_stk_indx (in=b);
 By cusip date;
 If a and b;
Run;
 
Thanks!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
1. Run a proc contents on the two data sets. Look for differences in the two variables
2. Strip formats and run a proc freq on the two variables in each dataset.

If you can, post results here for help diagnosing issue.

View solution in original post

10 REPLIES 10
LinusH
Tourmaline | Level 20
"Not working" meaning what? No matches?
As with all this type of trouble shooting you need to carefully check your input data to the specific step.

What do you suspect to happen with the RETAIN in last crsp_stk_indx step?
Data never sleeps
mig7126
Obsidian | Level 7

Thanks for your quick reply.  When I mean not working is that I get no matches.  The reason that I added the RETAIN in the last crsp_stk_indx step was to re-order the columns in the data table so that Date column would appear before the CUSIP column, which is the same way as the cmpst_int data table.  However, if I remove these re-ordering steps, I still do not get any matches.  I troubleshooted the program last night for quite a few hours and cannot figure out why there are no matches. 

 

I appreciate your help to see if you have any other thoughts on why I don't have any matches.

 

Thanks!

mig7126
Obsidian | Level 7

Yes, no matches.  I took out the RETAIN steps, and I still do not get any matches.  I troubleshooted the program extensively and I cannot figure it out.  Can you please let me know your thoughts?

data_null__
Jade | Level 19

Reordering should not cause the problem at least not the code you used.

 

Clearly there are no combinations of CUSPID DATE values that match (call me Mr. Obvious) 

  • Do you have messages about multiple lengths specified for BY variables?
  • Are the dates actually the same?  If you imported the data from XLS it is possible that a datetime field in EXCEL was imported as a SAS DATE with a decimal part (representing time).  Take INT(DATE) of both and see if that fixes it.
  • Use $HEX. format to look at the values of CUSID from each data set.  This could reveal leading spaces that you do not notice or other characters that appear as spaces.
  • In data TEM you have  Format cusip $8.; which does not change the value but does change how it is shown to you. Or, what you see might not be what you got.

If it ain't the program it must be the data.

PGStats
Opal | Level 21

Excellent suggestions @data_null__. I would suggest using ROUND(DATE) instead of INT(DATE) to fix the Excel date problem as I have seen Excel (or MS-Access) dates being imported with values such as 12345.999998 

PG
Reeza
Super User
1. Run a proc contents on the two data sets. Look for differences in the two variables
2. Strip formats and run a proc freq on the two variables in each dataset.

If you can, post results here for help diagnosing issue.
mig7126
Obsidian | Level 7

Thanks everyone for your suggestions.  One key point:  The data is not coming from Excel; if you look at the SAS code, I am connecting remotely to WRDS to get Compustat and CRSP data directly, and then bringing it into SAS.  Therefore, this is not the issue.

 

I will try the other suggestions, specifically the PROC CONTENTS and PROC FREQ to see if there are any issues with the data formats, and wil let you know.

 

Thanks!

 

mig7126
Obsidian | Level 7

Hi Everyone,

 

Thanks again for your great suggestions. I ran PROC CONTENTS and uncovered that CUSIP in CRSP_STK_INDX data table was not the same format as CUSIP in CMPST_INT data table.  Once I made the format change, the match merge now works!

 

Thank you again for your great help!

 

 

 

data_null__
Jade | Level 19

MERGE does not use formatted value of a BY variable to do matching (unless the GROUPFORMAT option is used on the BY statement).  What did you actually change?

 

 

mig7126
Obsidian | Level 7

Hi data_null_,

 

I formatted the variable CUSIP in the CMPST_INT data table in a Data step, not in the MERGE statement:

 

Data CMPST_INT;

   Format CUSIP $8. Date yymmddn8.;

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 4506 views
  • 1 like
  • 5 in conversation