- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello Everyone,
I am writing a SAS program where I get data from Compustat and CRSP in order to do an event study. I merged my CRSP stock return data with my CRSP market index return data with no issues. However, I am having issues merging my Compustat data with my CRSP data - for some reason, the match merge is not working, and it does not give me any error messages. Can someone please help me?
Here is my SAS program - it uses remote connection to WRDS. The issue is with the match merge at the end - in red font color.
%let wrds = wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=WRDS;
signon username=_prompt_;
*Tell SAS where to look for data on WRDS;
*Define directory for Compustat North America - Monthly Update;
set compm.fundq;
if indfmt="INDL";
if consol="C";
if popsrc="D";
if datafmt="STD";
* Get Compustat data for the Dow Jones Industrial Avg. Components for 2000 to 2015;
where 2000<=year(datadate)<=2015 and tic in ('AXP', 'AAPL', 'BA', 'CAT', 'CVX', 'CSCO',
'KO', 'DIS', 'DD', 'XOM', 'GE', 'GS', 'HD', 'IBM', 'INTC', 'JNJ', 'JPM', 'MCD', 'MMM',
'MRK', 'MSFT', 'NKE', 'PFE', 'PG', 'TRV', 'UNH', 'UTX', 'VZ', 'V', 'WMT');
Format cusip $8.;
Keep cusip tic conm datadate datafqtr rdq epsfxq;
Run;
* Download the dataset from WRDS to local library;
Proc sort data=tem;
By cusip;
Proc download data=tem out=cmpst_raw;
Run;
endrsubmit;
*Define directory for CRSP - Monthly Update;
libname daily '/wrds/crsp/sasdata/a_stock';
Data tem2;
Set daily.dsf (keep = cusip date ret);
* Get CRSP Daily stock return data for the Dow Jones Industrial Avg. components from 2000 to 2015. CUSIP
is used since CRSP does not have ticker;
where 2000<=year(date)<=2015 and cusip in ('02581610','88579Y10','03783310','09702310','14912310',
'16676410','17275R10','89417000','38141G10','92826C83','59491810','93114210','91301710','91324P10',
'74271810','71708110','65410610','58933Y10','58013510','47816010','45920010','45814010','43707610');
format date yymmddn8.;
Run;
Proc sort data = tem2;
By date cusip;
Run;
endrsubmit;
libname ind '/wrds/crsp/sasdata/a_indexes';
* Get daily market index data for January 2000 to December 31, 2015;
Data tem3;
set ind.dsic (keep=caldt vwretd ewretd);
Where '03Jan2000'd<=caldt<='31Dec2015'd;
Format date yymmddn8.;
Rename caldt=date;
Run;
By date;
Proc download data=tem3 out=crsp_indx;
Run;
endrsubmit;
merge crsp_stk crsp_indx;
By date;
Run;
Data crsp_stk_indx;
set crsp_stk_indx;
where cusip ne " " and vwretd ne .; * Remove records that have no CUSIP or no Mkt return data;
Run;
Data cmpst_raw2;
Set cmpst_raw;
By cusip;
If epsfxq and rdq; * Keep records where earnings and report date for quarterly earnings are not missing;
ueps = epsfxq - lag(epsfxq); * UEPS is defined as the current earnings less prior qtr earnings;
If ueps < 0 then evntdum=1;
Else if ueps > 0 then evntdum=2;
Else evntdum = 0;
if first.cusip then ueps = 0;
If ueps; * Keep records where UEPS does not equal to zero;
keep cusip rdq epsfxq ueps evntdum; * Keep only CUSIP, Report date for quarterly earnings, earnings and UEPS;
Run;
Format date yymmddn8.;
set cmpst_raw2;
do date = rdq - 90 to rdq + 10;
output;
end;
keep cusip date rdq evntdum;
run;
by cusip date;
run;
Set crsp_stk_indx (keep = cusip date ret vwretd evntdum);
Proc sort data = crsp_stk_indx;
By cusip date;
Run;
Data crsp_stk_indx;
Retain date cusip ret vwretd;
set crsp_stk_indx;
Run;
Data crsp_cmpst;
Merge cmpst_int (in=a)
crsp_stk_indx (in=b);
By cusip date;
If a and b;
Run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
2. Strip formats and run a proc freq on the two variables in each dataset.
If you can, post results here for help diagnosing issue.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As with all this type of trouble shooting you need to carefully check your input data to the specific step.
What do you suspect to happen with the RETAIN in last crsp_stk_indx step?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your quick reply. When I mean not working is that I get no matches. The reason that I added the RETAIN in the last crsp_stk_indx step was to re-order the columns in the data table so that Date column would appear before the CUSIP column, which is the same way as the cmpst_int data table. However, if I remove these re-ordering steps, I still do not get any matches. I troubleshooted the program last night for quite a few hours and cannot figure out why there are no matches.
I appreciate your help to see if you have any other thoughts on why I don't have any matches.
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, no matches. I took out the RETAIN steps, and I still do not get any matches. I troubleshooted the program extensively and I cannot figure it out. Can you please let me know your thoughts?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Reordering should not cause the problem at least not the code you used.
Clearly there are no combinations of CUSPID DATE values that match (call me Mr. Obvious)
- Do you have messages about multiple lengths specified for BY variables?
- Are the dates actually the same? If you imported the data from XLS it is possible that a datetime field in EXCEL was imported as a SAS DATE with a decimal part (representing time). Take INT(DATE) of both and see if that fixes it.
- Use $HEX. format to look at the values of CUSID from each data set. This could reveal leading spaces that you do not notice or other characters that appear as spaces.
- In data TEM you have Format cusip $8.; which does not change the value but does change how it is shown to you. Or, what you see might not be what you got.
If it ain't the program it must be the data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Excellent suggestions @data_null__. I would suggest using ROUND(DATE) instead of INT(DATE) to fix the Excel date problem as I have seen Excel (or MS-Access) dates being imported with values such as 12345.999998
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
2. Strip formats and run a proc freq on the two variables in each dataset.
If you can, post results here for help diagnosing issue.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks everyone for your suggestions. One key point: The data is not coming from Excel; if you look at the SAS code, I am connecting remotely to WRDS to get Compustat and CRSP data directly, and then bringing it into SAS. Therefore, this is not the issue.
I will try the other suggestions, specifically the PROC CONTENTS and PROC FREQ to see if there are any issues with the data formats, and wil let you know.
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Everyone,
Thanks again for your great suggestions. I ran PROC CONTENTS and uncovered that CUSIP in CRSP_STK_INDX data table was not the same format as CUSIP in CMPST_INT data table. Once I made the format change, the match merge now works!
Thank you again for your great help!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
MERGE does not use formatted value of a BY variable to do matching (unless the GROUPFORMAT option is used on the BY statement). What did you actually change?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi data_null_,
I formatted the variable CUSIP in the CMPST_INT data table in a Data step, not in the MERGE statement:
Data CMPST_INT;
Format CUSIP $8. Date yymmddn8.;