SAS Programming

mmh · Posted 11-22-2021 07:58 PM

Hello,
I am trying to perform an event study.
I have a time-series data which contains the return of stocks from all the companies from 01-Jan-2021 to 31-Dec-2021.
I have put a small sample of my data and the my expected output below for your kind consideration. The dataset is attached at the end.
Here is my sample data:

Company_ID Date Return Reporting_Date Event_Window Gap Estimation_Window
1001 01JAN2021 0.01
1001 04JAN2021 0.02
1001 05JAN2021 0.03
1001 06JAN2021 0.09
1001 07JAN2021 0.05
1001 08JAN2021 0.06
1001 11JAN2021 0.07
1001 12JAN2021 0.08
1001 13JAN2021 0.09
1001 14JAN2021 0.1 14JAN2021 3 2 5
1001 15JAN2021 0.11
1001 18JAN2021 0.12
1001 19JAN2021 0.13
1001 20JAN2021 0.14
1001 21JAN2021 0.15
1001 22JAN2021 0.16
1001 25JAN2021 0.17
1001 26JAN2021 0.18
1001 27JAN2021 0.19
1001 28JAN2021 0.2
1001 29JAN2021 0.21
1001 01FEB2021 0.22
1001 02FEB2021 0.23
1001 03FEB2021 0.24
1001 04FEB2021 0.25
1001 05FEB2021 0.26
1001 08FEB2021 0.27
1001 09FEB2021 0.26 09FEB2021 3 2 5
1001 10FEB2021 0.29
1001 11FEB2021 0.3
1001 12FEB2021 0.31
1001 15FEB2021 0.32
1001 16FEB2021 0.33
1001 17FEB2021 0.34
1001 18FEB2021 0.35
1001 19FEB2021 0.36
1001 22FEB2021 0.37
1001 23FEB2021 0.38
1001 24FEB2021 0.39
1001 25FEB2021 0.4
1001 26FEB2021 0.41
1002 01JAN2021 0.22
1002 04JAN2021 0.23
1002 05JAN2021 0.24
1002 06JAN2021 0.25
1002 07JAN2021 0.26
1002 08JAN2021 0.27
1002 11JAN2021 0.28
1002 12JAN2021 0.29
1002 13JAN2021 0.3
1002 14JAN2021 0.31
1002 15JAN2021 0.32
1002 18JAN2021 0.39 18JAN2021 3 2 5
1002 19JAN2021 0.34
1002 20JAN2021 0.35
1002 21JAN2021 0.36
1002 22JAN2021 0.37
1002 25JAN2021 0.38
1002 26JAN2021 0.39
1002 27JAN2021 0.4
1002 28JAN2021 0.41
1002 29JAN2021 0.42
1003 01JAN2021 0.43
1003 04JAN2021 0.44
1003 05JAN2021 0.45
1003 06JAN2021 0.46
1003 07JAN2021 0.47 07JAN2021 3 2 5
1003 08JAN2021 0.48
1003 11JAN2021 0.49
1003 12JAN2021 0.5
1003 13JAN2021 0.51
1003 14JAN2021 0.52
1003 15JAN2021 0.53
1003 18JAN2021 0.54
1003 19JAN2021 0.55
1003 20JAN2021 0.56
1003 21JAN2021 0.57
1004 01JAN2021 0.03
1004 04JAN2021 0.06 04JAN2021 3 2 5
1004 05JAN2021 0.09
1004 06JAN2021 0.12
1004 07JAN2021 0.15

As you can see, every company has just 1 (one) reporting date in each month. That is, all the other fields in the column "Reporting_Date" are blank.
I want to perform the event study around the Reporting date of a company.

Essentially, I am taking
1. "Event_Window" of 3 days around the "Reporting_Date"

2. A "Gap" of 2 days both before and after the Event_Window, which I do not want to include in my calculation (total 4 days)
3. And an "Estimation_Window" of 5 days both before and after the Gap days (total 10 days)

For instance, for the company_ID 1001:

Event days are: Event_Date_-1(13JAN2021), Event_Date_0(14JAN2021), Event_Date_+1(15JAN2021)

Gap days are 11JAN2021, 12JAN2021, 18JAN2021, 19JAN2021

Estimation_Windows are 04, 05, 06, 07, 08, 20, 21, 22, 25, 26 JAN2021

For every Event_Date, first I need to Average the Return during the 10 days of the Estimation_Window, and then divide the Return by this Average.

For Instance, for Company_ID 1001 and Reporting_Date 2021-01-14:

Average of the Estimation window is (0.02+0.03+0.9+0.05+0.06+0.14+0.15+0.16+0.17+0.18)/10 = 0.105

The values that I want is:
Event_Date_-1: 0.09/0.105 = 0.857142857142857

Event_Date_0: 0.1/0.105 = 0.952380952380952

Event_Date_+1: 0.11/0.105 = 1.04761904761905

My output for the sample data that I have provided should be:

Company ID	Date	Event_Date_-1	Event_Date_0	Event_Date_+1
1001	2021-01-14	0.857142857	0.952	1.047619048
1001	2021-02-09	0.964285714	0.929	1.035714286
1002	2021-01-18	0.96969697	1.181818182	1.03030303

As you can see, in total, I need at least the data for 17 days around the Reporting_Date to perform my analysis. So, for instance, for the Company_ID 1003 and 1004, there is not enough dates (only 14 days instead of the needed 17 days for Company_ID 1003 and only 05 days instead of the needed 17 days for Company_ID 1004). Therefore, I want to drop this observation from my analysis and create a separate list of dropped observations, containing the Company_ID and Reporting_Date as column heads as follows:

Company_ID	Reporting_Date
1003	2021-01-07
1004	2021-01-04

Please note that the days in the Estimation_Window and Gap cannot be simply calculated by considering them as continuous calendar days. As you will notice, these are trading days and the holidays are not included in the table.

Thank you so much for your time!

andreas_lds · Posted 11-23-2021 07:32 AM

Another solution using a hash object:

data work.Enhanced;
   set work.have;

   obsnum = _n_;
run;

data 
   work.want(keep= Company_ID Reporting_Date Before_Event_Date At_Event_Date After_Event_Date)
   work.NotEnoughData(keep= Company_ID Reporting_Date)
;
   set work.Enhanced;
   where not missing(Reporting_Date);

   length 
      Average Reporting_Return 8
      Before_Event_Date At_Event_Date After_Event_Date 8
   ;


   if _n_ = 1 then do;
      declare hash h(dataset: 'work.enhanced');
      h.defineKey('obsnum', 'company_id');
      h.defineData('Return');
      h.defineDone();
   end;

   reporting_return = return;
   average = 0;

   /* check if data for the estimation windows is available */
   start_before = obsnum - 8;
   end_before = obsnum - 4;
   before_ok = (h.check(key: start_before, key: company_id) = 0) 
         and (h.check(key: end_before, key: company_id) = 0);

   start_after = obsnum + 4;
   end_after = obsnum + 8;
   after_ok = (h.check(key: start_after, key: company_id) = 0) 
         and (h.check(key: end_after, key: company_id) = 0); 

   if before_ok and after_ok then do;
      do i = start_before to end_before;
         rc = h.find(key: i, key: company_id);
         average = average + return;
      end;

      do i = start_after to end_after;
         rc = h.find(key: i, key: company_id);
         average = average + return;
      end;

      average = average / 10;

      At_Event_Date = reporting_return / average;

      rc = h.find(key: obsnum-1, key: company_id);
      Before_Event_Date = Return / average;

      rc = h.find(key: obsnum+1, key: company_id);
      After_Event_Date = Return / average;

      output work.want;
   end;
   else do;
      output work.NotEnoughData;
   end;
run;

View solution in original post

Oligolas · Posted 11-23-2021 05:50 AM

Hi,

this should do the trick:

LIBNAME in "C:\Temp" access=readonly;

DATA h01;
   length month 8;
   SET in.have;
   month=month(date);
RUN;

PROC SORT data=h01 out=h02; BY company_id month date; RUN;
LIBNAME in CLEAR;

*Add row number;
data h03;
   length id 8;
   set h02;
   BY company_id month date;
   if first.month then id=0;
   id+1;
run;

PROC SQL;
   *Merge reporting_date to all rows by company and month;
   CREATE TABLE h04 AS
      SELECT a.id,b.id as bid,a.company_id,month(a.date) as month, a.date,a.return
      ,b.reporting_date,b.event_window,b.gap,b.estimation_window,b.date as date0
      FROM h03 a
      LEFT JOIN h03(where=(not missing(reporting_date))) b
      ON a.company_id eq b.company_id
      AND month(a.date) eq month(b.date)

      ORDER BY a.company_id,calculated month,a.date
   ;
   *Check single assignment of reporting_date by company and month;
   CREATE TABLE check0 AS
      SELECT DISTINCT company_id,month,reporting_date
      FROM h04
      ORDER BY company_id,month,reporting_date
   ;
QUIT;

DATA _NULL_;
   set check0;
   BY company_id month reporting_date;
   if not first.month and last.month then put 'E' 'RROR: several reporting dates by company and month found' company_id= month= reporting_date=;
RUN;

*clean up;
PROC DATASETS lib=work nolist; delete check0; RUN;QUIT;


PROC FORMAT;
   value period
   1='Estimation window'
   2='Gap'
   3.1='Event date -1'
   3.2='Event date'
   3.3='Event date +1'
   ;
RUN;

DATA h05;
   SET h04;
   length period rangelo rangehi 8;

   *Determine period for Estimation window;
   rangelo=bid-1-gap-estimation_window;
   rangehi=bid+1+gap+estimation_window;
   if rangelo<=id<=rangehi then period=1;

   *Determine period for Gap;
   rangelo=bid-1-gap;
   rangehi=bid+1+gap;
   if rangelo<=id<=rangehi then period=2;

   *Determine period for Event;
   rangelo=bid-1;
   rangehi=bid+1;
   if rangelo=id then period=3.1;
   else if id=bid then period=3.2;
   else if rangehi=id then period=3.3;
   
   drop range:;
   format period period.;
RUN;

PROC SQL;
   *Calculate avg;
   CREATE TABLE h06_avg_ew AS
      SELECT company_id,month,period,avg(return) as avg_ew
      FROM h05
      WHERE period eq 1
      GROUP BY company_id,month,period
   ;
   *Merge avg to corresponding event period;
   CREATE TABLE h07 AS
      SELECT a.*,b.avg_ew
      FROM h05 a
      LEFT JOIN h06_avg_ew b
      ON a.company_id eq b.company_id 
      AND a.month eq b.month
      AND floor(a.period) eq 3
      ORDER BY company_id,month,date
   ;
QUIT;

*Calculate results;
DATA h08;
   set h07;
   where floor(period) eq 3;
   by company_id month date;
   length result 8 periodc $50;
   if not missing(avg_ew) then result=return/avg_ew;
   periodc=vvalue(period);
   keep company_id month date0 period: result;
RUN;

*Transpose name and label variables;
PROC TRANSPOSE data=h08 out=h09(drop=_name_) prefix=period ;
   by company_id month date0 ;
   var result;
   id period;
   idlabel periodc;
   format period best.;
RUN;

DATA want;
   set h09;
   rename date0=date;
   keep company_id date0 period:;
RUN;

PROC DATASETS lib=work nolist; delete h0:; RUN;QUIT;

________________________
- Cheers -

mmh · Posted 11-23-2021 11:22 PM

Hello Oligolas,
Thank you so much for your time and effort.
Unfortunately, when I am trying to run the program, I have receiving the following output:

As you will notice, the result for Company_ID 1001 on 09FEB2021 is not as desired.
Also, we are receiving results for Company_ID 1003 and 1004 in our output.
Is there any way to further fix this issue?

I thank you again for your kind support!

mmh · Posted 11-24-2021 01:14 AM

Hello Oligolas,
I have got the code for my problem, thanks to Andreas.
But I would like to thank you again for your time and effort.
Really appreciate your work!

Oligolas · Posted 11-24-2021 04:03 AM

@mmh oh yes, I see in february you also take dates from January in order to determine the average of the estimation window, whereas I consider only the values in each month.

________________________
- Cheers -

mmh · Posted 11-24-2021 04:48 AM

Now I understand.
And this might come in handy for me in the future.
Thanks again!

andreas_lds · Posted 11-23-2021 07:32 AM

Another solution using a hash object:

data work.Enhanced;
   set work.have;

   obsnum = _n_;
run;

data 
   work.want(keep= Company_ID Reporting_Date Before_Event_Date At_Event_Date After_Event_Date)
   work.NotEnoughData(keep= Company_ID Reporting_Date)
;
   set work.Enhanced;
   where not missing(Reporting_Date);

   length 
      Average Reporting_Return 8
      Before_Event_Date At_Event_Date After_Event_Date 8
   ;


   if _n_ = 1 then do;
      declare hash h(dataset: 'work.enhanced');
      h.defineKey('obsnum', 'company_id');
      h.defineData('Return');
      h.defineDone();
   end;

   reporting_return = return;
   average = 0;

   /* check if data for the estimation windows is available */
   start_before = obsnum - 8;
   end_before = obsnum - 4;
   before_ok = (h.check(key: start_before, key: company_id) = 0) 
         and (h.check(key: end_before, key: company_id) = 0);

   start_after = obsnum + 4;
   end_after = obsnum + 8;
   after_ok = (h.check(key: start_after, key: company_id) = 0) 
         and (h.check(key: end_after, key: company_id) = 0); 

   if before_ok and after_ok then do;
      do i = start_before to end_before;
         rc = h.find(key: i, key: company_id);
         average = average + return;
      end;

      do i = start_after to end_after;
         rc = h.find(key: i, key: company_id);
         average = average + return;
      end;

      average = average / 10;

      At_Event_Date = reporting_return / average;

      rc = h.find(key: obsnum-1, key: company_id);
      Before_Event_Date = Return / average;

      rc = h.find(key: obsnum+1, key: company_id);
      After_Event_Date = Return / average;

      output work.want;
   end;
   else do;
      output work.NotEnoughData;
   end;
run;

mmh · Posted 11-24-2021 01:11 AM

Hello Andreas,
I do not know how to thank you!
The solution works perfectly fine 🙂
Do you mind if I send you a quick text if sometimes later I try to modify my event estimations and run in to any problems?
Again, I am really grateful for your kind support!
All the best!

SAS Programming

Event study

Re: Event study

Re: Event study

Re: Event study

Re: Event study

Re: Event study

Re: Event study

Re: Event study

Re: Event study

Optimizing SAS Event Stream Processing Studio Usage

event study

Event study window

Dates for Event Study

Extracting Windowed Time Series Features with Python in SAS Event Stre...

Follow Us

What is...

SAS Programming

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...