@anildonmez wrote:
I have an unbalanced panel data set of daily security log returns. This data set is constructed by merging two data sets: security return data (firm id is gvkey) and another data set containing firm-specific information (firm id is gvkey+iid). Each observation has two dates (date and date2) together with a dummy variable and I'd like to calculate the cumulative log returns between these two dates for those having dummy=1. I'd also like to note that the data is not uniquely defined in the gvkey-date pair since the data set containing firm characteristics has a more disaggregated firm id definition (gvkey+iid). Below, I included a screenshot of the part of my data set:
First things first:
In Compustat GVKEY is the unique firm id, and IID is the issue id within that firm. So gvkey+iid is essentially the unique identifier for a stock issue - it's not really a firm id.
As far as I can tell from the boxes you drew, for each record with dummy=1 you want to aggregate the log returns for all dates between DATE and DATE2 in which dummy is not a zero:
%let begdate=01jan2010;
%let enddate=31dec2010;
data want (drop=_:);
array ret_history {%sysevalf("&begdate"d):%sysevalf("&enddate"d)} _temporary_;
call missing(of ret_history{*});
do until (last.gvkey);
set have;
by gvkey;
if dummy ^=0 then ret_history{date}=log_ret;
end;
do until(last.gvkey);
set have;
by gvkey;
cum_ret=.;
if dummy=1 then do _d=date to date2;
cum_ret=sum(cum_ret,ret_history{_d});
end;
output;
end;
run;
If you know the date range of your data, then set the macrovars BEGDATE and ENDDATE accordingly. This allows the definition of an array RET_HISTORY, indexed by date. For each GVKEY, RET_HISTORY is initialized to missing, and then each of the dates with dummy^=0 has its return put in ret_history. That's the first pass of each gvkey, just to establish ret_history.
Then reread the gvkey. Each time a dummy=1 aggregate the log_ret values from ret_history from date to date2, followed by an output.
Note this assumes that any date with dummy=. or dummy=0 has only a single unique value for log_ret.