DATA Step, Macro, Functions and more

Rolling cumulative sum by BY group.

Reply
Contributor
Posts: 72

Rolling cumulative sum by BY group.

Hi everyone, I have a dataset that looks like below.

permno date ret
A 01/Jan/1985 0.14
A 01/Feb/1985 0.43
A 01/Mar/1985 -0.59
. ................... .....
. ..................... .......
. .................... .......
A 01/Apr/2011 0.42

B 01/Feb/1985 0.452
B 01/Mar/1985 -0.52
B
B
B
.
.


Several features of the dataset:
- Each permno doesn't necessarily have the same observations; So, maybe the first permno starts with January, but the second permo starts with March because of missing values for Jan + Feb

The final output I want to have:
- a cumulative sum of using an arbitrary rolling window, say, 3 months or so
- For example, for 3-month window, the cumulative sum works as follows: sum of 1st, 2nd, and 3rd observatoins and output the cumulative sum. Then sum of 2nd, 3rd, and 4th obs and output the cumulative sum

What I have done and checked and made sure it computes correctly: I have written a macro to compute such cumulative rolling sum, i.e. sum of each 3 obs, output a dataset, then proc append each of these datasets.

What is my current problem that I can't figure out how to tackle:
- It might happen that the LAST 2 obs of 1st permno is used along with the 1st obs of the 2nd permno.
- In other words, I don't know how to recognize that the cumulative sum runs through each of the permnos and stops at the N-(window-1) of each permno, where N is the # of observations for each permno, instead of through to the first obs of the next permno.

What I currently have in mind:
- Create each of the dataset for each of the permnos
- Then apply the rolling macro I wrote to each of thems.

But then, I'm faced with a problem: What is the best way I create each of the dataset for each permno? proc sort by permno NODUPKEY. Then I use the following

data first_permno;
set original_ds;
where permno="A";
run;


This way, I have to create tens of thousands of datasets corresponding to each permno! Also, I need to create a macro variable containing all unique permnos and using %scan function to scan through such a list and extract each of them and put them in WHERE statement in the SAS code just above? It seems complicated.

Can you please suggest some way out of this?

Thanks a lot for your help! I really appreciate it. Message was edited by: smilingmelbourne
Respected Advisor
Posts: 3,887

Re: Rolling cumulative sum by BY group.

Is it this what you're after?

data have;
do permno='A','B','C';
format date date9.;
date=INTNX('month','01jan2011'd,ceil(ranuni(2)*2),'b');
do while (date le '01jul2011'd);
ret=1;
output;
date=INTNX('month', date,1,'b');
end;
end;
run;

data want;
set have;
by permno;
lag_ret =lag(ret);
lag2_ret=lag2(ret);
if first.permno then n=0;
n+1;
if n gt 2 then cum_ret=sum(ret,lag_ret,lag2_ret);
run;

proc print data=want;
run;


HTH
Patrick Message was edited by: Patrick
Super User
Posts: 9,671

Re: Rolling cumulative sum by BY group.

You can embed three dummy observations between two pemno.



[pre]
data temp;
input permno $ date : date12. ret;
cards;
A 01/Jan/1985 0.14
A 01/Feb/1985 0.43
A 01/Mar/1985 -0.59
A 01/Apr/2011 0.42
B 01/Feb/1985 0.452
B 01/Mar/1985 -0.52
B 01/apr/1985 -0.52
B 01/May/1985 -0.52
B 01/jun/1985 -0.52
;
run;
data want;
set temp;
_permno=permno; _date=date; _ret=ret;
if permno ne lag(permno) then do;
call missing (of permno date ret );
do i=1 to 4; output; end;/*Make four dummy obs between two permno*/
end;
permno=_permno; date=_date; ret=_ret;
output;
drop _: i;
run;
[/pre]


Ksharp

Message was edited by: Ksharp Message was edited by: Ksharp
Contributor
Posts: 72

Re: Rolling cumulative sum by BY group.

Thank you, Patrick and Ksharp, for your help. I have revised the macro that I wrote for the rolling sum of return and rolling standard deviation of return.

If you have time, can you suggest how to improve the code?

%macro rolling_stddev(dsetin=, dsetout=, csvar=, tsvar=, window=, varname=);


/*dsetin: name of dataset from which to compute rolling standard deviation*/
/*dsetout: name of output dataset*/
/*csvar: name of the cross-sectional variable*/
/*tsvar: name of the time series variable*/
/*window: rolling window, e.g. 3-month, 24-month, etc.*/
/*varname: name of variable for which to compute standard deviation*/


/*Determine number of obs of dsetin*/
data _null_;
set &dsetin;
n = _n_;
call symput('numobs', n);
run;


/*A macro to create a dataset with a single obs that is standard deviation*/
%macro create_rolling_dataset (firstobs=, lastobs=);
proc sort data=&dsetin.;
by &csvar. &tsvar.;
run;

data tempdata;
set &dsetin. (firstobs=&firstobs. obs=&lastobs.);
run;

data tempdata;
set tempdata;
by &csvar.;
retain &varname._sqr 0; /*square ret*/
retain &varname._sqr_sum 0; /*sum of square ret*/
retain &varname._sum 0; /*sum of ret*/
retain &varname._std 0;

if first.&csvar. then do;
&varname._sqr = 0;
&varname._sqr_sum = 0;
&varname._sum = 0;
n = 0;
end;

&varname._sqr = &varname.**2;
&varname._sqr_sum = sum(&varname._sqr_sum, &varname._sqr);
&varname._sum = sum(&varname._sum, &varname.);
n + 1;

if last.&csvar. & n = &window. then do;
&varname._mean = &varname._sum /n;
&varname._std = sqrt( (&varname._sqr_sum - 2*&varname._mean*&varname._sum + n*&varname._mean**2)/(n-1) );
output;
end;

drop &varname._sqr &varname._sqr_sum &varname._sum &varname._mean n;
run;

%mend;

%local i fobs;
%let fobs = 2; /*First obs in the 2nd dataset onwards*/


/*Create a base dataset*/
proc sort data = &dsetin.;
by &csvar. &tsvar.;
run;

data &dsetout.;
set &dsetin. (firstobs=1 obs=&window.);
run;

data &dsetout.;
set &dsetout.;
by &csvar.;
retain &varname._sqr 0; /*square ret*/
retain &varname._sqr_sum 0; /*sum of square ret*/
retain &varname._sum 0; /*sum of ret*/
retain &varname._std 0;

if first.&csvar. then do;
&varname._sqr = 0;
&varname._sqr_sum = 0;
&varname._sum = 0;
n = 0;
end;

&varname._sqr = &varname.**2;
&varname._sqr_sum = sum(&varname._sqr_sum, &varname._sqr);
&varname._sum = sum(&varname._sum, &varname.);
n + 1;

if last.&csvar. & n = &window. then do;
&varname._mean = &varname._sum /n;
&varname._std = sqrt( (&varname._sqr_sum - 2*&varname._mean*&varname._sum + n*&varname._mean**2)/(n-1) );
output;
end;

drop &varname._sqr &varname._sqr_sum &varname._sum &varname._mean n;
run;



/*Execuate a loop from the 2nd dataset to N-(window-1)th dataset*/
%do i = 2 %to (&numobs. - %eval(&window-1)) %by 1; /*Loop over from 2nd dataset to last dataset*/
%local lobs; /*Last obs*/
%let lobs = %eval(&fobs.+%eval(&window-1));
%create_rolling_dataset (firstobs=&fobs, lastobs=&lobs)
proc append base = &dsetout data = tempdata force; run;
proc delete data = tempdata; run;
%let fobs=%eval(&fobs.+1);
%end;
%mend;
Valued Guide
Posts: 632

Re: Rolling cumulative sum by BY group.

Just a couple of thoughts on your macro.
1) it is never wise to embed macro definitions.
2) there are a number of ways to count the number of obs without reading the whole data set. Although not the most classiest, consider:
[pre]
data _null_;
call symputx('numobs', nobs);
stop;
set &dsetin nobs=nobs;
run;[/pre]
3) the incoming data set &dsetin is being sorted multiple times including inside a macro call that is inside a macro loop.
4) did you consider the suggestions of Patrick and Ksharp? taking a big(?) data set breaking it up and putting it back together is always resource intensive.
Valued Guide
Posts: 2,174

Re: Rolling cumulative sum by BY group.

it all seems like too much work!
just embed the 3 (rolling period) element array and the stats derivations in a per-variable macro like
%Macro stats( var, period=3, idx= idx ) ;
Array ar__&var( &period ) ;
Ar__&var( &idx ) = &var ;
Mn_&var = mean( of ar__&var(*) ) ;
St_&var = STD( of ar__&var(*) ) ;
%mend stats ;

Using this macro depends on having the data sorted in PEMNO order. If there might be gaps in the data these need to be filled with some default which might be missing.
Given these conditions in the original data then the process would be this short step
Data want ;
Do row = 1 by 1 until( last.PEMNO ) ;
Set original ;
BY PEMNO;
Idx = 1+ mod( row, 3 );
%stats( ret ) ;
Output ;
end ;
run; Needs a by statement!

Message was edited by: Peter.C
Ask a Question
Discussion stats
  • 5 replies
  • 8461 views
  • 0 likes
  • 5 in conversation