Rolling cumulative sum by BY group.

06-01-2011 05:14 AM

Hi everyone, I have a dataset that looks like the one below.

| permno | date | ret |
|---|---|---|
| A | 01/Jan/1985 | 0.14 |
| A | 01/Feb/1985 | 0.43 |
| A | 01/Mar/1985 | -0.59 |
| ... | ... | ... |
| A | 01/Apr/2011 | 0.42 |
| B | 01/Feb/1985 | 0.452 |
| B | 01/Mar/1985 | -0.52 |
| B | ... | ... |
| ... | ... | ... |

**Several features of the dataset:**

- The permnos don't necessarily have the same observations. For example, the first permno may start in January while the second permno starts in March, because its values for January and February are missing.

**The final output I want to have:**

- A cumulative sum over an arbitrary rolling window, say 3 months.

- For example, with a 3-month window: sum the 1st, 2nd, and 3rd observations and output the result; then sum the 2nd, 3rd, and 4th observations and output the result; and so on.

**What I have done and checked and made sure it computes correctly:** I have written a macro that computes this rolling sum: it sums each window of 3 obs, outputs the result to a dataset, then uses proc append to stack these datasets.

**What is my current problem that I can't figure out how to tackle:**

- The LAST 2 obs of the 1st permno can end up being summed with the 1st obs of the 2nd permno.

- In other words, I don't know how to make the rolling sum run within each permno and stop at observation N-(window-1) of that permno, where N is the number of observations for the permno, instead of running on into the first obs of the next permno.

**What I currently have in mind:**

- Create a separate dataset for each permno.

- Then apply the rolling macro I wrote to each of them.

But then I'm faced with a problem: what is the best way to create a dataset for each permno? I could run proc sort by permno with NODUPKEY to get the unique permnos, and then use the following for each one:

    data first_permno;
        set original_ds;
        where permno="A";
    run;

This way, I have to create tens of thousands of datasets, one for each permno! I would also need a macro variable containing all the unique permnos, and use the **%scan** function to step through that list, extracting each permno and putting it in the **WHERE** statement in the SAS code just above. It seems complicated.

Can you please suggest some way out of this?

Thanks a lot for your help! I really appreciate it. Message was edited by: smilingmelbourne


06-01-2011 06:45 AM

Is this what you're after?

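    data have;
        do permno='A','B','C';
            format date date9.;
            /* each permno starts one or two months after January 2011, chosen at random */
            date=INTNX('month','01jan2011'd,ceil(ranuni(2)*2),'b');
            do while (date le '01jul2011'd);
                ret=1;
                output;
                date=INTNX('month', date,1,'b');
            end;
        end;
    run;

    data want;
        set have;
        by permno;
        /* call LAG on every observation so the lag queues stay in step */
        lag_ret =lag(ret);
        lag2_ret=lag2(ret);
        /* n counts the observations within the current permno */
        if first.permno then n=0;
        n+1;
        /* sum only once three obs of the same permno are available */
        if n gt 2 then cum_ret=sum(ret,lag_ret,lag2_ret);
    run;

    proc print data=want;
    run;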

HTH

Patrick
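The LAG-based step above hard-codes a 3-observation window. A minimal sketch of the same BY-group idea for an arbitrary window follows, assuming the `have` data set from this post is sorted by permno and date. The macro variable `window`, the output data set `want_roll`, and the array `buf` are names introduced here for illustration only.

```sas
/* Hypothetical window length; not taken from the original posts */
%let window = 3;

data want_roll;
    set have;
    by permno;

    /* Circular buffer for the last &window values of ret.
       Elements of a _temporary_ array are retained across observations. */
    array buf{0:%eval(&window - 1)} _temporary_;

    /* Start a fresh count and an empty buffer for each permno */
    if first.permno then do;
        n = 0;
        call missing(of buf{*});
    end;

    /* Overwrite the oldest slot with the current value */
    buf{mod(n, &window)} = ret;
    n + 1;

    /* cum_ret stays missing until the window is full within this permno */
    if n >= &window then cum_ret = sum(of buf{*});
run;
```

The window here counts observations, not calendar months, so a permno with missing months still sums its last `&window` available rows. SUM ignores missing values of ret, so a window containing a missing return yields a partial sum.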


06-01-2011 11:44 PM

You can embed dummy observations between two permnos (the code below inserts four).

[pre]
data temp;
   input permno $ date : date12. ret;
cards;
A 01/Jan/1985 0.14
A 01/Feb/1985 0.43
A 01/Mar/1985 -0.59
A 01/Apr/2011 0.42
B 01/Feb/1985 0.452
B 01/Mar/1985 -0.52
B 01/apr/1985 -0.52
B 01/May/1985 -0.52
B 01/jun/1985 -0.52
;
run;

data want;
   set temp;
   _permno=permno; _date=date; _ret=ret;
   if permno ne lag(permno) then do;
      call missing (of permno date ret );
      do i=1 to 4; output; end;/*Make four dummy obs between two permno*/
   end;
   permno=_permno; date=_date; ret=_ret;
   output;
   drop _: i;
run;
[/pre]

Ksharp
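The post stops at the padded data set. One possible follow-up step is sketched below, assuming a 3-observation window and the padded data set `want` produced by the code in this post. The names `rolled`, `r1`, `r2`, and `cum_ret` are hypothetical.

```sas
data rolled;
    set want;   /* padded data set from the step in this post */

    /* Call LAG on every row, including the dummy rows */
    r1 = lag(ret);
    r2 = lag2(ret);

    /* Unlike SUM(), the + operator returns missing if any operand is
       missing, so a window that overlaps the padding gives a missing sum */
    cum_ret = ret + r1 + r2;

    /* Drop the dummy rows, which have a missing permno */
    if not missing(permno);
    drop r1 r2;
run;
```

With this approach a genuinely missing ret inside a permno also makes the surrounding sums missing, and the log reports that missing values were generated.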


06-04-2011 09:29 AM

Thank you, Patrick and Ksharp, for your help. I have revised the macro that I wrote for the rolling sum of return and rolling standard deviation of return.

If you have time, can you suggest how to improve the code?

    %macro rolling_stddev(dsetin=, dsetout=, csvar=, tsvar=, window=, varname=);
        /*dsetin: name of dataset from which to compute rolling standard deviation*/
        /*dsetout: name of output dataset*/
        /*csvar: name of the cross-sectional variable*/
        /*tsvar: name of the time series variable*/
        /*window: rolling window, e.g. 3-month, 24-month, etc.*/
        /*varname: name of variable for which to compute standard deviation*/

        /*Determine number of obs of dsetin*/
        data _null_;
            set &dsetin;
            n = _n_;
            call symput('numobs', n);
        run;

        /*A macro to create a dataset with a single obs that is standard deviation*/
        %macro create_rolling_dataset (firstobs=, lastobs=);
            proc sort data=&dsetin.;
                by &csvar. &tsvar.;
            run;

            data tempdata;
                set &dsetin. (firstobs=&firstobs. obs=&lastobs.);
            run;

            data tempdata;
                set tempdata;
                by &csvar.;
                retain &varname._sqr 0; /*square ret*/
                retain &varname._sqr_sum 0; /*sum of square ret*/
                retain &varname._sum 0; /*sum of ret*/
                retain &varname._std 0;
                if first.&csvar. then do;
                    &varname._sqr = 0;
                    &varname._sqr_sum = 0;
                    &varname._sum = 0;
                    n = 0;
                end;
                &varname._sqr = &varname.**2;
                &varname._sqr_sum = sum(&varname._sqr_sum, &varname._sqr);
                &varname._sum = sum(&varname._sum, &varname.);
                n + 1;
                if last.&csvar. & n = &window. then do;
                    &varname._mean = &varname._sum /n;
                    &varname._std = sqrt( (&varname._sqr_sum - 2*&varname._mean*&varname._sum + n*&varname._mean**2)/(n-1) );
                    output;
                end;
                drop &varname._sqr &varname._sqr_sum &varname._sum &varname._mean n;
            run;
        %mend;

        %local i fobs;
        %let fobs = 2; /*First obs in the 2nd dataset onwards*/

        /*Create a base dataset*/
        proc sort data = &dsetin.;
            by &csvar. &tsvar.;
        run;

        data &dsetout.;
            set &dsetin. (firstobs=1 obs=&window.);
        run;

        data &dsetout.;
            set &dsetout.;
            by &csvar.;
            retain &varname._sqr 0; /*square ret*/
            retain &varname._sqr_sum 0; /*sum of square ret*/
            retain &varname._sum 0; /*sum of ret*/
            retain &varname._std 0;
            if first.&csvar. then do;
                &varname._sqr = 0;
                &varname._sqr_sum = 0;
                &varname._sum = 0;
                n = 0;
            end;
            &varname._sqr = &varname.**2;
            &varname._sqr_sum = sum(&varname._sqr_sum, &varname._sqr);
            &varname._sum = sum(&varname._sum, &varname.);
            n + 1;
            if last.&csvar. & n = &window. then do;
                &varname._mean = &varname._sum /n;
                &varname._std = sqrt( (&varname._sqr_sum - 2*&varname._mean*&varname._sum + n*&varname._mean**2)/(n-1) );
                output;
            end;
            drop &varname._sqr &varname._sqr_sum &varname._sum &varname._mean n;
        run;

        /*Execute a loop from the 2nd dataset to N-(window-1)th dataset*/
        %do i = 2 %to (&numobs. - %eval(&window-1)) %by 1; /*Loop over from 2nd dataset to last dataset*/
            %local lobs; /*Last obs*/
            %let lobs = %eval(&fobs.+%eval(&window-1));
            %create_rolling_dataset (firstobs=&fobs, lastobs=&lobs)
            proc append base = &dsetout data = tempdata force; run;
            proc delete data = tempdata; run;
            %let fobs=%eval(&fobs.+1);
        %end;
    %mend;


06-05-2011 12:51 AM

Just a couple of thoughts on your macro.

1) It is never wise to nest macro definitions.

2) There are a number of ways to count the number of obs without reading the whole data set. Although not the classiest, consider:

[pre]
data _null_;
   call symputx('numobs', nobs);
   stop;
   set &dsetin nobs=nobs;
run;
[/pre]

3) The incoming data set &dsetin is sorted multiple times, including inside a macro call that sits inside a macro loop.

4) Did you consider the suggestions of Patrick and Ksharp? Taking a big(?) data set, breaking it up, and putting it back together is always resource intensive.
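As one more of the "number of ways" mentioned in point 2, the observation count can also be read from the data set header with the OPEN, ATTRN, and CLOSE functions through %SYSFUNC. This is only a sketch: the macro name `count_obs` and the data set `work.have` are hypothetical, and NLOBS may not be available for views or sequential files.

```sas
/* Hypothetical helper: returns the number of logical observations
   (excluding rows marked for deletion) without reading the data. */
%macro count_obs(ds);
    %local dsid n rc;
    %let n = ;
    %let dsid = %sysfunc(open(&ds));
    %if &dsid %then %do;
        %let n  = %sysfunc(attrn(&dsid, nlobs));
        %let rc = %sysfunc(close(&dsid));
    %end;
    &n
%mend count_obs;

/* Usage, assuming a data set work.have exists */
%let numobs = %count_obs(work.have);
%put NOTE: numobs=&numobs;
```

If the data set cannot be opened, the macro returns an empty value, so the caller should check `&numobs` before using it in a loop bound.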


06-05-2011 10:35 AM

It all seems like too much work!

Just embed the 3-element (rolling period) array and the stats derivations in a per-variable macro like

    %macro stats( var, period=3, idx=idx ) ;
        array ar__&var( &period ) ;
        ar__&var( &idx ) = &var ;
        mn_&var = mean( of ar__&var(*) ) ;
        st_&var = std( of ar__&var(*) ) ;
    %mend stats ;

Using this macro depends on the data being sorted by permno. If there are gaps in the data, fill them with some default, which may be a missing value.

Given these conditions in the original data, the process would be this short step

    data want ;
        do row = 1 by 1 until( last.permno ) ;
            set original ;
            by permno ;   /* BY statement is required for last.permno */
            idx = 1 + mod( row, 3 ) ;   /* 3 = rolling period */
            %stats( ret ) ;
            output ;
        end ;
    run;

Message was edited by: Peter.C
