Hi all,
I need SAS to calculate rolling standard deviation and number of observation it has used. Instead of controlling for number of observations I need to use (I know I can use expand in this case), I want to use all observations prior that year. Since my dataset is very large, proc sql may be very slow. I will appreciate it very much if someone can help me with this!
This is the data I have
| firm | year | roa |
| 1 | 1999 | 0.05 |
| 1 | 2000 | 0.048 |
| 1 | 2001 | 0.01 |
| 1 | 2002 | 0.02 |
| 2 | 1999 | 0.024 |
| 2 | 2000 | 0.07 |
| 2 | 2001 | 0.065 |
This is the data I want:
| firm | year | roa | std | count |
| 1 | 1999 | 0.05 | ||
| 1 | 2000 | 0.048 | 0.001414 | 2 |
| 1 | 2001 | 0.01 | 0.022539 | 3 |
| 1 | 2002 | 0.02 | 0.020067 | 4 |
| 2 | 1999 | 0.024 | ||
| 2 | 2000 | 0.07 | 0.032527 | 2 |
| 2 | 2001 | 0.065 | 0.025239 | 3 |
Many thanks!
1) You need 10 temporary array.
2)
data HAVE;
infile cards expandtabs;
input FIRM YEAR ROA bm;
cards;
1 1999 0.05 3.2
1 2000 0.048 .
1 2001 . 4.4
1 2002 0.02 2.5
2 1999 0.024 .
2 2000 . 2
2 2001 0.065 3
;
run;
data want;
set have;
by firm;
array x1{999} _temporary_;
array x2{999} _temporary_;
if first.firm then do;n=0;call missing(of x1{*} x2{*]);end;
n+1;
x1{n}=roa;
x2{n}=bm;
count_roa=n(of x1{*});
std_roa=std(of x1{*});
count_bm=n(of x2{*});
std_bm=std(of x2{*});
drop n;
run;
Like this?
data HAVE;
input FIRM YEAR ROA;
cards;
1 1999 0.05
1 2000 0.048
1 2001 0.01
1 2002 0.02
2 1999 0.024
2 2000 0.07
2 2001 0.065
run;
proc format ;
value prevyr (multilabel) 1999 ='Pre1999'
1999-2000 ='Pre2000'
1999-2001 ='Pre2001'
1999-2002 ='Pre2002'
1999-2003 ='Pre2003'
1999-2004 ='Pre2004';
run;
proc summary nway;
class FIRM ;
class YEAR / mlf;
var ROA;
output out=WANT std=;
format YEAR prevyr.;
run;
| FIRM | YEAR | _TYPE_ | _FREQ_ | ROA |
|---|---|---|---|---|
| 1 | Pre1999 | 3 | 1 | . |
| 1 | Pre2000 | 3 | 2 | 0.001414 |
| 1 | Pre2001 | 3 | 3 | 0.022539 |
| 1 | Pre2002 | 3 | 4 | 0.020067 |
| 1 | Pre2003 | 3 | 4 | 0.020067 |
| 1 | Pre2004 | 3 | 4 | 0.020067 |
| 2 | Pre1999 | 3 | 1 | . |
| 2 | Pre2000 | 3 | 2 | 0.032527 |
| 2 | Pre2001 | 3 | 3 | 0.025239 |
| 2 | Pre2002 | 3 | 3 | 0.025239 |
| 2 | Pre2003 | 3 | 3 | 0.025239 |
| 2 | Pre2004 | 3 | 3 | 0.025239 |
DIY method
data want;
count = 0;
s1 = 0;
s2 = 0;
do until(last.firm);
set have; by firm;
if not missing(roa) then do;
count = count + 1;
s1 = s1 + roa;
s2 = s2 + roa*roa;
if count > 1 then std = sqrt((count*s2-s1*s1)/(count*(count-1)));
end;
output;
end;
drop s1 s2;
run;
It can be done using a datastep, e.g.:
data want;
do until(last.firm);
set have;
by firm year;
_sum+roa;
_sqsum+roa*roa;
count+not missing(roa);
if last.year then do;
if count<2 then
std=.;
else std=sqrt((count*_sqsum-_sum*_sum)/(count*(count-1)));
end;
output;
end;
call missing (count,_sum,_sqsum);
drop _s:;
run;
data HAVE;
input FIRM YEAR ROA;
cards;
1 1999 0.05
1 2000 0.048
1 2001 0.01
1 2002 0.02
2 1999 0.024
2 2000 0.07
2 2001 0.065
;
run;
data want;
set have;
by firm;
array x{999} _temporary_;
if first.firm then do;n=0;call missing(of x{*});end;
n+1;
x{n}=roa;
std=std(of x{*});
run;
proc print;run;
Hi Ksharp,
Thanks so much ! But I still have two more questions:
(1) What if I have more than one variables (say about 10 variables) that needs to calculate std?
(2) What if there are missing values and I only want to calculate the number of observations that have been used to calculate standard deviation?
The data I have:
| firm | year | roa | bm |
| 1 | 1999 | 0.05 | 3.2 |
| 1 | 2000 | 0.048 | |
| 1 | 2001 | 4.4 | |
| 1 | 2002 | 0.02 | 2.5 |
| 2 | 1999 | 0.024 | |
| 2 | 2000 | 2 | |
| 2 | 2001 | 0.065 | 3 |
The data I want:
| firm | year | roa | stdroa | countroa | bm | stdbm | countbm |
| 1 | 1999 | 0.05 | 3.2 | 1 | |||
| 1 | 2000 | 0.048 | 0.0014 | 2 | 1 | ||
| 1 | 2001 | 0.0014 | 2 | 4.4 | 0.84853 | 2 | |
| 1 | 2002 | 0.02 | 0.0168 | 3 | 2.5 | 0.9609 | 3 |
| 2 | 1999 | 0.024 | |||||
| 2 | 2000 | 1 | 2 | 2 | |||
| 2 | 2001 | 0.065 | 0.029 | 2 | 3 | 0.70711 | 3 |
Many Thanks!
Dara
1) You need 10 temporary array.
2)
data HAVE;
infile cards expandtabs;
input FIRM YEAR ROA bm;
cards;
1 1999 0.05 3.2
1 2000 0.048 .
1 2001 . 4.4
1 2002 0.02 2.5
2 1999 0.024 .
2 2000 . 2
2 2001 0.065 3
;
run;
data want;
set have;
by firm;
array x1{999} _temporary_;
array x2{999} _temporary_;
if first.firm then do;n=0;call missing(of x1{*} x2{*]);end;
n+1;
x1{n}=roa;
x2{n}=bm;
count_roa=n(of x1{*});
std_roa=std(of x1{*});
count_bm=n(of x2{*});
std_bm=std(of x2{*});
drop n;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.