Hi all,
I need SAS to calculate rolling standard deviation and number of observation it has used. Instead of controlling for number of observations I need to use (I know I can use expand in this case), I want to use all observations prior that year. Since my dataset is very large, proc sql may be very slow. I will appreciate it very much if someone can help me with this!
This is the data I have
firm | year | roa |
1 | 1999 | 0.05 |
1 | 2000 | 0.048 |
1 | 2001 | 0.01 |
1 | 2002 | 0.02 |
2 | 1999 | 0.024 |
2 | 2000 | 0.07 |
2 | 2001 | 0.065 |
This is the data I want:
firm | year | roa | std | count |
1 | 1999 | 0.05 | ||
1 | 2000 | 0.048 | 0.001414 | 2 |
1 | 2001 | 0.01 | 0.022539 | 3 |
1 | 2002 | 0.02 | 0.020067 | 4 |
2 | 1999 | 0.024 | ||
2 | 2000 | 0.07 | 0.032527 | 2 |
2 | 2001 | 0.065 | 0.025239 | 3 |
Many thanks!
1) You need 10 temporary array. 2) data HAVE; infile cards expandtabs; input FIRM YEAR ROA bm; cards; 1 1999 0.05 3.2 1 2000 0.048 . 1 2001 . 4.4 1 2002 0.02 2.5 2 1999 0.024 . 2 2000 . 2 2 2001 0.065 3 ; run; data want; set have; by firm; array x1{999} _temporary_; array x2{999} _temporary_; if first.firm then do;n=0;call missing(of x1{*} x2{*]);end; n+1; x1{n}=roa; x2{n}=bm; count_roa=n(of x1{*}); std_roa=std(of x1{*}); count_bm=n(of x2{*}); std_bm=std(of x2{*}); drop n; run;
Like this?
data HAVE;
input FIRM YEAR ROA;
cards;
1 1999 0.05
1 2000 0.048
1 2001 0.01
1 2002 0.02
2 1999 0.024
2 2000 0.07
2 2001 0.065
run;
proc format ;
value prevyr (multilabel) 1999 ='Pre1999'
1999-2000 ='Pre2000'
1999-2001 ='Pre2001'
1999-2002 ='Pre2002'
1999-2003 ='Pre2003'
1999-2004 ='Pre2004';
run;
proc summary nway;
class FIRM ;
class YEAR / mlf;
var ROA;
output out=WANT std=;
format YEAR prevyr.;
run;
FIRM | YEAR | _TYPE_ | _FREQ_ | ROA |
---|---|---|---|---|
1 | Pre1999 | 3 | 1 | . |
1 | Pre2000 | 3 | 2 | 0.001414 |
1 | Pre2001 | 3 | 3 | 0.022539 |
1 | Pre2002 | 3 | 4 | 0.020067 |
1 | Pre2003 | 3 | 4 | 0.020067 |
1 | Pre2004 | 3 | 4 | 0.020067 |
2 | Pre1999 | 3 | 1 | . |
2 | Pre2000 | 3 | 2 | 0.032527 |
2 | Pre2001 | 3 | 3 | 0.025239 |
2 | Pre2002 | 3 | 3 | 0.025239 |
2 | Pre2003 | 3 | 3 | 0.025239 |
2 | Pre2004 | 3 | 3 | 0.025239 |
DIY method
data want;
count = 0;
s1 = 0;
s2 = 0;
do until(last.firm);
set have; by firm;
if not missing(roa) then do;
count = count + 1;
s1 = s1 + roa;
s2 = s2 + roa*roa;
if count > 1 then std = sqrt((count*s2-s1*s1)/(count*(count-1)));
end;
output;
end;
drop s1 s2;
run;
It can be done using a datastep, e.g.:
data want;
do until(last.firm);
set have;
by firm year;
_sum+roa;
_sqsum+roa*roa;
count+not missing(roa);
if last.year then do;
if count<2 then
std=.;
else std=sqrt((count*_sqsum-_sum*_sum)/(count*(count-1)));
end;
output;
end;
call missing (count,_sum,_sqsum);
drop _s:;
run;
data HAVE;
input FIRM YEAR ROA;
cards;
1 1999 0.05
1 2000 0.048
1 2001 0.01
1 2002 0.02
2 1999 0.024
2 2000 0.07
2 2001 0.065
;
run;
data want;
set have;
by firm;
array x{999} _temporary_;
if first.firm then do;n=0;call missing(of x{*});end;
n+1;
x{n}=roa;
std=std(of x{*});
run;
proc print;run;
Hi Ksharp,
Thanks so much ! But I still have two more questions:
(1) What if I have more than one variables (say about 10 variables) that needs to calculate std?
(2) What if there are missing values and I only want to calculate the number of observations that have been used to calculate standard deviation?
The data I have:
firm | year | roa | bm |
1 | 1999 | 0.05 | 3.2 |
1 | 2000 | 0.048 | |
1 | 2001 | 4.4 | |
1 | 2002 | 0.02 | 2.5 |
2 | 1999 | 0.024 | |
2 | 2000 | 2 | |
2 | 2001 | 0.065 | 3 |
The data I want:
firm | year | roa | stdroa | countroa | bm | stdbm | countbm |
1 | 1999 | 0.05 | 3.2 | 1 | |||
1 | 2000 | 0.048 | 0.0014 | 2 | 1 | ||
1 | 2001 | 0.0014 | 2 | 4.4 | 0.84853 | 2 | |
1 | 2002 | 0.02 | 0.0168 | 3 | 2.5 | 0.9609 | 3 |
2 | 1999 | 0.024 | |||||
2 | 2000 | 1 | 2 | 2 | |||
2 | 2001 | 0.065 | 0.029 | 2 | 3 | 0.70711 | 3 |
Many Thanks!
Dara
1) You need 10 temporary array. 2) data HAVE; infile cards expandtabs; input FIRM YEAR ROA bm; cards; 1 1999 0.05 3.2 1 2000 0.048 . 1 2001 . 4.4 1 2002 0.02 2.5 2 1999 0.024 . 2 2000 . 2 2 2001 0.065 3 ; run; data want; set have; by firm; array x1{999} _temporary_; array x2{999} _temporary_; if first.firm then do;n=0;call missing(of x1{*} x2{*]);end; n+1; x1{n}=roa; x2{n}=bm; count_roa=n(of x1{*}); std_roa=std(of x1{*}); count_bm=n(of x2{*}); std_bm=std(of x2{*}); drop n; run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.