Hello,
How can we calculate std on data that has an additional row each time.
It is not a regular moving std , as the data window size increases each time.
data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
run ;
data wanted ;
input dt std ;
cards ;
1 80 .
2 20 42.426406871
3 40 30.550504633
4 60 25.819888975
;
run ;
data calc ;
x1 = 80 ;
x2 = 20 ;
x3 = 40 ;
x4 = 60 ;
new1 = std(x1,x2) ; *42.426406871 ;
new2 = std(x1,x2,x3) ; *30.550504633 ;
new3 = std(x1,x2,x3,x4) ;* 25.819888975 ;
run ;
Thanks
Thanks for providing data as a SAS data step. Very helpful. I wish more people would do that.
This seems to work. If you have thousands or millions of observations, this will note scale up efficiently.
data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
/* Compute number of observations in data set available */
/* Code from https://communities.sas.com/t5/SAS-Communities-Library/Determining-the-number-of-observations-in-a-SAS-data-set/ta-p/475174 */
data _NULL_;
if 0 then set available nobs=n;
call symputx('nrows',n);
stop;
run;
data intermediate;
set available;
do group=_n_ to &nrows;
thisy=y;
output;
end;
run;
proc summary data=intermediate nway;
class group;
var thisy;
output out=want stddev=;
run;
oops..
data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
run ;
data wanted ;
input dt y std ;* <---three input variables ;
cards ;
1 80 .
2 20 42.426406871
3 40 30.550504633
4 60 25.819888975
;
run ;
data calc ;
x1 = 80 ;
x2 = 20 ;
x3 = 40 ;
x4 = 60 ;
new1 = std(x1,x2) ; *42.426406871 ;
new2 = std(x1,x2,x3) ; *30.550504633 ;
new3 = std(x1,x2,x3,x4) ;* 25.819888975 ;
run ;
Thanks for providing data as a SAS data step. Very helpful. I wish more people would do that.
This seems to work. If you have thousands or millions of observations, this will note scale up efficiently.
data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
/* Compute number of observations in data set available */
/* Code from https://communities.sas.com/t5/SAS-Communities-Library/Determining-the-number-of-observations-in-a-SAS-data-set/ta-p/475174 */
data _NULL_;
if 0 then set available nobs=n;
call symputx('nrows',n);
stop;
run;
data intermediate;
set available;
do group=_n_ to &nrows;
thisy=y;
output;
end;
run;
proc summary data=intermediate nway;
class group;
var thisy;
output out=want stddev=;
run;
All 3 replies to my question are good solutions
Thanks so much 🙂
Hello,
Next, if we wanted to add another field such as group by and
calculate the std for each value of gr in the data below:-
what would be the best solution ? Thanks a lot.
data available ;
input gr dt y ;
cards ;
1 1 80
1 2 20
1 3 40
1 4 60
2 1 80
2 2 20
2 3 40
2 4 50
;
run ;
data wanted ;
input gr dt y std ;
cards ;
1 1 80
1 2 20 42.426406871
1 3 40 30.550504633
1 4 60 25.819888975
2 1 80
2 2 20 42.426406871
2 3 40 30.550504633
2 4 50 25
;
run ;
data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
data want;
set available;
array x{999999} _temporary_;
x{_n_}=y;
std=std(of x{*});
run;
Hello @J111,
You can also compute the cumulative standard deviation using Steiner's theorem:
data want(drop=_:);
set available;
retain _v 0;
_s+y; /* cumulative sum */
_m=_s/_n_; /* cumulative mean */
_d=dif(_m); /* mean change */
_q=(y-_m)**2; /* new term in sum of squares */
if _n_>1 then do;
std=sqrt(_v+_d**2+_q/(_n_-1)); /* cumulative standard deviation */
_v=((_n_-1)*(_v+_d**2)+_q)/_n_; /* cumulative population variance */
end;
run;
This calculation should take less than one second for a million observations.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.