Hello,
How can we calculate std on data that has an additional row each time.
It is not a regular moving std , as the data window size increases each time.
data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
run ;
data wanted ;
input dt std ;
cards ;
1 80 .
2 20 42.426406871
3 40 30.550504633
4 60 25.819888975
;
run ;
data calc ;
x1 = 80 ;
x2 = 20 ;
x3 = 40 ;
x4 = 60 ;
new1 = std(x1,x2) ; *42.426406871 ;
new2 = std(x1,x2,x3) ; *30.550504633 ;
new3 = std(x1,x2,x3,x4) ;* 25.819888975 ;
run ;
Thanks
Thanks for providing data as a SAS data step. Very helpful. I wish more people would do that.
This seems to work. If you have thousands or millions of observations, this will note scale up efficiently.
data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
/* Compute number of observations in data set available */
/* Code from https://communities.sas.com/t5/SAS-Communities-Library/Determining-the-number-of-observations-in-a-SAS-data-set/ta-p/475174 */
data _NULL_;
if 0 then set available nobs=n;
call symputx('nrows',n);
stop;
run;
data intermediate;
set available;
do group=_n_ to &nrows;
thisy=y;
output;
end;
run;
proc summary data=intermediate nway;
class group;
var thisy;
output out=want stddev=;
run;
oops..
data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
run ;
data wanted ;
input dt y std ;* <---three input variables ;
cards ;
1 80 .
2 20 42.426406871
3 40 30.550504633
4 60 25.819888975
;
run ;
data calc ;
x1 = 80 ;
x2 = 20 ;
x3 = 40 ;
x4 = 60 ;
new1 = std(x1,x2) ; *42.426406871 ;
new2 = std(x1,x2,x3) ; *30.550504633 ;
new3 = std(x1,x2,x3,x4) ;* 25.819888975 ;
run ;
Thanks for providing data as a SAS data step. Very helpful. I wish more people would do that.
This seems to work. If you have thousands or millions of observations, this will note scale up efficiently.
data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
/* Compute number of observations in data set available */
/* Code from https://communities.sas.com/t5/SAS-Communities-Library/Determining-the-number-of-observations-in-a-SAS-data-set/ta-p/475174 */
data _NULL_;
if 0 then set available nobs=n;
call symputx('nrows',n);
stop;
run;
data intermediate;
set available;
do group=_n_ to &nrows;
thisy=y;
output;
end;
run;
proc summary data=intermediate nway;
class group;
var thisy;
output out=want stddev=;
run;
All 3 replies to my question are good solutions
Thanks so much 🙂
Hello,
Next, if we wanted to add another field such as group by and
calculate the std for each value of gr in the data below:-
what would be the best solution ? Thanks a lot.
data available ;
input gr dt y ;
cards ;
1 1 80
1 2 20
1 3 40
1 4 60
2 1 80
2 2 20
2 3 40
2 4 50
;
run ;
data wanted ;
input gr dt y std ;
cards ;
1 1 80
1 2 20 42.426406871
1 3 40 30.550504633
1 4 60 25.819888975
2 1 80
2 2 20 42.426406871
2 3 40 30.550504633
2 4 50 25
;
run ;
data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
data want;
set available;
array x{999999} _temporary_;
x{_n_}=y;
std=std(of x{*});
run;
Hello @J111,
You can also compute the cumulative standard deviation using Steiner's theorem:
data want(drop=_:);
set available;
retain _v 0;
_s+y; /* cumulative sum */
_m=_s/_n_; /* cumulative mean */
_d=dif(_m); /* mean change */
_q=(y-_m)**2; /* new term in sum of squares */
if _n_>1 then do;
std=sqrt(_v+_d**2+_q/(_n_-1)); /* cumulative standard deviation */
_v=((_n_-1)*(_v+_d**2)+_q)/_n_; /* cumulative population variance */
end;
run;
This calculation should take less than one second for a million observations.
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.