Solved: how to calculate std on an increasing data, similar to commulative sum

J111 · Posted 09-22-2024 08:37 AM

Hello,

How can we calculate std on data that has an additional row each time.

It is not a regular moving std , as the data window size increases each time.

data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
run ;

data wanted ;
input dt std ;
cards ;
1 80 .
2 20 42.426406871
3 40 30.550504633
4 60 25.819888975
;
run ;

data calc ;
x1 = 80 ;
x2 = 20 ;
x3 = 40 ;
x4 = 60 ;
new1 = std(x1,x2) ; *42.426406871 ;
new2 = std(x1,x2,x3) ; *30.550504633 ;
new3 = std(x1,x2,x3,x4) ;* 25.819888975 ;
run ;

Thanks

PaigeMiller · Posted 09-22-2024 10:46 AM

Thanks for providing data as a SAS data step. Very helpful. I wish more people would do that.

This seems to work. If you have thousands or millions of observations, this will note scale up efficiently.

data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;

/* Compute number of observations in data set available */
/* Code from https://communities.sas.com/t5/SAS-Communities-Library/Determining-the-number-of-observations-in-a-SAS-data-set/ta-p/475174 */
data _NULL_;
	if 0 then set available nobs=n;
	call symputx('nrows',n);
	stop;
run;


data intermediate;
    set available;
    do group=_n_ to &nrows;
        thisy=y;
        output;
    end;
run;
        
proc summary data=intermediate nway;
    class group;
    var thisy;
    output out=want stddev=;
run;

--
Paige Miller

View solution in original post

J111 · Posted 09-22-2024 08:47 AM

oops..

data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
run ;

data wanted ;
input dt y std ;* <---three input variables ;
cards ;
1 80 .
2 20 42.426406871
3 40 30.550504633
4 60 25.819888975
;
run ;

data calc ;
x1 = 80 ;
x2 = 20 ;
x3 = 40 ;
x4 = 60 ;
new1 = std(x1,x2) ; *42.426406871 ;
new2 = std(x1,x2,x3) ; *30.550504633 ;
new3 = std(x1,x2,x3,x4) ;* 25.819888975 ;
run ;

PaigeMiller · Posted 09-22-2024 10:46 AM

Thanks for providing data as a SAS data step. Very helpful. I wish more people would do that.

This seems to work. If you have thousands or millions of observations, this will note scale up efficiently.

data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;

/* Compute number of observations in data set available */
/* Code from https://communities.sas.com/t5/SAS-Communities-Library/Determining-the-number-of-observations-in-a-SAS-data-set/ta-p/475174 */
data _NULL_;
	if 0 then set available nobs=n;
	call symputx('nrows',n);
	stop;
run;


data intermediate;
    set available;
    do group=_n_ to &nrows;
        thisy=y;
        output;
    end;
run;
        
proc summary data=intermediate nway;
    class group;
    var thisy;
    output out=want stddev=;
run;

--
Paige Miller

J111 · Posted 09-23-2024 12:37 AM

All 3 replies to my question are good solutions

Thanks so much 🙂

J111 · Posted 09-23-2024 01:52 AM

Hello,

Next, if we wanted to add another field such as group by and

calculate the std for each value of gr in the data below:-

what would be the best solution ? Thanks a lot.

data available ;
input gr dt y ;
cards ;
1 1 80
1 2 20
1 3 40
1 4 60
2 1 80
2 2 20
2 3 40
2 4 50
;
run ;

data wanted ;
input gr dt y std ;
cards ;
1 1 80
1 2 20 42.426406871
1 3 40 30.550504633
1 4 60 25.819888975
2 1 80
2 2 20 42.426406871
2 3 40 30.550504633
2 4 50 25
;
run ;

Ksharp · Posted 09-22-2024 09:20 PM

data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
data want;
 set available;
 array x{999999} _temporary_;
 x{_n_}=y;
 std=std(of x{*});
run;

FreelanceReinh · Posted 09-22-2024 01:05 PM

Hello @J111,

You can also compute the cumulative standard deviation using Steiner's theorem:

data want(drop=_:);
set available;
retain _v 0;
_s+y; /* cumulative sum */
_m=_s/_n_; /* cumulative mean */
_d=dif(_m); /* mean change */
_q=(y-_m)**2; /* new term in sum of squares */
if _n_>1 then do;
  std=sqrt(_v+_d**2+_q/(_n_-1)); /* cumulative standard deviation */
  _v=((_n_-1)*(_v+_d**2)+_q)/_n_; /* cumulative population variance */
end;
run;

This calculation should take less than one second for a million observations.

how to calculate std on an increasing data, similar to commulative sum

Re: how to calculate std on an increasing data, similar to commulative sum

Re: how to calculate std on an increasing data, similar to commulative sum

Re: how to calculate std on an increasing data, similar to commulative sum

Re: how to calculate std on an increasing data, similar to commulative sum

Re: how to calculate std on an increasing data, similar to commulative sum

Re: how to calculate std on an increasing data, similar to commulative sum

Re: how to calculate std on an increasing data, similar to commulative sum

Registration is open

SAS Training: Just a Click Away