BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
J111
Quartz | Level 8

Hello,

How can we calculate std on data that has an additional row each time.

It is not a regular moving std , as the data window size increases each time.


data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
run ;

data wanted ;
input dt std ;
cards ;
1 80 .
2 20 42.426406871
3 40 30.550504633
4 60 25.819888975
;
run ;

data calc ;
x1 = 80 ;
x2 = 20 ;
x3 = 40 ;
x4 = 60 ;
new1 = std(x1,x2) ; *42.426406871 ;
new2 = std(x1,x2,x3) ; *30.550504633 ;
new3 = std(x1,x2,x3,x4) ;* 25.819888975 ;
run ;

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Thanks for providing data as a SAS data step. Very helpful. I wish more people would do that.

 

This seems to work. If you have thousands or millions of observations, this will note scale up efficiently.

 

data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;

/* Compute number of observations in data set available */
/* Code from https://communities.sas.com/t5/SAS-Communities-Library/Determining-the-number-of-observations-in-a-SAS-data-set/ta-p/475174 */
data _NULL_;
	if 0 then set available nobs=n;
	call symputx('nrows',n);
	stop;
run;


data intermediate;
    set available;
    do group=_n_ to &nrows;
        thisy=y;
        output;
    end;
run;
        
proc summary data=intermediate nway;
    class group;
    var thisy;
    output out=want stddev=;
run;
--
Paige Miller

View solution in original post

6 REPLIES 6
J111
Quartz | Level 8

oops..

 

data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
run ;

data wanted ;
input dt y std ;* <---three input variables ;
cards ;
1 80 .
2 20 42.426406871
3 40 30.550504633
4 60 25.819888975
;
run ;

data calc ;
x1 = 80 ;
x2 = 20 ;
x3 = 40 ;
x4 = 60 ;
new1 = std(x1,x2) ; *42.426406871 ;
new2 = std(x1,x2,x3) ; *30.550504633 ;
new3 = std(x1,x2,x3,x4) ;* 25.819888975 ;
run ;

PaigeMiller
Diamond | Level 26

Thanks for providing data as a SAS data step. Very helpful. I wish more people would do that.

 

This seems to work. If you have thousands or millions of observations, this will note scale up efficiently.

 

data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;

/* Compute number of observations in data set available */
/* Code from https://communities.sas.com/t5/SAS-Communities-Library/Determining-the-number-of-observations-in-a-SAS-data-set/ta-p/475174 */
data _NULL_;
	if 0 then set available nobs=n;
	call symputx('nrows',n);
	stop;
run;


data intermediate;
    set available;
    do group=_n_ to &nrows;
        thisy=y;
        output;
    end;
run;
        
proc summary data=intermediate nway;
    class group;
    var thisy;
    output out=want stddev=;
run;
--
Paige Miller
J111
Quartz | Level 8

All 3 replies to my question are good solutions

Thanks so much 🙂 

J111
Quartz | Level 8

Hello,

 

Next, if we wanted to add another field such as group by and

calculate the std for each value of gr in the data below:-

what would be the best solution ? Thanks  a lot.


data available ;
input gr dt y ;
cards ;
1 1 80
1 2 20
1 3 40
1 4 60
2 1 80
2 2 20
2 3 40
2 4 50
;
run ;

data wanted ;
input gr dt y std ;
cards ;
1 1 80
1 2 20 42.426406871
1 3 40 30.550504633
1 4 60 25.819888975
2 1 80
2 2 20 42.426406871
2 3 40 30.550504633
2 4 50 25
;
run ;

Ksharp
Super User
data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
data want;
 set available;
 array x{999999} _temporary_;
 x{_n_}=y;
 std=std(of x{*});
run;
FreelanceReinh
Jade | Level 19

Hello @J111,

 

You can also compute the cumulative standard deviation using Steiner's theorem:

data want(drop=_:);
set available;
retain _v 0;
_s+y; /* cumulative sum */
_m=_s/_n_; /* cumulative mean */
_d=dif(_m); /* mean change */
_q=(y-_m)**2; /* new term in sum of squares */
if _n_>1 then do;
  std=sqrt(_v+_d**2+_q/(_n_-1)); /* cumulative standard deviation */
  _v=((_n_-1)*(_v+_d**2)+_q)/_n_; /* cumulative population variance */
end;
run;

This calculation should take less than one second for a million observations.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1859 views
  • 3 likes
  • 4 in conversation