BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
J111
Quartz | Level 8

Hello,

How can we calculate std on data that has an additional row each time.

It is not a regular moving std , as the data window size increases each time.


data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
run ;

data wanted ;
input dt std ;
cards ;
1 80 .
2 20 42.426406871
3 40 30.550504633
4 60 25.819888975
;
run ;

data calc ;
x1 = 80 ;
x2 = 20 ;
x3 = 40 ;
x4 = 60 ;
new1 = std(x1,x2) ; *42.426406871 ;
new2 = std(x1,x2,x3) ; *30.550504633 ;
new3 = std(x1,x2,x3,x4) ;* 25.819888975 ;
run ;

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Thanks for providing data as a SAS data step. Very helpful. I wish more people would do that.

 

This seems to work. If you have thousands or millions of observations, this will note scale up efficiently.

 

data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;

/* Compute number of observations in data set available */
/* Code from https://communities.sas.com/t5/SAS-Communities-Library/Determining-the-number-of-observations-in-a-SAS-data-set/ta-p/475174 */
data _NULL_;
	if 0 then set available nobs=n;
	call symputx('nrows',n);
	stop;
run;


data intermediate;
    set available;
    do group=_n_ to &nrows;
        thisy=y;
        output;
    end;
run;
        
proc summary data=intermediate nway;
    class group;
    var thisy;
    output out=want stddev=;
run;
--
Paige Miller

View solution in original post

6 REPLIES 6
J111
Quartz | Level 8

oops..

 

data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
run ;

data wanted ;
input dt y std ;* <---three input variables ;
cards ;
1 80 .
2 20 42.426406871
3 40 30.550504633
4 60 25.819888975
;
run ;

data calc ;
x1 = 80 ;
x2 = 20 ;
x3 = 40 ;
x4 = 60 ;
new1 = std(x1,x2) ; *42.426406871 ;
new2 = std(x1,x2,x3) ; *30.550504633 ;
new3 = std(x1,x2,x3,x4) ;* 25.819888975 ;
run ;

PaigeMiller
Diamond | Level 26

Thanks for providing data as a SAS data step. Very helpful. I wish more people would do that.

 

This seems to work. If you have thousands or millions of observations, this will note scale up efficiently.

 

data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;

/* Compute number of observations in data set available */
/* Code from https://communities.sas.com/t5/SAS-Communities-Library/Determining-the-number-of-observations-in-a-SAS-data-set/ta-p/475174 */
data _NULL_;
	if 0 then set available nobs=n;
	call symputx('nrows',n);
	stop;
run;


data intermediate;
    set available;
    do group=_n_ to &nrows;
        thisy=y;
        output;
    end;
run;
        
proc summary data=intermediate nway;
    class group;
    var thisy;
    output out=want stddev=;
run;
--
Paige Miller
J111
Quartz | Level 8

All 3 replies to my question are good solutions

Thanks so much 🙂 

J111
Quartz | Level 8

Hello,

 

Next, if we wanted to add another field such as group by and

calculate the std for each value of gr in the data below:-

what would be the best solution ? Thanks  a lot.


data available ;
input gr dt y ;
cards ;
1 1 80
1 2 20
1 3 40
1 4 60
2 1 80
2 2 20
2 3 40
2 4 50
;
run ;

data wanted ;
input gr dt y std ;
cards ;
1 1 80
1 2 20 42.426406871
1 3 40 30.550504633
1 4 60 25.819888975
2 1 80
2 2 20 42.426406871
2 3 40 30.550504633
2 4 50 25
;
run ;

Ksharp
Super User
data available ;
input dt y ;
cards ;
1 80
2 20
3 40
4 60
;
data want;
 set available;
 array x{999999} _temporary_;
 x{_n_}=y;
 std=std(of x{*});
run;
FreelanceReinh
Jade | Level 19

Hello @J111,

 

You can also compute the cumulative standard deviation using Steiner's theorem:

data want(drop=_:);
set available;
retain _v 0;
_s+y; /* cumulative sum */
_m=_s/_n_; /* cumulative mean */
_d=dif(_m); /* mean change */
_q=(y-_m)**2; /* new term in sum of squares */
if _n_>1 then do;
  std=sqrt(_v+_d**2+_q/(_n_-1)); /* cumulative standard deviation */
  _v=((_n_-1)*(_v+_d**2)+_q)/_n_; /* cumulative population variance */
end;
run;

This calculation should take less than one second for a million observations.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 383 views
  • 3 likes
  • 4 in conversation