year to date Standard Deviation

New Contributor
Posts: 3

year to date Standard Deviation

I have a dataset with Dates, a variable that defines what the value measures, and the value.  I want to get the year-to-date standard deviation for each date.  So i have something like this, and want to create this Std column...

Date           Measure                    value           std

jan01          Batch Payment          93                 .

jan08          Batch Payment          128             24.64

jan15          Batch Payment          126             19.58

jan01          Cost per Claim           311            

jan08          Cost per Claim           385             18.38

jan15          Cost per Claim           281             16.29

jan01          Pended Claims          22372

jan08          Pended Claims          18389          2816.41

jan15          Pended Claims          15869          3278.81

I tried using this...

data testing;
set test2;

retain count;
by measure;

if first.measure then count=1;
else count=1+count;

array std_weeks[46] _temporary_;


if first.measure then std = .;
else std = std(of std_weeks[*]);


Right now I have 46 weeks of data, I plan to read the max of count into a variable and use that variable in place of the 46.  What I have there works for the first group of measure, but the next measure's std includes values from the previous measure (since the array is set to 46).  Can I make this work with arrays?  Is there another way to achieve the desired result?




Super User
Posts: 17,828

Re: year to date Standard Deviation

You need to reset your array when the measure changes.

See code here that works for other moving metrics:
Super User
Posts: 5,082

Re: year to date Standard Deviation

Here's another approach, just in case this appeals to you more.


If you wanted the mean instead of the standard deviation, you could just keep running SUM and running COUNT.  Then on each observation divide to get the mean.  Of course, you would have to re-set the running SUM and running COUNT each time you hit a new MEASURE ... but you're already doing that for the COUNT.


Similarly, standard deviation can be expressed as a function of the running SUM, running COUNT, and running SUM OF SQUARED VALUES.  By applying that sort of a formula, you would never have to worry about how many array elements you need.


Good luck.

Respected Advisor
Posts: 4,647

Re: year to date Standard Deviation

The sort of formula @Astounding is referring to would work like this:


data testing;
set test2; by measure;
if first.measure then do; n=0; sx=0; sx2=0; end;
if not missing(value) then do;
    sx + value;
    sx2 + value*value;
    n + 1;
    if n>1 then std = sqrt((sx2-sx*sx/n)/(n-1));
drop n sx sx2;
Ask a Question
Discussion stats
  • 3 replies
  • 4 in conversation