11-19-2015 03:06 PM
I have a dataset with Dates, a variable that defines what the value measures, and the value. I want to get the year-to-date standard deviation for each date. So i have something like this, and want to create this Std column...
Date Measure value std
jan01 Batch Payment 93 .
jan08 Batch Payment 128 24.64
jan15 Batch Payment 126 19.58
jan01 Cost per Claim 311
jan08 Cost per Claim 385 18.38
jan15 Cost per Claim 281 16.29
jan01 Pended Claims 22372
jan08 Pended Claims 18389 2816.41
jan15 Pended Claims 15869 3278.81
I tried using this...
data testing; set test2; retain count; by measure; if first.measure then count=1; else count=1+count; array std_weeks _temporary_; std_weeks[count]=value; if first.measure then std = .; else std = std(of std_weeks[*]); run;
Right now I have 46 weeks of data, I plan to read the max of count into a variable and use that variable in place of the 46. What I have there works for the first group of measure, but the next measure's std includes values from the previous measure (since the array is set to 46). Can I make this work with arrays? Is there another way to achieve the desired result?
11-19-2015 03:11 PM
11-19-2015 03:34 PM
Here's another approach, just in case this appeals to you more.
If you wanted the mean instead of the standard deviation, you could just keep running SUM and running COUNT. Then on each observation divide to get the mean. Of course, you would have to re-set the running SUM and running COUNT each time you hit a new MEASURE ... but you're already doing that for the COUNT.
Similarly, standard deviation can be expressed as a function of the running SUM, running COUNT, and running SUM OF SQUARED VALUES. By applying that sort of a formula, you would never have to worry about how many array elements you need.
11-19-2015 04:32 PM
The sort of formula @Astounding is referring to would work like this:
data testing; set test2; by measure; if first.measure then do; n=0; sx=0; sx2=0; end; if not missing(value) then do; sx + value; sx2 + value*value; n + 1; if n>1 then std = sqrt((sx2-sx*sx/n)/(n-1)); end; drop n sx sx2; run;