Help using Base SAS procedures

Weighted Standard Deviation/Mean - Proc Means

Occasional Contributor
Posts: 7

I'm working on a productivity report for a group of 10 employees over a period of 10 months. For each employee, I computed mean productivity (widgets produced per 8-hour day) over the 10 months. I then ran a weighted PROC MEANS. The weight variable, Sum_App, is the total number of widgets each employee produced over the 10 months; the analysis variable, Over_Avg, is the individual's mean productivity for the 10 months (widgets divided by days worked). The weighted mean was 24.80, with a minimum of 6.11 and a maximum of 31.96, but the standard deviation is 288.25, which seems very odd given that the values range from 6.11 to 31.96. Is this standard deviation inaccurate and not something I should use, or could it really be correct?

 

This is the code I used:

 

proc means data=lwall;
   var Over_Avg;
   weight Sum_App;
run;


Accepted Solutions
Solution
‎12-09-2016 04:10 PM
SAS Super FREQ
Posts: 3,753

Re: Weighted Standard Deviation/Mean - Proc Means

By default (VARDEF=DF), the denominator used when computing a standard deviation is based on the number of observations (actually N-1), even when a WEIGHT statement is present.

With your application, I suspect a more reasonable computation would divide the weighted sum of squared deviations by the sum of the weights. You can do this by using the VARDEF= option. The documentation for PROC MEANS contains a discussion of what quantity each computation estimates.

 

proc means data=example VARDEF=WGT;
   title "With VARDEF=WGT";
   var x;
   weight w;
run; title;
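
A minimal sketch for comparing the divisors side by side, assuming the EXAMPLE data set posted elsewhere in this thread (repeated here so the block stands alone). The titles and the choice of statistics (N, SUMWGT, MEAN, STD) are my own additions for illustration, not part of the original reply.

```sas
/* Same toy data as the EXAMPLE data set posted in this thread */
data example;
   input x w;
datalines;
6  300
8  40
10 500
12 90
14 800
16 60
18 1000
20 40
22 700
24 80
;
run;

/* Default divisor: number of observations minus 1 */
proc means data=example n sumwgt mean std vardef=df;
   title "VARDEF=DF (default)";
   var x;
   weight w;
run;

/* Divisor: sum of the weights */
proc means data=example n sumwgt mean std vardef=wgt;
   title "VARDEF=WGT";
   var x;
   weight w;
run;

/* Divisor: sum of the weights minus 1 */
proc means data=example n sumwgt mean std vardef=wdf;
   title "VARDEF=WDF";
   var x;
   weight w;
run;
title;
```

The weighted mean should be identical in all three outputs; only the standard deviation changes, because only the divisor of the weighted sum of squares differs.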



All Replies
Super User
Posts: 11,343

Re: Weighted Standard Deviation/Mean - Proc Means

The standard deviation treats the number of observations a bit differently than you might expect, and the likely cause of the large value is the large differences in the weights. Take a look at this and see if a light bulb goes on:

 

data example;
   input x w;
datalines;
6  300
8  40
10 500
12 90
14 800
16 60
18 1000
20 40
22 700
24 80
;
run;

proc means data=example;
   title "With weights";
   var x;
   weight w;
run;  title;
proc means data=example;
   title "Without weights";
   var x;
run; title;
proc means data=example;
   title "With FREQ";
   var x;
   freq w;
run; title;
Occasional Contributor
Posts: 7

Re: Weighted Standard Deviation/Mean - Proc Means

No lightbulb. Your code seems to show the same thing as my weighted PROC MEANS. Is this crazy-high standard deviation anything that can or should be used? Even in your example, a weighted mean of 15.64 and a weighted standard deviation of 99.83 when the range is 6 to 24 seems inaccurate and not to be trusted.

Super User
Posts: 11,343

Re: Weighted Standard Deviation/Mean - Proc Means

Did my example with FREQ make any sense? Look more like what you might expect?

 

If you look at the basic formula for standard deviation it is going to use n=10 but your data actually represents many more observations.
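
To make the n=10 point concrete, here is the arithmetic for the example data set above, worked by hand (my own calculation with rounded values, not SAS output). The weighted sum of squared deviations is the same in both cases; the default divides it by n-1 = 9, whereas dividing by the sum of the weights (3610) gives a value on the scale of the data:

```latex
\begin{aligned}
\sum_i w_i &= 3610, \qquad
\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i} = \frac{56480}{3610} \approx 15.645 \\
\sum_i w_i (x_i - \bar{x}_w)^2
  &= \sum_i w_i x_i^2 - \frac{\left(\sum_i w_i x_i\right)^2}{\sum_i w_i}
  \approx 973360 - 883653.85 = 89706.15 \\
s_{n-1} &= \sqrt{\frac{89706.15}{9}} \approx 99.84 \\
s_{\sum w} &= \sqrt{\frac{89706.15}{3610}} \approx 4.98
\end{aligned}
```

The first result agrees with the weighted standard deviation PROC MEANS reports for this data; the second is what you get if the weights are treated as the effective number of observations.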

 

You don't show what your actual weight values look like but this note from the documentation might apply:

CAUTION:
Single extreme weight values can cause inaccurate results. 
When one (and only one) weight value is many orders of magnitude larger than the other weight values (for example, 49 weight values of 1 and one weight value of 1×10^14), certain statistics might not be within acceptable accuracy limits. The affected statistics are based on the second moment (such as standard deviation, corrected sum of squares, variance, and standard error of the mean). Under certain circumstances, no warning is written to the SAS log.

and

 

If the values of your variable are counts that represent the number of occurrences of each observation, then use this variable in the FREQ statement rather than in the WEIGHT statement. In this case, because the values are counts, they should be integers.
☑ This topic is solved.
