I'm working on a productivity report for a group of 10 employees over a period of 10 months. For each of the 10 employees, I computed mean productivity (widgets produced per 8-hour day) for the 10-month period. I ran a weighted PROC MEANS in which the weight variable was the total number of widgets each employee produced over the 10 months (the Sum_App variable) and the analysis variable was the individual's mean productivity for the 10 months (widgets divided by days worked, the Over_Avg variable). The weighted mean was 24.80 with a min of 6.11 and a max of 31.96, but the standard deviation shows a value of 288.25, which seems very odd to me considering that the scores range from 6.11 to 31.96. Is this standard deviation inaccurate and something that shouldn't be used, or could it really be correct?
This is the code I used:
proc means data=lwall ;
var Over_Avg;
weight Sum_App;
run;
The standard deviation treats the number of observations a bit differently than the mean does, so the likely cause is large differences in the weights. Take a look at this and see if a light bulb pops:
data example;
input x w;
datalines;
6 300
8 40
10 500
12 90
14 800
16 60
18 1000
20 40
22 700
24 80
;
run;

proc means data=example;
title "With weights";
var x;
weight w;
run; title;

proc means data=example;
title "Without weights";
var x;
run; title;

proc means data=example;
title "With FREQ";
var x;
freq w;
run; title;
No lightbulb. Your code seems to show the same thing as my weighted PROC MEANS. Is this crazy-high standard deviation anything that can or should be used? Even in your example, a weighted mean of 15.64 with a weighted standard deviation of 99.83 when the range is 6 to 24 seems inaccurate and not something to be used or trusted.
Did my example with FREQ make any sense? Look more like what you might expect?
If you look at the basic formula for standard deviation, it is going to use n=10, but your data actually represent many more observations.
You don't show what your actual weight values look like but this note from the documentation might apply:
CAUTION: Single extreme weight values can cause inaccurate results. When one (and only one) weight value is many orders of magnitude larger than the other weight values (for example, 49 weight values of 1 and one weight value of 1×10^14), certain statistics might not be within acceptable accuracy limits. The affected statistics are based on the second moment (such as standard deviation, corrected sum of squares, variance, and standard error of the mean). Under certain circumstances, no warning is written to the SAS log.
and
If the values of your variable are counts that represent the number of occurrences of each observation, then use this variable in the FREQ statement rather than in the WEIGHT statement. In this case, because the values are counts, they should be integers.
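To put numbers on the n=10 point above, here is the default weighted computation as I read the PROC MEANS documentation, plugged in for the example data set (x and w as in the DATA step; the rounded figures are my own hand arithmetic, so treat them as approximate):

```latex
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}
          = \frac{56480}{3610} \approx 15.645

s = \sqrt{\frac{\sum_{i=1}^{n} w_i \,(x_i - \bar{x}_w)^2}{n - 1}}
  \approx \sqrt{\frac{89706}{9}} \approx 99.8
```

The numerator is on the scale of the weights, which sum to 3610, while the divisor is only n - 1 = 9. That mismatch is why the result lands so far outside the 6 to 24 range of x.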
By default, the number of observations (actually N-1) is used for the denominator when computing a standard deviation.
With your application, I suspect a more reasonable computation would divide the weighted deviations by the sum of the weights. You can do this by using the VARDEF= option. The documentation for PROC MEANS contains a discussion of what quantity each computation estimates.
proc means data=example VARDEF=WGT;
title "With VARDEF=WGT";
var x;
weight w;
run; title;
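If you want to convince yourself of what the divisor is doing, a hand check along these lines may help. This is only a sketch: it assumes the example data set from earlier in the thread (variables x and w) already exists, the column names sd_default and sd_vardef_wgt are just labels I made up, and it uses the identity sum(w*(x - mean)**2) = sum(w*x*x) - sum(w*x)**2 / sum(w) to get the weighted sum of squares in a single query.

```sas
/* Hand check of the weighted standard deviation for the example data. */
/* Assumes data set EXAMPLE with variables x (value) and w (weight).   */
proc sql;
title "Hand check of the two divisors";
select sum(w*x) / sum(w) as wmean,
       /* default divisor: number of observations minus 1 */
       sqrt((sum(w*x*x) - sum(w*x)**2 / sum(w)) / (count(x) - 1)) as sd_default,
       /* VARDEF=WGT divisor: sum of the weights */
       sqrt((sum(w*x*x) - sum(w*x)**2 / sum(w)) / sum(w)) as sd_vardef_wgt
from example;
quit; title;
```

By my arithmetic, sd_default should come out near the 99.83 from the plain weighted run and sd_vardef_wgt near 5, which sits comfortably inside the 6 to 24 range.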