Bildog1
Calcite | Level 5

I'm working on a productivity report for a group of 10 employees over a period of 10 months. For each of the 10 employees, I computed mean productivity (widgets produced per 8-hour day) for the 10-month period. I ran a weighted PROC MEANS in which the weight variable was the total number of widgets each employee produced over the 10 months (the Sum_App variable) and the analysis variable was each individual's mean productivity for the 10 months (widgets divided by days worked, the Over_Avg variable). The weighted mean was 24.80 with a min of 6.11 and a max of 31.96, but the standard deviation shows a value of 288.25, which seems very odd to me considering that the scores range from 6.11 to 31.96. Is this standard deviation inaccurate and something that shouldn't be used, or could it really be correct?

 

This is the code I used:

proc means data=lwall;
   var Over_Avg;
   weight Sum_App;
run;

1 ACCEPTED SOLUTION

Rick_SAS
SAS Super FREQ

By default, the number of observations (actually N-1) is used for the denominator when computing a standard deviation.

With your application, I suspect a more reasonable computation would divide the weighted deviations by the sum of the weights.  You can do this by using the VARDEF= option. The documentation for PROC MEANS contains a discussion of what quantity each computation estimates.

 

proc means data=example VARDEF=WGT;
   title "With VARDEF=WGT";
   var x;
   weight w;
run; title;
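
To check the arithmetic outside SAS, here is a minimal Python sketch of the effect of the divisor. It assumes the weighted variance is the weighted sum of squared deviations from the weighted mean, divided by a quantity selected by VARDEF= (n-1 for DF, n for N, sum of weights minus 1 for WDF, sum of weights for WGT). The function names `weighted_mean` and `weighted_std` are illustrative only, and the data are the x and w values from the example data set posted in the replies.

```python
import math

# x values and weights copied from the example data set posted in this thread
xs = [6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
ws = [300, 40, 500, 90, 800, 60, 1000, 40, 700, 80]

def weighted_mean(xs, ws):
    """Weighted mean: sum(w*x) / sum(w)."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)

def weighted_std(xs, ws, vardef="DF"):
    """Weighted SD with the divisor chosen in the spirit of VARDEF=.

    Hypothetical helper for illustration; not a SAS API.
    """
    xbar = weighted_mean(xs, ws)
    # Weighted corrected sum of squares
    css = sum(w * (x - xbar) ** 2 for x, w in zip(xs, ws))
    divisors = {
        "DF": len(xs) - 1,    # default: number of observations minus 1
        "N": len(xs),         # number of observations
        "WDF": sum(ws) - 1,   # sum of weights minus 1
        "WGT": sum(ws),       # sum of weights
    }
    return math.sqrt(css / divisors[vardef])

mean_w = weighted_mean(xs, ws)
sd_df = weighted_std(xs, ws, "DF")
sd_wgt = weighted_std(xs, ws, "WGT")

print(f"weighted mean      = {mean_w:.4f}")
print(f"SD with VARDEF=DF  = {sd_df:.4f}")
print(f"SD with VARDEF=WGT = {sd_wgt:.4f}")
```

On these data the default divisor (n-1 = 9) gives a standard deviation of roughly 99.8, while dividing by the sum of the weights (3610) gives roughly 4.98, which sits comfortably inside the 6 to 24 range. The numerator is identical in both cases; only the divisor changes.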


4 REPLIES
ballardw
Super User

The standard deviation treats the number of observations a bit differently than you might expect, and that matters here because of the likely large differences in the weights. Take a look at this and see if a light bulb goes on:

 

data example;
   input x w;
datalines;
6  300
8  40
10 500
12 90
14 800
16 60
18 1000
20 40
22 700
24 80
;
run;

proc means data=example;
   title "With weights";
   var x;
   weight w;
run;  title;
proc means data=example;
   title "Without weights";
   var x;
run; title;
proc means data=example;
   title "With FREQ";
   var x;
   freq w;
run; title;
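
For anyone without SAS at hand, the three runs above can be approximated with a short Python sketch. It assumes the WEIGHT run divides the weighted sum of squared deviations by n-1 (the default), and that the FREQ run behaves as if each x were repeated w times. The variable names are illustrative only.

```python
import math
import statistics

# Same x and w values as in the DATA step above
xs = [6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
ws = [300, 40, 500, 90, 800, 60, 1000, 40, 700, 80]

# "With weights": weighted deviations, but the divisor is still n - 1 = 9
xbar_w = sum(w * x for x, w in zip(xs, ws)) / sum(ws)
css_w = sum(w * (x - xbar_w) ** 2 for x, w in zip(xs, ws))
sd_weight = math.sqrt(css_w / (len(xs) - 1))

# "Without weights": ordinary sample SD of the 10 x values
sd_plain = statistics.stdev(xs)

# "With FREQ": treat each w as a count and repeat x that many times,
# so the sample size becomes sum(ws) = 3610
expanded = [x for x, w in zip(xs, ws) for _ in range(w)]
sd_freq = statistics.stdev(expanded)

print(f"With weights:    SD = {sd_weight:.4f}")
print(f"Without weights: SD = {sd_plain:.4f}")
print(f"With FREQ:       SD = {sd_freq:.4f} (n = {len(expanded)})")
```

The weighted run comes out near 99.8, the unweighted run near 6.06, and the FREQ-style run near 4.99. The large value arises because weights in the hundreds inflate the numerator while the divisor stays at 9.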
Bildog1
Calcite | Level 5

No lightbulb. Your code seems to show the same thing as my weighted PROC MEANS. Is this crazy-high standard deviation anything that can or should be used? Even yours, with a weighted mean of 15.64 and a weighted standard deviation of 99.83 when the range is 6 to 24, seems inaccurate or not to be used/trusted.

ballardw
Super User

Did my example with FREQ make any sense? Look more like what you might expect?

 

If you look at the basic formula for standard deviation, it is going to use n=10 in the denominator, but your data actually represent many more observations.

 

You don't show what your actual weight values look like but this note from the documentation might apply:

CAUTION:
Single extreme weight values can cause inaccurate results. 
When one (and only one) weight value is many orders of magnitude larger than the other weight values (for example, 49 weight values of 1 and one weight value of 1×10^14), certain statistics might not be within acceptable accuracy limits. The affected statistics are based on the second moment (such as standard deviation, corrected sum of squares, variance, and standard error of the mean). Under certain circumstances, no warning is written to the SAS log.

and

 

If the values of your variable are counts that represent the number of occurrences of each observation, then use this variable in the FREQ statement rather than in the WEIGHT statement. In this case, because the values are counts, they should be integers.
