Weighted standard deviation using proc means

N/A
Posts: 1

I am computing means on weighted data in both SAS and Stata and getting wildly different values for the standard deviation.  The statistician here believes SAS is incorrect.

I took the class dataset from sashelp and created two fake weights: t_wt gives everyone a weight of 1 and t_wt2 gives everyone a weight of 5.  I expected the standard deviation to be the same under either weight, because a constant weight changes neither the mean nor the shape of the distribution (and in Stata, the standard deviation does stay the same).  In SAS, however, the stddev shifts from 22.77 to 50.92 for the weight variable and from 5.12 to 11.46 for the height variable.  We are having trouble explaining why the results differ between SAS and Stata.  Any thoughts?

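/* Copy sashelp.class and add two constant weights */
data temp2;
  set sashelp.class;
  t_wt = 1;
  t_wt2 = 5;
run;

/* Every observation weighted 1 */
proc means data = temp2 mean min max std n;
  var weight height;
  weight t_wt;
run;

/* Every observation weighted 5 */
proc means data = temp2 mean min max std n;
  var weight height;
  weight t_wt2;
run;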

Super User
Posts: 17,868

Re: Weighted standard deviation using proc means

I think there's a note regarding this in the documentation.

Try using proc surveymeans instead.

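EDIT: Look at the VARDEF= option on the PROC MEANS statement instead. It sets the denominator for the variance/std calculation. The default (DF, which divides by n-1) is probably not what you want; most likely you want WGT (sum of the weights) or WDF (sum of the weights minus 1) instead.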

data temp2;
  set sashelp.class;
  t_wt = 1;
  t_wt2 = 5;
run;

/* VARDEF=WGT divides by the sum of the weights */
proc means data = temp2 mean min max std n vardef=WGT;
  var weight height;
  weight t_wt;
run;

proc means data = temp2 mean min max std n vardef=WGT;
  var weight height;
  weight t_wt2;
run;
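
To see how each divisor reacts to the scale of the weights, one option is to run the same analysis under all four VARDEF= values for both weight variables. This is only a sketch: the macro name compare_vardef is made up for illustration, and the data step is repeated so the block runs on its own.

```sas
/* Rebuild the test data so this block is self-contained */
data temp2;
  set sashelp.class;
  t_wt = 1;
  t_wt2 = 5;
run;

/* Hypothetical helper: run PROC MEANS once per VARDEF= value
   (DF, N, WGT, WDF) for the weight variable passed in as wt */
%macro compare_vardef(wt);
  %local i d;
  %do i = 1 %to 4;
    %let d = %scan(DF N WGT WDF, &i);
    title "VARDEF=&d, weight variable &wt";
    proc means data = temp2 n mean std vardef=&d;
      var weight height;
      weight &wt;
    run;
  %end;
  title; /* clear the title */
%mend compare_vardef;

%compare_vardef(t_wt)
%compare_vardef(t_wt2)
```

With constant weights, DF and N should give a standard deviation that grows with the square root of the weight, WGT should give the same value for t_wt and t_wt2, and WDF should change only slightly.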

Valued Guide
Posts: 684

Re: Weighted standard deviation using proc means

PROC MEANS calculates the variance as sum[weight*(x-xbar)^2]/d, where xbar is the weighted mean and d depends on the VARDEF= option. The default is VARDEF=DF, which gives d=n-1. That divisor does not adjust for the magnitude of the weights, so changing the weight from 1 to 5 multiplies the variance by 5 and the standard deviation by sqrt(5), about 2.236, which is the shift you are seeing.  You can adjust for the scale difference by using the PROC MEANS statement option VARDEF=WGT (WEIGHT is an alias). Then, d = sum[weight]. Try:

proc means data = temp2 mean min max std n VARDEF=WGT;
  var weight height;
  weight t_wt2;
run;

This gives the same variance and standard deviation for either constant weight, and close to the original unweighted values (the divisor is effectively n instead of n-1). You could also try VARDEF=WDF to get d=sum[weight] - 1.
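
As a rough check of the reasoning above, here is the arithmetic for the case where every observation has the same weight c. The symbols s (the ordinary unweighted standard deviation with divisor n-1) and c are introduced here for illustration; the formula itself is the one quoted above.

```latex
% Constant weights w_i = c for all n observations.
% The weighted mean does not depend on c:
\[
\bar{x} = \frac{\sum_i c\,x_i}{\sum_i c} = \frac{1}{n}\sum_i x_i .
\]
% Default divisor, VARDEF=DF (d = n - 1):
\[
s_{\mathrm{DF}}^2 = \frac{\sum_i c\,(x_i-\bar{x})^2}{n-1} = c\,s^2
\quad\Longrightarrow\quad
s_{\mathrm{DF}} = \sqrt{c}\;s .
\]
% VARDEF=WGT (d = \sum_i w_i = nc):
\[
s_{\mathrm{WGT}}^2 = \frac{\sum_i c\,(x_i-\bar{x})^2}{n\,c}
= \frac{n-1}{n}\,s^2 ,
\]
% which no longer depends on c.
```

For c = 5, sqrt(c) is about 2.236, which is consistent with the reported shifts from 22.77 to 50.92 and from 5.12 to 11.46.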
