MMcCracken
Calcite | Level 5

I am running PROC MEANS on weighted data in SAS and the equivalent analysis in Stata, and I am getting wildly different values for the standard deviation.  The statistician here believes SAS is incorrect.

I took the class dataset from sashelp and created two fake weights.  t_wt gives everyone a weight of 1 and t_wt2 gives everyone a weight of 5.  When running means with each weight, I expected the standard deviation to stay the same, since a constant weight changes neither the means nor the distribution of the data (and in Stata, the standard deviation does stay the same).  However, in SAS the standard deviation shifts from 22.77 to 50.92 for the variable weight and from 5.12 to 11.46 for the variable height.  We are having trouble explaining why the results differ between SAS and Stata.  Any thoughts?

data temp2;
  set sashelp.class;
  t_wt = 1;
  t_wt2 = 5;
run;

proc means data = temp2 mean min max std n;
  var weight height;
  weight t_wt;
run;

proc means data = temp2 mean min max std n;
  var weight height;
  weight t_wt2;
run;

Reeza
Super User

I think there's a note regarding this in the documentation.

Try using proc surveymeans instead.

EDIT: Look at the VARDEF= option instead, which sets the denominator for the variance/std calculation. The default is probably not what you want; most likely you want WGT or N instead.

data temp2;
  set sashelp.class;
  t_wt = 1;
  t_wt2 = 5;
run;

proc means data = temp2 mean min max std n vardef=WGT;
  var weight height;
  weight t_wt;
run;

proc means data = temp2 mean min max std n vardef=WGT;
  var weight height;
  weight t_wt2;
run;
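As a language-neutral check of what the VARDEF= divisor does, here is a minimal Python sketch (not SAS code). It assumes the variance is the weighted sum of squared deviations divided by d, with d = n-1 for DF, n for N, sum of weights minus 1 for WDF, and sum of weights for WGT. The function name weighted_std and the data values are made up for illustration; they are not sashelp.class.

```python
import math

def weighted_std(values, weights, vardef="DF"):
    """Weighted standard deviation with a PROC MEANS-style divisor (illustrative helper)."""
    n = len(values)
    sum_w = sum(weights)
    # Weighted mean of the analysis variable
    mean = sum(w * x for w, x in zip(weights, values)) / sum_w
    # Weighted corrected sum of squares: sum[w * (x - xbar)^2]
    css = sum(w * (x - mean) ** 2 for w, x in zip(weights, values))
    # Assumed divisors for each VARDEF= value
    divisors = {"DF": n - 1, "N": n, "WDF": sum_w - 1, "WGT": sum_w}
    return math.sqrt(css / divisors[vardef])

# Made-up values for illustration only
x = [50.0, 62.0, 71.0, 84.0, 90.0, 103.0, 112.0]
w1 = [1] * len(x)  # plays the role of t_wt
w5 = [5] * len(x)  # plays the role of t_wt2

for vardef in ("DF", "N", "WDF", "WGT"):
    s1 = weighted_std(x, w1, vardef)
    s5 = weighted_std(x, w5, vardef)
    print(f"{vardef:>3}: w=1 -> {s1:.4f}, w=5 -> {s5:.4f}, ratio {s5 / s1:.4f}")
```

With DF or N the divisor ignores the size of the weights, so the ratio is sqrt(5), about 2.236. With WGT the constant weight cancels and the ratio is 1.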

lvm
Rhodochrosite | Level 12

PROC MEANS calculates the variance as sum[weight*(x-xbar)^2]/d, where xbar is the weighted mean and d depends on the VARDEF= option. The default is VARDEF=DF, which gives d = n-1. Because that divisor ignores the magnitude of the weights, changing the weight from 1 to 5 multiplies the variance by 5 and the standard deviation by sqrt(5).  You can adjust for the scale difference with the PROC MEANS statement option VARDEF=WEIGHT (alias WGT), which sets d = sum[weight]. Try:

proc means data = temp2 mean min max std n VARDEF=WGT;
  var weight height;
  weight t_wt2;
run;

This will get you close to the same variance and standard deviation as the original (not identical, because the divisor is now the sum of the weights rather than n-1). You could also try VARDEF=WDF to get d = sum[weight] - 1.
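To see where the reported numbers come from, here is a short derivation, assuming every observation has the same weight c (c = 1 for t_wt, c = 5 for t_wt2), with s denoting the usual unweighted sample standard deviation. The symbols c and s are introduced here only for illustration.

```latex
% Constant weight w_i = c leaves the weighted mean equal to the ordinary mean.
\bar{x}_w = \frac{\sum_i c\,x_i}{\sum_i c} = \bar{x}

% Default divisor, VARDEF=DF (d = n - 1): the weight does not cancel.
s_{\mathrm{DF}}^2 = \frac{\sum_i c\,(x_i - \bar{x})^2}{n - 1} = c\,s^2
\quad\Longrightarrow\quad s_{\mathrm{DF}} = \sqrt{c}\;s

% For c = 5: \sqrt{5} \approx 2.236, so
% 22.77 \times 2.236 \approx 50.92 and 5.12 \times 2.236 \approx 11.45

% VARDEF=WGT (d = \sum_i w_i = c\,n): the weight cancels.
s_{\mathrm{WGT}}^2 = \frac{\sum_i c\,(x_i - \bar{x})^2}{c\,n}
                   = \frac{\sum_i (x_i - \bar{x})^2}{n}
\quad\Longrightarrow\quad s_{\mathrm{WGT}} = s\,\sqrt{\frac{n - 1}{n}}
```

The factor of sqrt(5) reproduces the jump from 22.77 to 50.92 and from 5.12 to about 11.46 (up to rounding of the inputs). Under VARDEF=WGT the result no longer depends on c, but it is slightly smaller than s because the divisor is n rather than n-1.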
