Re: Weighted frequencies, proc surveyfreq vs proc surveymeans

MillerEL · Posted 07-30-2021 04:12 PM

I have a question about the weighted frequencies between proc surveyfreq and proc surveymeans. I'm sure this seems super simple to you all out there, but I'm stuck.

I'm doing an analysis of a categorical variable from a complex survey. Someone has requested the weighted numerator and denominator. The problem is that I don't know which of the methods below is correct, or if I'm fundamentally misunderstanding something about the procedures and the Sum of Weights.

1. Proc SurveyMeans

The output for proc surveymeans does not provide weighted frequencies. One recommendation was to use Sum of Weights as the weighted denominator, and the Mean multiplied by the Sum of Weights as the weighted numerator. In the example below (for indicator=1):

Weighted numerator = .226788 * 1,060,305 = 240,464

Weighted denominator = 1,060,305

Prevalence = 22.68%

However, Proc SurveyFreq is the recommended procedure for categorical variables.

2. Proc SurveyFreq

The output for proc surveyfreq provides a Weighted Frequency, which does not match the numerator above (227,043 vs 240,464).

In Proc SurveyFreq (indicator=1), it would seem that the weighted numerator should be 227,043 and the weighted denominator 1,001,125 (do I include the missings?).

But the Sum of Weights is the same as with Proc SurveyMeans. How can the weighted denominator be the Sum of Weights in one and not the other?

I've read that both procs handle missing data differently. However, I haven't seen a clear explanation of how those differences impact the interpretation or use of the procedures. Can anyone recommend a good reference that explains the differences (something at the beginner level)? Is that what I'm missing here? Can anyone help me understand what the weighted numerator and denominator should be, and why?

Thank you!

ballardw · Posted 07-30-2021 04:59 PM

Please show the code you are using to generate the output.

You may need to add options to request the values you want, and seeing the actual code will show which approach is likely to work best for your analysis.

MillerEL · Posted 08-02-2021 10:12 AM

I have posted my code below. I included the SUMWGT suggested by Watts. I think Watts answered my question. If you have any other suggestions for options I should, or should not, be using, I would appreciate the input! Thank you!

/*VARIABLE 1*/
PROC SURVEYFREQ DATA=data1 NOMCAR;
STRATA strata;
CLUSTER cluster;
WEIGHT weight;
TABLES  var1 / ROW CL;
RUN;

VS
 
PROC SURVEYMEANS DATA=data1;
STRATA strata;
CLUSTER cluster;
WEIGHT weight;
VAR var1; 
CLASS var1;
RUN;

Watts · Posted 07-31-2021 12:26 PM

In PROC SURVEYMEANS, you can use the SUM option to display the weighted sum of the analysis variable in the "Statistics" table. And you can use the SUMWGT option to display the sum of the weights in the "Statistics" table. (Documentation is here.)
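
As a minimal sketch of those options, assuming the data set and variable names from the code posted in this thread (data1, strata, cluster, weight, var1), the statistic keywords can be listed on the PROC statement. With var1 on the CLASS statement, the Sum column should hold the weighted total for each level. NOBS and NMISS are added here only to show how many observations each row uses.

```sas
/* Sketch only: names are taken from the code posted in this thread. */
/* Listing statistic keywords replaces the default set of statistics. */
PROC SURVEYMEANS DATA=data1 NOBS NMISS SUMWGT MEAN SUM;
    STRATA strata;
    CLUSTER cluster;
    WEIGHT weight;
    CLASS var1;   /* treat var1 as categorical, one row per level */
    VAR var1;
RUN;
```

The SUMWGT value in the "Statistics" table can then be compared with the Sum of Weights in the "Data Summary" table to see whether missing values account for the gap.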

The difference you see is probably due to missing values in the analysis variable. The "Data Summary" table provides information about the entire input data set -- and doesn't exclude observations that have missing values for the analysis variable(s). But "Statistics" tables are computed separately for each analysis variable that you specify -- and each "Statistics" table excludes observations that have missing values for the analysis variable (unless you specify the MISSING option).
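
A quick arithmetic check with the figures from the original post illustrates this. The sketch below assumes that 1,001,125 is the sum of weights over observations with nonmissing values and that 1,060,305 is the total from the "Data Summary" table. The variable names are made up for illustration.

```sas
/* Arithmetic check only: all figures are copied from the original post. */
data _null_;
    wfreq_level1 = 227043;    /* SURVEYFREQ weighted frequency, indicator=1 */
    wfreq_total  = 1001125;   /* SURVEYFREQ total, assumed nonmissing only  */
    sumwgt_all   = 1060305;   /* Sum of Weights in the "Data Summary" table */
    mean_level1  = 0.226788;  /* SURVEYMEANS mean for indicator=1           */

    prevalence     = wfreq_level1 / wfreq_total;  /* compare with mean_level1  */
    numer_nonmiss  = mean_level1 * wfreq_total;   /* compare with wfreq_level1 */
    numer_all      = mean_level1 * sumwgt_all;    /* the 240,464 figure        */
    weight_missing = sumwgt_all - wfreq_total;    /* weight outside the table  */

    put prevalence= percent8.2 numer_nonmiss= comma12.
        numer_all= comma12. weight_missing= comma12.;
run;
```

If that assumption holds, the prevalence agrees with the SURVEYMEANS mean (about 22.68%), which would make 227,043 and 1,001,125 the weighted numerator and denominator for the nonmissing responses. The 240,464 figure would then come from multiplying the mean by a sum of weights that still includes the missing responses.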

PROC SURVEYFREQ handles missing values in the same way. The frequency, crosstabulation, and statistics tables exclude observations that have missing values for the analysis variables (unless you specify the MISSING option). For details, see the documentation sections Missing Values (SURVEYFREQ) and Missing Values (SURVEYMEANS).
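
For comparison, here is a hedged sketch of the same SURVEYFREQ call with the MISSING option, again assuming the names from the posted code. Missing values of var1 are then treated as their own level, so the Total row should include their weight, and the percentages are computed over a denominator that includes the missing responses.

```sas
/* Sketch only: names are taken from the code posted in this thread. */
PROC SURVEYFREQ DATA=data1 MISSING;
    STRATA strata;
    CLUSTER cluster;
    WEIGHT weight;
    TABLES var1;   /* missing values of var1 appear as a separate level */
RUN;
```

Running it with and without MISSING shows how much of the total weight belongs to the missing responses. Which denominator to report depends on whether the missing responses are meant to count toward the prevalence.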
