MillerEL
Obsidian | Level 7

I have a question about the weighted frequencies between proc surveyfreq and proc surveymeans. I'm sure this seems super simple to you all out there, but I'm stuck.

 

I'm doing an analysis of a categorical variable from a complex survey. Someone has requested the weighted numerator and denominator. The problem is that I don't know which of the methods below is correct, or if I'm fundamentally misunderstanding something about the procedures and the Sum of Weights.

 

1. Proc SurveyMeans

The output for proc surveymeans does not provide weighted frequencies. One recommendation was to use Sum of Weights as the weighted denominator, and the Mean multiplied by the Sum of Weights as the weighted numerator. In the example below (for indicator=1):

    Weighted numerator = .226788 * 1,060,305 = 240,464

    Weighted denominator = 1,060,305

    Prevalence = 22.68%

 

                   

[Screenshot of PROC SURVEYMEANS output: MillerEL_2-1627675711637.png]

 

However, Proc SurveyFreq is the recommended procedure for categorical variables. 

 

2. Proc SurveyFreq

The output for proc surveyfreq provides a Weighted Frequency, which does not match the numerator above (227,043 vs 240,464). 

In Proc SurveyFreq (indicator=1), it seems the weighted numerator should be 227,043 and the weighted denominator should be 1,001,125 (do I include the missings?).

 

But the Sum of Weights is the same as with Proc SurveyMeans. How can the weighted denominator be the Sum of Weights in one and not the other?

 

[Screenshot of PROC SURVEYFREQ output: MillerEL_1-1627674439881.png]

 

 

I've read that both procs handle missing data differently. However, I haven't seen a clear explanation of how those differences impact the interpretation or use of the procedures. Can anyone recommend a good reference that explains the differences (something at the beginner level)? Is that what I'm missing here? Can anyone help me understand what the weighted numerator and denominator should be, and why?

 

Thank you!

3 REPLIES
ballardw
Super User

Please show the code you are using to generate the output.

 

You may need to add options to request the values you want, and seeing the actual code will show which approach is likely to work best for you.

MillerEL
Obsidian | Level 7

I have posted my code below. I included the SUMWGT option suggested by Watts, and I think Watts answered my question. If you have any other suggestions for options I should, or should not, be using, I would appreciate the input! Thank you!

 

/*VARIABLE 1*/
PROC SURVEYFREQ DATA=data1 NOMCAR;
STRATA strata;
CLUSTER cluster;
WEIGHT weight;
TABLES  var1 / ROW CL;
RUN;

VS
 
PROC SURVEYMEANS DATA=data1;
STRATA strata;
CLUSTER cluster;
WEIGHT weight;
VAR var1; 
CLASS var1;
RUN;      
Watts
SAS Employee

In PROC SURVEYMEANS, you can use the SUM option to display the weighted sum of the analysis variable in the "Statistics" table. And you can use the SUMWGT option to display the sum of the weights in the "Statistics" table. (See the PROC SURVEYMEANS statement documentation.)
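
As a rough sketch of what that could look like, the step below adds the SUM and SUMWGT keywords to the PROC SURVEYMEANS call posted earlier in this thread. The data set and variable names (data1, strata, cluster, weight, var1) are just the placeholders from that post, and NOBS, NMISS, and MEAN are added here only for comparison; they are an assumption about what is useful, not part of the original suggestion.

```sas
/* Sketch only: names are the placeholders used in the posted code. */
PROC SURVEYMEANS DATA=data1 NOBS NMISS MEAN SUM SUMWGT;
   STRATA strata;
   CLUSTER cluster;
   WEIGHT weight;
   CLASS var1;   /* treat var1 as categorical, as in the posted code */
   VAR var1;     /* one row per level of var1 in the "Statistics" table */
RUN;
```

For a categorical variable, the Sum column should give the weighted count for each level, which is the natural candidate for the weighted numerator, and it can be compared directly with the Weighted Frequency column from PROC SURVEYFREQ.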

 

The difference you see is probably due to missing values in the analysis variable. The "Data Summary" table provides information about the entire input data set -- and doesn't exclude observations that have missing values for the analysis variable(s). But "Statistics" tables are computed separately for each analysis variable that you specify -- and each "Statistics" table excludes observations that have missing values for the analysis variable (unless you specify the MISSING option).
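
The figures quoted in the question appear to be consistent with this explanation. The check below is only a sketch: it assumes that 1,001,125 is the total weighted frequency over observations with nonmissing values, and that 1,060,305 is the "Data Summary" sum of weights over all observations.

```latex
% Proportion from the nonmissing observations only
\[
\frac{227{,}043}{1{,}001{,}125} \approx 0.226788
\]
% Multiplying that proportion by the all-observation sum of weights
% reproduces the larger numerator from the question
\[
0.226788 \times 1{,}060{,}305 \approx 240{,}464
\]
% Under the stated assumption, the gap is the weight carried by
% observations with a missing analysis variable
\[
1{,}060{,}305 - 1{,}001{,}125 = 59{,}180
\]
```

On that reading, the reported mean already uses 1,001,125 as its denominator, so 227,043 and 1,001,125 would be the matching weighted numerator and denominator, while 240,464 comes from scaling the same proportion up to a total that includes the missing observations.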

 

PROC SURVEYFREQ handles missing values in the same way. The frequency, crosstabulation, and statistics tables exclude observations that have missing values for the analysis variables (unless you specify the MISSING option). For details, see the documentation sections Missing Values (SURVEYFREQ) and Missing Values (SURVEYMEANS).
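
One way to see the effect is to run the one-way table twice, once with the default handling and once with the MISSING option. This is a minimal sketch that reuses the placeholder names from the code posted in this thread (data1, strata, cluster, weight, var1); it is meant for comparing the two outputs, not as a recommendation about which one to report.

```sas
/* Sketch only: names are the placeholders used in the posted code. */

/* Default: observations with missing var1 are left out of the table. */
PROC SURVEYFREQ DATA=data1;
   STRATA strata;
   CLUSTER cluster;
   WEIGHT weight;
   TABLES var1 / CL;
RUN;

/* MISSING: a missing var1 is treated as its own level of the table. */
PROC SURVEYFREQ DATA=data1 MISSING;
   STRATA strata;
   CLUSTER cluster;
   WEIGHT weight;
   TABLES var1 / CL;
RUN;
```

If missing values are the cause of the discrepancy, the Total weighted frequency in the second table should line up with the Sum of Weights in the "Data Summary" table, and the first table's Total should be smaller by the weight of the missing observations.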

 

 

 

 
