Weighted frequencies, proc surveyfreq vs proc surveymeans

Posted 07-30-2021 04:12 PM
(1550 views)

I have a question about the weighted frequencies between proc surveyfreq and proc surveymeans. I'm sure this seems super simple to you all out there, but I'm stuck.

I'm doing an analysis of a categorical variable from a complex survey. Someone has requested the weighted numerator and denominator. The problem is that I don't know which of the methods below is correct, or if I'm fundamentally misunderstanding something about the procedures and the *Sum of Weights.*

1. Proc SurveyMeans

The output for proc surveymeans does __not__ provide weighted frequencies. One recommendation was to use *Sum of Weights* as the weighted denominator, and the *Mean* multiplied by the *Sum of Weights* as the weighted numerator. In the example below (for indicator=1):

Weighted numerator = .226788 * 1,060,305 = 240,464

Weighted denominator = 1,060,305

Prevalence = 22.68%

However, Proc SurveyFreq is the recommended procedure for categorical variables.

2. Proc SurveyFreq

The output for proc surveyfreq provides a *Weighted Frequency*, which does __not__ match the numerator above (227,043 vs 240,464).

In Proc SurveyFreq (indicator=1), it would seem like the weighted numerator should be 227,043 and the weighted denominator is 1,001,125 (do I include the missings?).

But the *Sum of Weights* is the same as with Proc SurveyMeans. How can the weighted denominator be the *Sum of Weights* in one and not the other?

I've read that both procs handle missing data differently. However, I haven't seen a clear explanation of how those differences impact the interpretation or use of the procedures. Can anyone recommend a good reference that explains the differences (something at the beginner level)? Is that what I'm missing here? Can anyone help me understand what the weighted numerator and denominator should be, and why?

Thank you!

3 REPLIES


Please show the code you are using to generate the output.

You may need to add options to request the values you want, and seeing the actual code shows which approach is likely to work best for your analysis.


I have posted my code below. I included the SUMWGT suggested by Watts. I think Watts answered my question. If you have any other suggestions for options I should, or should not, be using, I would appreciate the input! Thank you!

```
/*VARIABLE 1*/
PROC SURVEYFREQ DATA=data1 NOMCAR;
STRATA strata;
CLUSTER cluster;
WEIGHT weight;
TABLES var1 / ROW CL;
RUN;
/* VS */
PROC SURVEYMEANS DATA=data1;
STRATA strata;
CLUSTER cluster;
WEIGHT weight;
VAR var1;
CLASS var1;
RUN;
```


In PROC SURVEYMEANS, you can use the SUM option to display the weighted sum of the analysis variable in the "Statistics" table. And you can use the SUMWGT option to display the sum of the weights in the "Statistics" table. (Documentation is here.)
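As a minimal sketch of that suggestion, the call below requests the statistics by keyword on the PROC SURVEYMEANS statement. The data set and variable names (data1, strata, cluster, weight, var1) are taken from the code posted in this thread. It assumes var1 is the categorical indicator, so with the CLASS statement the SUM column should hold the estimated weighted total for each level of var1.

```sas
/* Sketch only: names come from the posted code; var1 is assumed categorical. */
PROC SURVEYMEANS DATA=data1 NOBS NMISS MEAN SUM SUMWGT;
  STRATA strata;
  CLUSTER cluster;
  WEIGHT weight;
  CLASS var1;   /* analyze var1 by level (proportions and totals) */
  VAR var1;
RUN;
```

NMISS is included so that the number of observations dropped for a missing var1 is visible next to the sums.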

The difference you see is probably due to missing values in the analysis variable. The "Data Summary" table provides information about the entire input data set -- and doesn't exclude observations that have missing values for the analysis variable(s). But "Statistics" tables are computed separately for each analysis variable that you specify -- and each "Statistics" table excludes observations that have missing values for the analysis variable (unless you specify the MISSING option).
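The numbers quoted in the question are consistent with this explanation. The DATA step below is only an arithmetic check on those quoted values. It assumes that 1,001,125 is the total weighted frequency over nonmissing values of the indicator and that 1,060,305 is the sum of weights over all observations.

```sas
/* Arithmetic check using only the figures quoted in the question. */
DATA _NULL_;
  sumwgt_all     = 1060305;  /* Sum of Weights, all observations (assumed) */
  wgt_freq_1     = 227043;   /* SURVEYFREQ weighted frequency, indicator=1 */
  wgt_freq_total = 1001125;  /* SURVEYFREQ weighted total, nonmissing (assumed) */

  prevalence  = wgt_freq_1 / wgt_freq_total;   /* weighted proportion */
  wgt_missing = sumwgt_all - wgt_freq_total;   /* weight carried by missing values */

  PUT prevalence= 8.6 wgt_missing= COMMA12.;
RUN;
```

The ratio comes out at about 0.2268, the same 22.68% that PROC SURVEYMEANS reports as the mean, and the remaining weight of 59,180 would belong to observations with a missing indicator. Under these assumptions the weighted numerator is 227,043 and the weighted denominator is 1,001,125. Multiplying the mean by the full 1,060,305 overstates the numerator because that sum includes the missing observations.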

PROC SURVEYFREQ handles missing values in the same way. The frequency, crosstabulation, and statistics tables exclude observations that have missing values for the analysis variables (unless you specify the MISSING option). For details, see the documentation sections Missing Values (SURVEYFREQ) and Missing Values (SURVEYMEANS).
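If the missing values should instead be counted as their own category, a sketch along the following lines adds the MISSING option to both procedure statements. It reuses the names from the posted code. Note that MISSING applies to all categorical variables, so missing STRATA or CLUSTER values would also be treated as valid levels.

```sas
/* Sketch: treat missing var1 as a valid category in both procedures. */
PROC SURVEYFREQ DATA=data1 MISSING;
  STRATA strata;
  CLUSTER cluster;
  WEIGHT weight;
  TABLES var1 / CL;
RUN;

PROC SURVEYMEANS DATA=data1 MISSING MEAN SUM SUMWGT;
  STRATA strata;
  CLUSTER cluster;
  WEIGHT weight;
  CLASS var1;
  VAR var1;
RUN;
```

With this option the percentages are computed over all observations, so the prevalence for indicator=1 would no longer match the 22.68% computed over nonmissing values.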
