- Subgroup Variance Estimation in complex surveys with 9.3

Posted 02-19-2013 11:38 AM
(2020 views)

I am analyzing data from a subgroup of a complex, multistage survey: NHANES. The analytic documentation on the NHANES site says that SAS 9.1 and 9.2 do not correctly calculate variance because they do not correctly calculate degrees of freedom. It goes on to explain that these versions of SAS do not account for strata and PSUs with missing data. My question is whether SAS 9.3 has fixed this miscalculation.

I have included the NHANES analytic notes below if they are helpful. Thanks!!

Estimates are often calculated for various subgroups of interest within the total NHANES population. When the number of first stage sampling units (PSUs) is small, the z-statistic should be replaced by a value from a t-distribution when computing confidence limits for these estimates (see SUDAAN 1995 — ref from NHANES III analytic guidelines).

To calculate the correct value for the t-statistic from a t-distribution and a selected level of significance, you must calculate the proper degrees of freedom for the estimate.

In addition, it is important to examine the number of degrees of freedom on which a standard error estimate is based. Continuing research on the stability of variance estimates in subdomains of NHANES has been published and shows that standard error estimates based on small numbers of paired PSUs (i.e., few degrees of freedom) are prone to instability.

The reliability of the estimated standard error, as measured by its relative standard error (i.e., (standard error of the standard error of the estimate/standard error of the estimate)*100), is inversely proportional to its degrees of freedom. As the number of degrees of freedom increases, the relative standard error decreases and the reliability of the estimate increases. The NHANES guidelines recommended a relative standard error of at most 30%. This corresponds to at least 12 degrees of freedom.

Degrees of freedom are properly calculated by subtracting the number of clusters in the first level of sampling (strata) from the number of clusters in the second level of sampling (PSUs) for each subgroup you are analyzing, as shown in the equation below.

deg of freedom = # of PSUs - # of strata
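The formula above can be sketched in a few lines of Python. This is only an illustration of the counting, not code from NHANES, SAS, or SUDAAN; the function name and the toy design (3 strata with 2 PSUs each) are hypothetical.

```python
def design_degrees_of_freedom(records):
    """Return (# of PSUs - # of strata) for an iterable of (stratum, psu) pairs."""
    # PSU ids are often reused across strata (e.g. 1 and 2 in every stratum),
    # so a PSU is identified by the (stratum, psu) pair, not by the psu id alone.
    psus = {(stratum, psu) for stratum, psu in records}
    strata = {stratum for stratum, _ in psus}
    return len(psus) - len(strata)


# Hypothetical toy design: one (stratum, psu) pair per sampled person.
records = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)]
df = design_degrees_of_freedom(records)
print(df)  # 6 PSUs - 3 strata = 3
```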

For both SUDAAN and SAS Survey procedures, the degrees of freedom are calculated in the same way when looking at the entire sample population or in subgroups where all strata and PSUs are represented.

However, when you analyze data on a subgroup of sample persons who may not be represented in all strata and PSUs (e.g., Mexican Americans), the degrees of freedom provided in the output may differ. For example, SUDAAN will correctly count the number of PSUs and strata with at least one valid observation for each cell of the table being requested. In contrast, SAS 9.1 Survey procedures, such as *proc surveymeans*, compute the degrees of freedom as the number of clusters (PSUs) in the non-empty strata minus the number of non-empty strata. This means that if your data have empty strata (no persons in the population for either PSU) the number of degrees of freedom will increase. This is incorrect and SAS is currently working on correcting this problem. For more information on methods of correctly calculating degrees of freedom using SAS 9.1 Survey procedures, please see the following two SAS 9.1 Survey procedures macros.
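To make the difference between the counting rules concrete, here is a rough Python sketch. It encodes one reading of the two rules described in the passage above and does not reproduce the internals of SUDAAN or SAS; the function names, the `in_subgroup` flag, and the toy design (4 strata with 2 PSUs each) are all hypothetical.

```python
def df_full_design(rows):
    """All design PSUs minus all design strata."""
    psus = {(r["stratum"], r["psu"]) for r in rows}
    strata = {s for s, _ in psus}
    return len(psus) - len(strata)


def df_occupied_psus(rows):
    """Count only PSUs and strata with at least one subgroup observation."""
    psus = {(r["stratum"], r["psu"]) for r in rows if r["in_subgroup"]}
    strata = {s for s, _ in psus}
    return len(psus) - len(strata)


def df_nonempty_strata(rows):
    """Count every design PSU in strata that hold at least one subgroup member."""
    nonempty = {r["stratum"] for r in rows if r["in_subgroup"]}
    psus = {(r["stratum"], r["psu"]) for r in rows if r["stratum"] in nonempty}
    return len(psus) - len(nonempty)


# Hypothetical design: one row per PSU, flagged if it contains subgroup members.
rows = [
    {"stratum": 1, "psu": 1, "in_subgroup": True},
    {"stratum": 1, "psu": 2, "in_subgroup": True},
    {"stratum": 2, "psu": 1, "in_subgroup": True},
    {"stratum": 2, "psu": 2, "in_subgroup": False},  # empty PSU, non-empty stratum
    {"stratum": 3, "psu": 1, "in_subgroup": True},
    {"stratum": 3, "psu": 2, "in_subgroup": True},
    {"stratum": 4, "psu": 1, "in_subgroup": False},  # stratum 4 is entirely empty
    {"stratum": 4, "psu": 2, "in_subgroup": False},
]

print(df_full_design(rows))      # 8 - 4 = 4
print(df_occupied_psus(rows))    # 5 - 3 = 2
print(df_nonempty_strata(rows))  # 6 - 3 = 3
```

In this toy case the two subgroup rules disagree only because stratum 2 has one PSU with no subgroup members; where every PSU in a non-empty stratum contains subgroup members, they give the same answer.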

4 REPLIES

Since you did not provide example output comparing SAS's PROC SURVEYxxx procedures with SUDAAN's comparable procedures, I can suggest only that you read the corresponding SAS version 9.3 documentation on how its PROC SURVEYxxx procedures calculate degrees of freedom (for example, PROC SURVEYMEANS, pages 7430-7431):

http://support.sas.com/documentation/cdI/en/statug/63962/pdf/default/statug.pdf.

If the data have empty strata, won't the PSUs within those strata also be empty, and so also be left out of the calculation of degrees of freedom? Thus, the degrees of freedom will not necessarily increase when the number of empty PSUs exceeds the number of empty strata that contain them. For example, the formula for degrees of freedom can equal either

1) DF = # PSUs - # strata, or

2) DF = # non-empty PSUs - # non-empty strata.

If there were 50 PSUs and 5 strata, DF according to formula #1 would equal 50 - 5 = 45 df.

If each stratum has on average 10 PSUs, and if one of these strata were empty, then the number of non-empty strata would equal 4, and the number of non-empty PSUs would equal 40 so that the DF according to formula #2 would equal 40 - 4 = 36 df. Using formula #2 thus leads to an estimate of fewer DF than using formula #1, so that formula #2 is more statistically conservative.
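The arithmetic in this example can be checked with a short Python sketch. The numbers are the ones given above (50 PSUs, 5 strata, one empty stratum); the variable names are only illustrative.

```python
total_psus = 50
total_strata = 5
empty_strata = 1

# Assumes every stratum holds the same number of PSUs (10 here).
psus_per_stratum = total_psus // total_strata

# Formula 1: all PSUs minus all strata.
df_formula_1 = total_psus - total_strata

# Formula 2: drop the empty stratum together with the PSUs inside it.
nonempty_strata = total_strata - empty_strata
nonempty_psus = nonempty_strata * psus_per_stratum
df_formula_2 = nonempty_psus - nonempty_strata

print(df_formula_1)  # 45
print(df_formula_2)  # 36
```

Each empty stratum removes 10 PSUs and 1 stratum from the count, so formula #2 loses 9 degrees of freedom per empty stratum in this setup.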

SUDAAN uses # PSUs - # strata to calculate degrees of freedom for overall and sub-population estimates.

SUDAAN does not currently have a means of calculating the degrees of freedom using only those PSUs with at least 1 member of the sub-population. You can specify your own degrees of freedom in SUDAAN, however.

SAS 9.4 survey procedures have a switch/option to calculate # PSUs - #strata using only those PSUs with at least 1 member of the sub-population.

STATA uses # PSUs - #strata using only those PSUs with at least 1 member of the sub-population by default.

The issue with degrees of freedom and replicate-based variance estimation is more complex.

SAS and STATA use the number of replicates (or the number of replicates - 1) as the default degrees of freedom.

The R Survey package assumes an infinite number of degrees of freedom.

Note that the default degrees of freedom for replicate-based variance estimation can be much higher than the corresponding degrees of freedom if using #PSUs - # strata among those PSUs with at least 1 member of the subpopulation.

You can specify your own degrees of freedom when using replicate-based variance estimation in all three software programs: SAS, STATA, and the R Survey package.
