1900251
Calcite | Level 5

I'm trying to grasp the functional differences between the way SAS calculates its confidence intervals with PROC Surveymeans and a traditional Z-score-based calculation:

 

  • PROC Surveymeans produces a wider interval for my data than calculating the interval using the estimated mean, standard error, and Z-score (e.g., ~1.96 for 95%).
    • For reference: image.png
    • expected 95% CI [44592, 69542]
    • To get the CIs produced by PROC Surveymeans with the Z score method (for this dataset), you would need to use something closer to Z=1.99 rather than 1.96.
  • I found this paper that suggests an adjustment is being applied to CIs produced by PROC Surveymeans to account for unequal weights in the data: https://support.sas.com/resources/papers/proceedings17/0970-2017.pdf. However, the method of adjustment is not detailed therein. 

SAS code used to produce these estimates and the CI:

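/* nobs, sum, clsum, varsum, cvsum request the number of observations, the  */
/* estimated total, and that total's confidence limits, variance, and CV.   */
proc surveymeans data = DATA nobs sum clsum varsum cvsum nosparse;
  class DUM;       /* treat DUM as a categorical variable */
  cluster psu;     /* primary sampling units */
  strata stratum;  /* design strata */
  weight wt;       /* sampling weights */
  var DUM;         /* analysis variable */
run;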

 

My questions:

  1. Is there information on the correction applied that would cause this difference in CI calculation? The formula used to calculate the bounds of the interval would be ideal.
  2. What is the theory/justification behind applying further adjustment to a confidence interval calculation when the error estimates already account for survey design/weights?

Thank you!

1 ACCEPTED SOLUTION
SAS_Rob
SAS Employee

The confidence intervals for the mean and total use a t-distribution. These two sections of the documentation contain all the relevant calculations, including the degrees of freedom (DF).

SAS Help Center: Statistical Computations

SAS Help Center: Statistical Computations
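
For the default Taylor series variance method, the interval described in those sections can be sketched as below. The notation is illustrative and not quoted from the documentation: \hat{\theta} stands for the estimated mean or total, t_{df, \alpha/2} is the 100(1 - \alpha/2)th percentile of the t distribution, and the degrees-of-freedom rule shown is the one for a design that has both strata and clusters, as in the code posted above.

```latex
\hat{\theta} \;\pm\; t_{df,\,\alpha/2} \cdot \mathrm{StdErr}(\hat{\theta}),
\qquad
df = (\text{number of clusters}) - (\text{number of strata})
```

For a 95% interval, t_{df, \alpha/2} is about 1.99 when df is near 80 and approaches the normal value of 1.96 only as df becomes large, which is consistent with the multiplier the original poster backed out.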


2 REPLIES
Season
Barite | Level 11

Part of the theoretical justification for widening confidence intervals in complex survey analysis, even after the variance or standard error estimates have been adjusted for the design, is that far fewer degrees of freedom are available than under simple random sampling. The degrees of freedom used for interval estimation are usually a "rule of thumb" value equal to the number of unique primary sampling units minus the number of strata. In most designs this is dramatically lower than the sample size minus one, which is the degrees of freedom used to build confidence intervals under simple random sampling.
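
The effect of low degrees of freedom on interval width can be checked directly. This is a minimal sketch: the df values are arbitrary illustrations and are not taken from the original poster's design, and the data set name crit is hypothetical.

```sas
/* Compare the 97.5th percentile of the t distribution with the normal value. */
/* The df values below are illustrative assumptions, not the poster's design. */
data crit;
  z = quantile('NORMAL', 0.975);      /* normal critical value, about 1.96 */
  do df = 10, 30, 80, 1000;
    t = quantile('T', 0.975, df);     /* two-sided 95% t critical value */
    ratio = t / z;                    /* relative widening of the t interval */
    output;
  end;
run;

proc print data=crit noobs;
  format z t ratio 8.4;
run;
```

At df = 80 the t critical value is about 1.99, and it shrinks toward 1.96 as df grows. A design with few primary sampling units per stratum therefore gives a visibly wider interval than the Z-based calculation, even with the same standard error.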

Why do we use so few degrees of freedom for complex surveys? I think part of the reason lies in the reduced effective sample size caused by correlation among survey subjects. To illustrate the concept of effective sample size, I would like to modify an example from the book Applied Survey Data Analysis (Chapman & Hall/CRC). Suppose you are a researcher interested in the age and sex distribution of the English teachers at a school. You enter a classroom of 50 students and ask each of them their English teacher's age and sex. Because the students share one English teacher, their answers are perfectly correlated and you receive the same answer from all of them. So even though you asked 50 people, you do not have information on 50 teachers; you have information on only one. In other words, the effective sample size is 1 instead of 50.

From a statistical perspective, the shrinkage in effective sample size is attributable to correlation among the observations. Under simple random sampling, the assumption that subjects are independent of each other is tenable, so each subject contributes information that the others have not already supplied. You really get something new every time you obtain one more sample subject. But once observations are correlated, as they are within clusters in complex surveys, each new respondent gives you less information than an independent one would.
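
One common way to quantify this shrinkage is the design-effect approximation below. It is a standard result from the survey sampling literature, not something stated in this thread, and it assumes equal-sized clusters of m subjects with intraclass correlation \rho among subjects in the same cluster; n is the total number of respondents.

```latex
n_{\text{eff}} = \frac{n}{1 + (m - 1)\,\rho}
```

In the classroom example, n = 50, m = 50, and \rho = 1, so n_eff = 50 / (1 + 49) = 1. With \rho = 0 the denominator is 1 and n_eff = n, which is the simple random sampling case.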
