I'm trying to understand how the confidence intervals that SAS calculates with PROC SURVEYMEANS differ from a traditional z-score-based calculation:
SAS code used to produce these estimates and the CI:
proc surveymeans data = DATA nobs sum clsum varsum cvsum nosparse;
class DUM;
cluster psu;
strata stratum;
weight wt;
var DUM;
run;
My questions:
Thank you!
The confidence intervals for the mean and the total use a t-distribution rather than the standard normal distribution. These two sections of the documentation contain all the relevant calculations, including the degrees of freedom (DF).
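The sketch below makes the difference from a z-score-based interval concrete. It is a minimal Python example that assumes both intervals have the form estimate ± critical value × standard error. The only difference is whether the critical value comes from the standard normal or from a t-distribution with a given DF. The helper names (`t_quantile`, `confidence_interval`) and the estimate and standard error in the usage lines are hypothetical. The t quantile is approximated with the standard library only (Simpson's rule plus bisection), so this is an illustration, not how SAS computes it internally.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """Approximate CDF: Simpson's rule on [0, |x|], using symmetry about 0."""
    b = abs(x)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Invert the CDF by bisection on [0, 50]; assumes 0.5 < p < 1."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_interval(est, se, alpha=0.05, df=None):
    """Return (lower, upper); df=None gives the z-based interval."""
    if df is None:
        crit = NormalDist().inv_cdf(1 - alpha / 2)
    else:
        crit = t_quantile(1 - alpha / 2, df)
    return est - crit * se, est + crit * se


if __name__ == "__main__":
    est, se = 0.40, 0.05  # hypothetical estimate and standard error
    print("z-based:", confidence_interval(est, se))
    print("t-based, df=10:", confidence_interval(est, se, df=10))
```

With few degrees of freedom, the t critical value is noticeably larger than 1.96 (about 2.23 at 10 DF), so the interval is wider. As the DF grow, the two intervals converge.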
Part of the theoretical justification for adjusting confidence intervals in complex survey data analysis involves more than adjusting the estimated variances or standard errors. The degrees of freedom used to build the intervals are also far lower than those used under simple random sampling. In fact, the degrees of freedom used for such interval estimation is usually a "rule of thumb" value equal to the number of unique primary sampling units (PSUs) minus the number of strata. In most sampling designs, this number is dramatically lower than the sample size minus one, which is the degrees of freedom used for confidence intervals under simple random sampling.
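The following is a small sketch of the rule-of-thumb DF. It assumes each observation carries a stratum code and a PSU code, like the variables named in the `strata` and `cluster` statements above. The function name `design_df` and the 4-stratum, 2-PSU-per-stratum layout are hypothetical. PSUs are counted as distinct (stratum, PSU) pairs, on the assumption that PSU codes may be reused across strata.

```python
def design_df(records):
    """Rule-of-thumb design DF: number of distinct PSUs minus number of strata.

    records: iterable of (stratum, psu) pairs, one per observation.
    A PSU is identified by its (stratum, psu) pair, so reused PSU codes
    in different strata are counted as different PSUs.
    """
    pairs = list(records)
    n_psu = len(set(pairs))
    n_strata = len({stratum for stratum, _ in pairs})
    return n_psu - n_strata


if __name__ == "__main__":
    # Hypothetical design: 4 strata x 2 PSUs per stratum x 25 respondents per PSU.
    data = [(h, c) for h in range(4) for c in range(2) for _ in range(25)]
    n = len(data)
    print("observations:", n)             # 200
    print("SRS df (n - 1):", n - 1)       # 199
    print("design df:", design_df(data))  # 8 PSUs - 4 strata = 4
```

Even with 200 respondents, this design leaves only 4 degrees of freedom, compared with 199 under simple random sampling.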
Why do we use such small degrees of freedom for complex surveys? I think part of the reason lies in the reduced effective sample size caused by correlation among survey subjects. To illustrate the concept of effective sample size, I'll adapt an example from the book Applied Survey Data Analysis (Chapman & Hall/CRC). Suppose you are a researcher interested in the distribution of age and sex among the English teachers at a school. You enter a classroom with 50 students and ask them about their English teacher's age and sex. Because the students all share the same English teacher, their answers are perfectly correlated: every student gives you the same answer. So even if you ask all 50 students one by one, you do not have information on 50 teachers; you only have information on one. In other words, the effective sample size is 1 instead of 50.
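One common way to quantify this is Kish's design effect for equal-size clusters, deff = 1 + (m - 1) * rho. Here m is the cluster size and rho is the intraclass correlation, and the effective sample size is n / deff. This formula is offered as an illustration, not as something taken from the book. Below is a minimal Python sketch with hypothetical function names:

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Kish approximation for equal-size clusters: deff = 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n: int, cluster_size: float, icc: float) -> float:
    """Effective sample size: n divided by the design effect."""
    return n / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Classroom example: 50 students, one shared teacher, identical answers (rho = 1).
    print(effective_sample_size(50, 50, 1.0))  # 1.0
    # Same classroom if the answers were independent (rho = 0).
    print(effective_sample_size(50, 50, 0.0))  # 50.0
    # Hypothetical moderate correlation.
    print(effective_sample_size(50, 50, 0.1))
```

With rho = 1 and m = 50, the design effect is 50 and the effective sample size is 1, which matches the classroom example.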
From a statistical perspective, the shrinkage in effective sample size is due to correlation among the observations. Under simple random sampling, the assumption that subjects are independent of each other is tenable, so the information you get from one subject does not overlap with what you get from another: every new subject tells you something new. Once the observations are correlated, as in complex surveys, a new respondent gives you less information than an independent one would.
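This claim can also be checked by simulation. The sketch below assumes a simple two-level model: a cluster-level random effect shared by everyone in the cluster, plus independent individual noise. The function name, cluster counts, and variance components are all hypothetical. The sketch compares the sampling variance of the mean for 20 clusters of 10 with that of 200 independent observations that have the same total variance.

```python
import math
import random
import statistics


def simulate_mean_variance(n_clusters, cluster_size, sd_between, sd_within,
                           reps=1000, seed=1):
    """Empirical variance of the sample mean under a two-level model.

    Each cluster gets one random effect u (shared by all its members);
    each member adds independent noise. sd_between=0 gives independent data.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        total = 0.0
        for _ in range(n_clusters):
            u = rng.gauss(0, sd_between)  # shared by the whole cluster
            for _ in range(cluster_size):
                total += u + rng.gauss(0, sd_within)
        means.append(total / (n_clusters * cluster_size))
    return statistics.variance(means)


if __name__ == "__main__":
    # Hypothetical: total variance 1 in both cases; ICC = 0.5 when clustered.
    clustered = simulate_mean_variance(20, 10, math.sqrt(0.5), math.sqrt(0.5))
    independent = simulate_mean_variance(20, 10, 0.0, 1.0, seed=2)
    print("variance ratio:", clustered / independent)
```

With an intraclass correlation of 0.5, theory gives a variance ratio of 1 + (10 - 1) * 0.5 = 5.5. The 200 clustered observations therefore carry roughly as much information about the mean as about 36 independent ones.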