Hello, I am trying to recreate an analysis of a complex survey dataset in SAS that was previously analyzed in R and Stata. While the point estimates are correct, the standard errors and confidence limits are close, but not exact. In looking for possible explanations for this, I realized I was not able to apply a setting for certainty primary sampling units (PSUs) that was included in the R and Stata code. Is there a way to set up the survey design in SAS that would be the equivalent of R's
options(survey.lonely.psu = "certainty")
or the "singleunit(certainty)" setting at the end of the line below in Stata?
svyset caseid [pweight=wt_a], strata(strata) vce(linearized) singleunit(certainty)
I came across the CERTSIZE parameter in PROC SURVEYSELECT, but I believe I would be looking for a setting that would apply to PROC SURVEYMEANS and PROC SURVEYFREQ.
For reference, I am not an experienced survey data analyst, but have been a SAS user for years. Any help would be appreciated, thanks!
Hi there.
Did you ever figure out this problem? I'm also working with complex survey data and need to account for strata that contain a single primary sampling unit (PSU). The corresponding R code would be options(survey.lonely.psu = "adjust"), or singleunit(centered) in Stata. I would need to be able to use this feature in PROC SURVEYFREQ or PROC SURVEYLOGISTIC.
Any help would be appreciated, thanks!
I don't know of a SAS option that will support what you want.
As a workaround, you can exclude the certainty PSUs from your data set and re-run your code. The point estimates from that run won't be the ones you want, but the standard errors/variances will be. Take the point estimates from the run that includes all PSUs and combine them with the variance estimates from the run that excludes the certainty PSUs. You'll then need to construct the confidence intervals, and any hypothesis tests, manually.
I'm assuming here that you have a single-stage design. If you have two or more stages of sampling then you should be accounting for variability within PSUs, even certainty PSUs, arising due to the sampling within PSUs.
P.S. This approach is really only for totals and linear functions of totals. It's approximate for non-linear functions of totals. If you have statistics that aren't functions of totals (such as implicitly defined estimates, for example regression coefficients), then this method isn't necessarily going to give you what you want.
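To make the exclusion idea concrete for a weighted total, here is a minimal sketch in plain Python (not SAS). It assumes a single-stage stratified design and the usual with-replacement linearized variance estimator, in which a stratum with only one PSU contributes zero variance (the "certainty" treatment). The data, the function name total_and_variance and the variable names are all made up for illustration. The confidence interval uses a normal quantile as a large-sample stand-in; survey procedures typically use a t quantile with design-based degrees of freedom.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist


def total_and_variance(rows):
    """Weighted total and its linearized variance.

    rows: iterable of (stratum, psu, weight, y) tuples (hypothetical layout).
    Strata with a single PSU add to the total but contribute zero variance.
    """
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, y in rows:
        psu_totals[stratum][psu] += weight * y

    total = 0.0
    variance = 0.0
    for psus in psu_totals.values():
        z = list(psus.values())          # weighted PSU totals in this stratum
        n_h = len(z)
        total += sum(z)
        if n_h > 1:                      # single-PSU stratum: no variance term
            mean_z = sum(z) / n_h
            variance += n_h / (n_h - 1) * sum((v - mean_z) ** 2 for v in z)
    return total, variance


# Made-up data: strata A and B have two PSUs each, stratum C has one.
rows = [
    ("A", 1, 2.0, 3.0),
    ("A", 2, 2.0, 5.0),
    ("B", 3, 1.5, 4.0),
    ("B", 4, 1.5, 8.0),
    ("C", 5, 1.0, 10.0),
]

# Run 1: all PSUs, used for the point estimate.
total_all, var_all = total_and_variance(rows)

# Run 2: drop the single-PSU strata, used for the variance.
psus_in_stratum = defaultdict(set)
for stratum, psu, _, _ in rows:
    psus_in_stratum[stratum].add(psu)
reduced = [r for r in rows if len(psus_in_stratum[r[0]]) > 1]
total_reduced, var_reduced = total_and_variance(reduced)

# Combine: point estimate from run 1, standard error from run 2.
se = sqrt(var_reduced)
z_crit = NormalDist().inv_cdf(0.975)     # normal approximation, not a t quantile
lower, upper = total_all - z_crit * se, total_all + z_crit * se
print(total_all, se, (lower, upper))
```

For a total, the variance from the reduced data matches the variance computed with the single-PSU stratum treated as a certainty unit, while the total itself changes. That is why the two runs have to be combined. For means, proportions or regression coefficients, the match is only approximate, as noted above.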