rkgrc000
Obsidian | Level 7

Hello, I am trying to recreate an analysis of a complex survey dataset in SAS that was previously analyzed in R and Stata.  While the point estimates are correct, the standard errors and confidence limits are close, but not exact.  In looking for possible explanations for this, I realized I was not able to apply a setting for certainty primary sampling units (PSUs) that was included in the R and Stata code.  Is there a way to set up the survey design in SAS that would be the equivalent of R's

options(survey.lonely.psu = "certainty")

or the "singleunit(certainty)" setting at the end of the line below in Stata?

svyset caseid [pweight=wt_a], strata(strata) vce(linearized) singleunit(certainty)

I came across the CERTSIZE parameter in PROC SURVEYSELECT, but I believe I would be looking for a setting that would apply to PROC SURVEYMEANS and PROC SURVEYFREQ.

 

For reference, I am not an experienced survey data analyst, but have been a SAS user for years.  Any help would be appreciated, thanks!

2 REPLIES
NCANT033
Calcite | Level 5

Hi there.

I am wondering if you ever figured out this problem? I'm also working with complex survey data and need to figure out how to account for a stratum with a single primary sampling unit (PSU). The corresponding R code would be options(survey.lonely.psu="adjust"), or singleunit(centered) in Stata. I would need to be able to use this feature in PROC SURVEYFREQ or PROC SURVEYLOGISTIC.

Any help would be appreciated, thanks!

DWilson
Pyrite | Level 9

I don't know of a SAS option that will support what you want.

 

You can exclude the certainty PSUs from your data set and re-run your code. The point estimates from that run won't be what you want, but the standard errors/variances will be. Combine the point estimates from the run with all PSUs with the variance estimates from the run that excludes the certainty PSUs. You'll need to construct the CIs, hypothesis tests, and so on by hand.
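As a rough illustration of the "by hand" step, here is a minimal Python sketch that pairs a point estimate from the full-data run with a standard error from the run without certainty PSUs. The function names are hypothetical, and the normal critical value is an assumption: the SAS survey procedures, as far as I know, base confidence limits on a t distribution with design degrees of freedom, so you may want to substitute that critical value to match their output.

```python
from statistics import NormalDist


def combined_ci(point_estimate, std_error, level=0.95):
    """Confidence limits from a full-data point estimate and a
    standard error taken from the run without certainty PSUs.

    Assumption: a normal critical value is used; swap in a t
    critical value with the design degrees of freedom if needed.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * std_error
    return point_estimate - half_width, point_estimate + half_width


def wald_test(point_estimate, std_error, null_value=0.0):
    """Two-sided z test of H0: parameter == null_value."""
    z = (point_estimate - null_value) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative numbers only, not taken from any real survey.
lower, upper = combined_ci(point_estimate=10.0, std_error=2.0)
z_stat, p_val = wald_test(point_estimate=10.0, std_error=2.0)
```

The only design choice here is keeping the two inputs separate, so that it stays obvious which run each number came from.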

 

I'm assuming here that you have a single-stage design. If you have two or more stages of sampling then you should be accounting for variability within PSUs, even certainty PSUs, arising due to the sampling within PSUs.
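Under that single-stage assumption, the reason dropping certainty PSUs leaves the variance unchanged can be sketched in a few lines of Python. This is a simplified illustration, not what SAS, R, or Stata run internally: it assumes the usual with-replacement linearized variance of a weighted total with no finite population correction, and the data and function names are made up. A stratum with a single PSU contributes zero under the "certainty" treatment, which is the same as leaving it out of the variance calculation.

```python
from collections import defaultdict


def weighted_total(records):
    """records: iterable of (stratum, psu, weight, y) tuples."""
    return sum(w * y for _, _, w, y in records)


def total_variance_certainty(records):
    """Linearized variance of the weighted total, treating any
    stratum with a single PSU as a certainty unit (zero variance).
    Assumes with-replacement PSU sampling and no fpc."""
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * y

    variance = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)
        if n_h < 2:
            continue  # certainty treatment: contributes nothing
        mean_h = sum(totals.values()) / n_h
        ss = sum((t - mean_h) ** 2 for t in totals.values())
        variance += n_h / (n_h - 1) * ss
    return variance


# Illustrative data: stratum "B" has a single (certainty) PSU.
full = [("A", 1, 2.0, 10.0), ("A", 2, 2.0, 14.0), ("B", 3, 1.0, 50.0)]
reduced = [r for r in full if r[0] != "B"]

var_full = total_variance_certainty(full)        # 64.0
var_reduced = total_variance_certainty(reduced)  # 64.0, same variance
est_full = weighted_total(full)                  # 98.0
est_reduced = weighted_total(reduced)            # 48.0, wrong point estimate
```

The variance agrees between the two data sets while the total does not, which is why the point estimate has to come from the full run.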

 

P.S. This approach really is only for totals and linear functions of totals. It's approximate for non-linear functions of totals. If you have statistics that aren't functions of totals (for example, implicitly defined estimates such as regression coefficients), then this method isn't necessarily going to be what you want.

 
