mtechnologist
Fluorite | Level 6

Hi all,

Most of you may know that to run PROC SURVEYLOGISTIC on survey data such as NHIS or NHANES, you should have at least three important variables: strata, cluster (PSU), and sampling weight. My question is: when survey data does not have a cluster variable, is it still appropriate to conduct SURVEYLOGISTIC, or will it affect estimation of the standard error? What other options could help address this issue (bias from neglecting the cluster variable in the analysis)?

Your support is highly appreciated.

Thanks

Accepted Solution
SAS_Rob
SAS Employee

Were you given replicate weights? If so, the cluster information is "built into" those weights, and you can use the REPWEIGHTS statement.

If you are simply missing the cluster information, then any results you report that ignore the clusters will be incorrect, because the variance estimates will not reflect the actual sample design.
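
As a minimal sketch of the replicate-weight approach, assuming a data set named mysurvey with a full-sample weight sampwt and 80 replicate weights repwt1-repwt80 (these names, the count of 80, and the model variables are all hypothetical), the call might look like the following. Whether VARMETHOD=JACKKNIFE or VARMETHOD=BRR applies, and whether jackknife coefficients must be supplied, depends on how the data provider constructed the replicate weights, so check the survey documentation.

```sas
/* Hypothetical names: mysurvey, sampwt, repwt1-repwt80, outcome, age, sex */
proc surveylogistic data=mysurvey varmethod=jackknife;
   weight sampwt;                /* full-sample weight */
   repweights repwt1-repwt80;    /* replicate weights from the data provider */
   class sex;
   model outcome(event='1') = age sex;
run;
```

When replicate weights are supplied this way, a STRATA or CLUSTER statement should not be needed, because the design information is carried by the replicate weights.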


5 REPLIES
ballardw
Super User

If your sample design did not include a cluster (or primary sampling unit), then you should not include a cluster variable.

If you examine the documentation for the procedure, you may well find an example titled Stratified Cluster Sampling that, despite the name, does not use the CLUSTER statement in the PROC SURVEYLOGISTIC code, because the strata are the only "cluster".
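
For a design that truly has no clustering (for example, a stratified simple random sample), a sketch of such a call could look like this. The data set and variable names (mysurvey, stratum, sampwt, outcome, age, sex) are hypothetical and are not taken from the documentation example.

```sas
/* Hypothetical names: mysurvey, stratum, sampwt, outcome, age, sex */
proc surveylogistic data=mysurvey;
   strata stratum;    /* design strata */
   weight sampwt;     /* sampling weight */
   /* No CLUSTER statement: each observation is treated as its own
      sampling unit within its stratum. */
   class sex;
   model outcome(event='1') = age sex;
run;
```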

 

 

mtechnologist
Fluorite | Level 6

Hi ballardw,

I really appreciate your prompt response to my question. I agree with you on the example you provided, but maybe my question was not clear enough. I am sure the data I have includes a cluster variable that was not provided to me, and I simply cannot get it. From a statistical point of view, is it appropriate to rely on just the strata and sampling weights in the analysis? Or will omitting the cluster variable cause much bias?

Thanks again.

ballardw
Super User

@mtechnologist wrote:

Hi ballardw,

I really appreciate your prompt response to my question. I agree with you on the example you provided, but maybe my question was not clear enough. I am sure the data I have includes a cluster variable that was not provided to me, and I simply cannot get it. From a statistical point of view, is it appropriate to rely on just the strata and sampling weights in the analysis? Or will omitting the cluster variable cause much bias?

Thanks again.


The highlighted text above is very different from your original statement: "My question is when a survey data does not have cluster variable, is it still appropriate to conduct SURVEYLOGISTIC or it will affect estimation of standard error?" That statement says there isn't any cluster, not that an expected variable is missing.

I suggest going back to whoever did the sample design, or perhaps to the original sample data set used to calculate the selection probabilities/weights. Hopefully there is sufficient information to link any cluster variable(s) from that data to your response data.

If not, go to your supervisor, project manager, or whoever is in charge and suggest that they find where that information went, because otherwise the data really is not going to provide defensible results.

mtechnologist
Fluorite | Level 6
Thank you both, ballardw and Rob, for answering my question.
