catherinesjy
Calcite | Level 5

I am trying to analyze a complex survey dataset. Is there any way to convert a complex survey dataset into a regular dataset using the survey weights, for example by re-sampling or some other technique? I searched around and found only the SURVEYSELECT procedure, which seems related but does not do exactly what I need. For example, a complex survey dataset with a sample size of 10,000 might grow to 100,000,000 records after conversion; however, as far as I can tell, SURVEYSELECT caps the sample size at the original sample size, i.e., 10,000.

To sum up, I am looking for a way to convert a complex survey dataset into a regular dataset. I look forward to any suggestions or help!

6 REPLIES
ballardw
Super User

First, "complex survey data" is a regular SAS data set. The elements that make it complex relate to the sample design.

 

What exactly do you expect to do with the "regular data"? The main things in the survey procedures relate to calculation and use of variability and confidence intervals. If you do not expect to do anything related to confidence intervals or aren't going to use the weights then nothing needs to be done. If the analysis involves variability then you want the appropriate survey procedure.

 

SURVEYSELECT would be used to create a sample file to select respondents for a survey and to provide selection weight values. So if you already have post-collection data, it is almost certainly not of use.

catherinesjy
Calcite | Level 5

Thank you for your response. And let me clarify my goal. 

 

I need to account for the complex survey design (e.g., strata, clusters, weights) in my analysis to avoid bias. However, a lot of machine learning functions do not support complex survey designs, so I am looking for a way to convert the complex survey dataset into a regular dataset according to the design (i.e., the weights). We could then apply regular methods directly, simply feeding them the converted dataset. Does that make sense?

 

Look forward to further help. Thanks!

Tom
Super User

Talk to a statistician.  I don't think that would eliminate the issue with biased variance estimates.

SteveDenham
Jade | Level 19

Give @Tom the bonus prize. I can't think of an analytic method using weights that would give exactly the same results as the SURVEY procs. Point estimates or slope estimates, yes. Variance estimates, no. I would be glad to be proved wrong. Give us an idea of which "regular" ML methods you want to use, and maybe someone (even me) could tell you why a survey-weighted regular analysis is not as good as being clever about using the various SURVEY procedures.
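To make the point-estimate versus variance distinction concrete, here is a minimal sketch in plain Python (not SAS, and not what the SURVEY procs compute). The toy values and weights are made up for illustration, and the "conversion" is assumed to mean repeating each record as many times as its weight. Strata and clusters are ignored entirely.

```python
import math

# Hypothetical toy sample: (value, survey weight). Each weight is assumed to be
# the number of population units that the respondent represents.
sample = [(10.0, 1000), (12.0, 3000), (20.0, 500), (15.0, 2500), (11.0, 3000)]


def weighted_mean(rows):
    total_weight = sum(w for _, w in rows)
    return sum(y * w for y, w in rows) / total_weight


def naive_standard_error(values):
    # Textbook SE of a mean, treating every record as an independent draw.
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance / n)


# "Convert" the survey data by repeating each record weight-many times.
expanded = [y for y, w in sample for _ in range(w)]

point_from_weights = weighted_mean(sample)
point_from_expansion = sum(expanded) / len(expanded)

se_original = naive_standard_error([y for y, _ in sample])
se_expanded = naive_standard_error(expanded)

print(len(sample), len(expanded))
print(point_from_weights, point_from_expansion)
print(se_original, se_expanded)
```

The two point estimates agree, but the naive standard error on the expanded data is far smaller, only because the software now believes it has 10,000 independent observations instead of 5. Neither number is a design-based standard error; the sketch only shows that replication inflates the apparent sample size.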

 

SteveDenham

catherinesjy
Calcite | Level 5

Thank you for your reply! To answer your question, by regular ML methods I meant things like decision trees or random forests. I know weights are not strictly necessary for these tree-based methods, since they don't rely on estimated variances, but I am curious whether there is any way to incorporate the weights so that the results are more representative. The simplest way I could think of was to convert the weighted dataset to a regular one, but it looks like that is not realistic.
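One way to picture the "conversion" without blowing up the row count is weighted resampling: draw a fixed number of records with replacement, with probability proportional to the survey weight. The following is a rough sketch in plain Python under stated assumptions. The records, the `weighted_resample` helper, and the sample size of 2000 are all hypothetical, and the strata and cluster structure is ignored, so this speaks only to representativeness of the point estimates, not to variance.

```python
import random

# Hypothetical records: (feature, label, survey weight).
records = [
    ("a", 0, 1000.0),
    ("b", 1, 3000.0),
    ("c", 0, 500.0),
    ("d", 1, 2500.0),
    ("e", 0, 3000.0),
]


def weighted_resample(rows, size, seed=0):
    """Draw `size` rows with replacement, with probability proportional to weight."""
    rng = random.Random(seed)
    weights = [w for _, _, w in rows]
    drawn = rng.choices(rows, weights=weights, k=size)
    # Drop the weight column: the pseudo-sample is meant to be self-weighting.
    return [(x, y) for x, y, _ in drawn]


pseudo_sample = weighted_resample(records, size=2000)

# Share of label 1 in the pseudo-sample versus the weighted share in the survey.
resampled_share = sum(y for _, y in pseudo_sample) / len(pseudo_sample)
weighted_share = sum(y * w for _, y, w in records) / sum(w for _, _, w in records)
print(resampled_share, weighted_share)
```

The resampled share should land close to the weighted share, which is the sense in which the pseudo-sample is "more representative". Many tree implementations also accept a per-record weight directly, which avoids the resampling step altogether.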

ballardw
Super User

You would have to provide all the restrictions and requirements on data content for the "machine learning functions" you intend to use, so that we can see whether a data set can be made to look like what they expect. Quite possibly you would also need to provide a moderately complete example of the survey sample creation and weighting process, to see whether the two can be made to work, or work "close enough" that someone can determine how large the errors might be.

 

"Doesn't support" does not describe what is required by the other process.
