ewolin
Calcite | Level 5

I need to resample from a data set that has clusters, strata and weights.  The survey design is such that, properly analyzed, results should be representative of a much larger population (the weights can be quite large).  I want to resample the data to yield a representative sample of the larger population where each item has weight 1.  That is, if there are 10,000 weighted entries in the original sample (in strata and clusters), each with average weight W, thus representing 10,000*W people, I want a new random resample of (say) 20,000 entries each with weight 1 and with no strata or clusters.

Can this be done in SAS?

4 REPLIES
ballardw
Super User

I think that if you use your current weight variable as a FREQ variable, the sampling frame becomes the 10,000*W population, though you may need to round the existing weight variable to an integer (not sure). The output data set will have a weight, but you can reset it to 1 or simply leave out any weight variable in further analysis, since the default is to treat each record as having a weight of 1.

You can either list the variables to keep in an ID statement or drop the strata and clusters from other analysis. My feeling, though, would be to leave the variables in the data just in case someone wants to see analysis on at least the strata variables.

ewolin
Calcite | Level 5

Yes, using FREQ allows me to get correct distributions.  The problem is that NPAR1WAY doesn't do two-sample Kolmogorov-Smirnov or Wilcoxon Rank Sum tests for weighted data.  I'm trying to find a way to trick it by converting a small sample of weighted data into a very large sample of unweighted data, then resampling back to its original size so that NPAR1WAY will give correct results.  I know of no other way of getting SAS to calculate these statistics correctly.

So I believe I really need to resample from a weighted, complex survey sample down to a simple unweighted survey sample.

ballardw
Super User

But since NPAR1WAY does allow use of a FREQ variable, I would try using your weight variable for FREQ in the procedure and not subset the data.

ewolin
Calcite | Level 5

Yes, I did that, and realized NPAR1WAY was calculating things incorrectly because a frequency is not the same as a weight: the procedure only accepts a frequency, and what I have is a weight.

E.g. the Kolmogorov-Smirnov p-value is always near zero because it uses the sum of the frequencies in the calculation, but the statistical precision goes as the number of entries (i.e. as 10,000, not as 10,000*W).  The same holds for the Wilcoxon Rank Sum tests: when the weight is used as a frequency, NPAR1WAY thinks there are 10,000*W people, and thus the p-values are always minuscule.  For the K-S tests I can in principle recalculate p from D, n1 and n2.  But I cannot do anything about the Wilcoxon calculations: they are wrong and I can't recalculate them.
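
For reference, the recalculation of the K-S p-value from D, n1 and n2 can be written with the standard large-sample approximation for the two-sample test. This is a sketch of the textbook asymptotic formula, not a statement of what NPAR1WAY does internally. Taking n_1 and n_2 to be the unweighted record counts in each group is an assumption that follows the reasoning above.

```latex
\lambda = D \sqrt{\frac{n_1 n_2}{n_1 + n_2}},
\qquad
p \approx 2 \sum_{j=1}^{\infty} (-1)^{j-1} \, e^{-2 j^{2} \lambda^{2}}
```

The series converges quickly, so a few dozen terms are normally enough; the approximation is intended for large n_1 and n_2.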

Thus the need to create the new unweighted, resampled data set from the original weighted set.
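
One possible way to build such a data set is to draw records with replacement, with selection probability proportional to the survey weight, using PROC SURVEYSELECT. The following is a minimal sketch under stated assumptions, not a validated method: the data set `survey`, the variables `grp`, `x` and `wt`, the simulated values and the seeds are all hypothetical, and the sample size is set equal to the original record count.

```sas
/* Hypothetical stand-in for the weighted survey data:                 */
/* one row per respondent, analysis variable x, two-level group grp,  */
/* and survey weight wt. All values here are simulated.               */
data survey;
   call streaminit(2024);
   length grp $1;
   do id = 1 to 1000;
      if rand('uniform') < 0.5 then grp = 'A';
      else grp = 'B';
      x  = rand('normal');
      wt = 50 + 450 * rand('uniform');   /* made-up weights */
      output;
   end;
run;

/* Draw records with replacement, probability proportional to wt.     */
/* OUTHITS writes one output row per selection, so a record chosen    */
/* three times appears three times in the output data set.            */
proc surveyselect data=survey out=resample
                  method=pps_wr sampsize=1000
                  seed=12345 outhits;
   size wt;
run;

/* Unweighted two-sample tests on the resampled rows:                 */
/* WILCOXON requests the rank-sum test, EDF requests the K-S test.    */
proc npar1way data=resample wilcoxon edf;
   class grp;
   var x;
run;
```

Because the draw is with replacement, heavily weighted records appear several times and create ties, and the strata and cluster structure is ignored. The choice of SAMPSIZE also drives the p-values: the more rows drawn, the smaller they become, so it needs to reflect the precision the original sample actually supports.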
