NPAR1WAY, SURVEYSELECT and Kolmogorov-Smirnov two-...

08-23-2014 04:06 PM

I need to compare two distributions using NPAR1WAY and two-sample K-S tests, but one of them is weighted. If I use the weight variable in a FREQ statement, I get the correct cumulative distribution for the weighted data, but NPAR1WAY calculates the p-value incorrectly: it treats the number of observations in the cumulative distribution as the sum of the weights, whereas the actual number is much lower (so the p-values are too low). Given the D statistic, which I think SAS calculates correctly from the two cumulative distributions, I believe I can recalculate the p-value from the correct numbers of observations in the two distributions.
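
The claim that D is already correct rests on D depending only on the two cumulative distributions, not on the total weight. A minimal Python sketch (not SAS; the function names are hypothetical) of a two-sample D for weighted samples, treating each weight as a frequency the way the poster uses FREQ. Scaling every weight in a sample by the same constant leaves D unchanged.

```python
from bisect import bisect_right


def weighted_ecdf(values, weights):
    """Return (sorted distinct values, weighted ECDF evaluated at each value)."""
    total = float(sum(weights))
    xs, cdf = [], []
    running = 0.0
    for x, w in sorted(zip(values, weights)):
        running += w
        if xs and xs[-1] == x:
            cdf[-1] = running / total  # tied value: extend the same step
        else:
            xs.append(x)
            cdf.append(running / total)
    return xs, cdf


def ecdf_at(xs, cdf, t):
    """Evaluate a right-continuous step ECDF at t (0 left of the first jump)."""
    i = bisect_right(xs, t)
    return cdf[i - 1] if i > 0 else 0.0


def ks_d_statistic(x1, w1, x2, w2):
    """Two-sample K-S D = sup_t |F1(t) - F2(t)| for weighted samples.

    Pass weights of 1 for an unweighted sample.
    """
    xs1, c1 = weighted_ecdf(x1, w1)
    xs2, c2 = weighted_ecdf(x2, w2)
    # Both ECDFs are step functions, so the supremum is attained at a jump point.
    jumps = set(xs1) | set(xs2)
    return max(abs(ecdf_at(xs1, c1, t) - ecdf_at(xs2, c2, t)) for t in jumps)
```

Because each weight is divided by its sample's total, multiplying all weights by a constant (for example, turning predicted frequencies that sum to 200M into relative weights) does not change D. Only the sample sizes used afterward for the p-value are affected.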

Is there a way to get NPAR1WAY to calculate the p-value correctly? The problem is that I have to do this for 400 different pairs of distributions!
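
One way to do the recalculation the poster describes, sketched in Python under stated assumptions: use the large-sample (asymptotic) Kolmogorov distribution, P(D > d) ≈ Q(sqrt(n1·n2/(n1+n2))·d), with n1 and n2 taken as the raw record counts rather than the sum of the weights. Whether the raw count is the right n for survey-weighted data is the poster's assumption, not something this code settles. I have not checked that this matches NPAR1WAY's asymptotic p-value digit for digit, and exact small-sample p-values differ. The function names and the example numbers are hypothetical.

```python
import math


def kolmogorov_sf(lam, terms=100):
    """P(K > lam) for the limiting Kolmogorov distribution K."""
    if lam <= 0:
        return 1.0
    if lam < 1.18:
        # Small-lambda form: P(K <= lam) = sqrt(2*pi)/lam * sum exp(-(2k-1)^2 pi^2 / (8 lam^2))
        s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * lam ** 2))
                for k in range(1, terms + 1))
        p = 1.0 - math.sqrt(2 * math.pi) / lam * s
    else:
        # Large-lambda alternating series: P(K > lam) = 2 * sum (-1)^(k-1) exp(-2 k^2 lam^2)
        p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                      for k in range(1, terms + 1))
    return min(1.0, max(0.0, p))


def ks_two_sample_pvalue(d, n1, n2):
    """Asymptotic p-value for the two-sample K-S statistic D with sample sizes n1, n2."""
    n_eff = n1 * n2 / (n1 + n2)
    return kolmogorov_sf(math.sqrt(n_eff) * d)


# Hypothetical numbers: D = 0.02 and 10,000 records in each sample.
print(ks_two_sample_pvalue(0.02, 10_000, 10_000))       # about 0.037
# Same D, but the weighted sample's size taken as the sum of its weights (200M).
print(ks_two_sample_pvalue(0.02, 200_000_000, 10_000))  # about 0.00067

# For many pairs, e.g. D values exported from NPAR1WAY plus the raw counts:
pairs = [(0.02, 10_000, 10_000), (0.01, 8_000, 12_000)]
pvalues = [ks_two_sample_pvalue(d, n1, n2) for d, n1, n2 in pairs]
print(pvalues)
```

The two series are the standard equivalent forms of the Kolmogorov distribution. Switching between them near 1.18 keeps both well inside the range where a few terms converge. The two printed values show the effect the poster describes: the same D gives a much smaller p-value when the weighted sample's size is taken as the sum of its weights.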

Can I somehow use SURVEYSELECT to resample the weighted distribution into an unweighted one with the original number of observations? For example, if the unweighted data set has 10K observations and the weights sum to 200M, can SURVEYSELECT produce a data set with 10K observations that reproduces the cumulative distribution of the weighted sample?
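
The resampling idea amounts to drawing n records with replacement, each chosen with probability proportional to its weight (a probability-proportional-to-size, with-replacement draw). Below is a minimal Python sketch of that idea using the standard library's `random.choices`. It is not SURVEYSELECT syntax, and `resample_by_weight` is a hypothetical name. The resampled ECDF matches the weighted ECDF only up to sampling noise of roughly order 1/sqrt(n). That is about the same size as the differences a K-S test at that n is trying to detect, so this approach adds randomness to every one of the tests.

```python
import random


def resample_by_weight(values, weights, n=None, seed=None):
    """Draw n records with replacement, each with probability w_i / sum(w).

    n defaults to the original number of records, so the output is an
    unweighted sample of the same size as the input.
    """
    if n is None:
        n = len(values)
    rng = random.Random(seed)  # seeded for reproducibility across runs
    return rng.choices(values, weights=weights, k=n)


# Hypothetical example: the record with weight 98 should dominate the draw.
sample = resample_by_weight([1, 2, 3], [1, 1, 98], n=10_000, seed=1)
print(sample.count(3) / len(sample))
```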

08-24-2014 02:00 PM

I'm not sure a weighted K-S test exists. Is there a reference that describes such a test? - PG

08-24-2014 02:07 PM

The weights are essentially predicted frequencies, and I use them as such. They come from a full survey design, and using the weights as frequencies when plotting a variable should give a distribution close to what one would get by sampling the entire US civilian population with every observation having a weight of one.

The K-S test should work fine; I just want to find a way to get SAS to calculate the p-value correctly. I believe it already gets the D statistic right.