ewolin
Calcite | Level 5

I need to compare two distributions using NPAR1WAY and two-sample K-S tests, but one of them is weighted.  If I set FREQ to the weight, I get the correct cumulative distribution for the weighted data, but NPAR1WAY calculates the p-value incorrectly.  It treats the number of entries in the cumulative distribution as the sum of the weights, whereas the actual number of observations is much lower (thus the p-values are too low).  Given the D-statistic, which I think SAS calculates correctly from the two cumulative distributions, I believe I can recalculate the p-value from the correct numbers of entries in the two distributions.

Is there a way to get NPAR1WAY to calculate the p-value correctly?  The problem is that I have to do this for 400 different pairs of distributions!

Can I somehow use SURVEYSELECT to resample the weighted distribution to get an unweighted distribution having the original number of observations?  E.g. if the unweighted data set has 10K entries, and the sum of the weights is 200M, can SURVEYSELECT produce a data set with 10K entries that reproduces the weighted sample cumulative distribution?
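
To make the resampling idea concrete, here is a minimal sketch in Python rather than SAS. It draws, with replacement, as many values as there are original observations, with each value's chance of selection proportional to its weight. The function name `resample_by_weight` and its parameters are hypothetical, chosen only for illustration; this is not SURVEYSELECT syntax and makes no claim about what SURVEYSELECT does internally.

```python
import random

def resample_by_weight(values, weights, k=None, seed=None):
    """Draw k values with replacement, with probability proportional to weight.

    Hypothetical helper for illustration. If k is omitted, the result has
    the same number of entries as the original (unweighted) data set.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if k is None:
        k = len(values)
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    return rng.choices(values, weights=weights, k=k)

# Example: 5 observations whose weights sum to a much larger total.
data = [1.2, 3.4, 2.2, 5.0, 4.1]
wts = [40000, 10000, 90000, 5000, 55000]
resampled = resample_by_weight(data, wts, seed=1)
```

The resampled set only approximates the weighted cumulative distribution, since it adds sampling noise of its own and will usually contain repeated values, so the D-statistic computed from it will differ somewhat from the one computed with FREQ.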

PGStats
Opal | Level 21

Not sure a weighted K-S test exists. Is there a reference describing such a test? - PG

ewolin
Calcite | Level 5

The weights are essentially predicted frequencies, and I use them as such.  They are based on a full survey design, and using them as frequencies when plotting a variable should give a distribution close to what one would get by sampling the entire US civilian population with every observation having a weight of one.

The K-S test should work fine; I just want to find a way to get SAS to calculate the p-value correctly.  I believe it gets the D-statistic right.
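
As a sketch of the recalculation described above, the following Python function computes the standard asymptotic two-sample K-S p-value from a given D-statistic and the two sample sizes, using the Kolmogorov series p = 2 Σ (-1)^(k-1) exp(-2 k² z²) with z = D·sqrt(n1·n2/(n1+n2)). The function name and the cutoffs are hypothetical, and it is an assumption that this matches the asymptotic p-value NPAR1WAY reports; which n1 and n2 are appropriate for weighted data is left to the user.

```python
import math

def ks_asymptotic_pvalue(d, n1, n2, max_terms=100):
    """Asymptotic two-sample K-S p-value for statistic d and sizes n1, n2.

    Hypothetical helper for illustration; n1 and n2 are whatever sample
    sizes the user considers correct (e.g. actual observation counts).
    """
    z = d * math.sqrt(n1 * n2 / (n1 + n2))
    if z < 0.2:
        # The alternating series converges slowly near zero; the p-value
        # is indistinguishable from 1 in this range.
        return 1.0
    total = 0.0
    for k in range(1, max_terms + 1):
        term = (-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
        total += term
        if abs(term) < 1e-12:  # remaining terms are negligible
            break
    return min(1.0, max(0.0, 2.0 * total))

# Same D, very different p-values depending on the sample sizes used.
p_actual = ks_asymptotic_pvalue(0.05, 10_000, 10_000)
p_inflated = ks_asymptotic_pvalue(0.05, 10_000, 200_000_000)
```

Because the p-value depends on D only through z, inflating one sample size from 10K to the 200M sum of weights raises z and shrinks the p-value, which is consistent with the too-low p-values described in the question. Looping this function over the 400 D-statistics would avoid rerunning the test for each pair.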
