08-25-2014 01:27 PM
Is there a function I can call directly that will calculate the p-value given the K-S d-statistic and the number of entries in the two distributions? I need the function that NPAR1WAY uses behind the scenes. The problem is that I can't give NPAR1WAY what it needs (unweighted data; mine is weighted), so I use NPAR1WAY to get the d-statistic and I need to calculate the p-value myself. I could scale up the reported p-value, except that for my weighted data NPAR1WAY always reports p<0.0001, i.e. it doesn't return a value I can scale!
I need something like: myPvalue = KS_pValue(dStatistic,n1,n2);
Ranges for n1,n2 in my case are: 1000 < n1 < 100,000, 50 < n2 < 2000.
08-25-2014 01:42 PM
You can get the actual p-value in the output data set. The value you are seeing is the formatted value using PVALUE format.
08-25-2014 02:44 PM
I looked in the output data, all the p-values are 0. The problem is my weights are huge, as high as 200,000. But the true p-value calculated properly should not be 0.
Might EXACT help? I fiddled with EXACT, MC, MAXTIME, etc. but it just sat there for long periods of time and I had to kill it. Have to try N= as well.
Anyway, if I could just access the function NPAR1WAY uses I'd be fine...
08-25-2014 03:26 PM
The help function in SAS can take you to the details of the KS test. Or just check out the web page
New versions have the same information.
With the sample sizes you have, even the smallest difference will be significant.
08-25-2014 03:32 PM
Yes, I saw the equations, I was hoping a function exists to do the work. I understand in practice no one sums from zero to infinity, only a few terms are kept and special corrections are applied, so I'd much rather find a working function written by experts than have to research this on my own.
08-25-2014 03:50 PM
Part of your message looks like you are concerned about getting p<0.0001 instead of the actual (small) value, say p = 1.67*10^(-8). You can get a more exact printout by storing the relevant statistics with an ODS OUTPUT statement and then printing the stored data set. Here is a simple example (without frequencies) where the printout gives <.0001. The last column of the KS2 data set has two rows: the first row is D and the second row is p (in scientific notation). The variable is called nValue2 (for some strange reason).
data a;
  do group = 1 to 2;
    do rep = 1 to 10000;
      y = group*.1 + rannor(1); output;
    end;
  end;
run;
proc npar1way data=a edf ks;
  class group; var y;
  ods output KolSmir2Stats=ks2;  /* save the KS table as data set ks2 */
run;
proc print data=ks2; run;
08-25-2014 04:37 PM
I tried that previously; it just printed exactly 0 for all the p-values (see column of zeros below). Perhaps my weights are so large that it's hopeless to get a non-zero p-value that I can correct for the weights. Seems like I'll just have to calculate the p-value myself.
I sure would like to find a function in SAS that does the calculation...
08-25-2014 04:55 PM
Not my area. But any formula other than the exact one will be an approximation. I doubt the p-value will scale linearly with n, so there would be no simple upscaling.
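For anyone who wants to try the calculation by hand, here is a minimal sketch of the standard asymptotic (Kolmogorov series) approximation for the two-sample test, p = 2 * sum over i >= 1 of (-1)^(i-1) * exp(-2 * i^2 * z^2), with z = D * sqrt(n1*n2/(n1+n2)). I am assuming this matches the asymptotic formula described in the NPAR1WAY documentation; it is not a call into whatever routine the procedure uses internally. The data set name ks_p, the variable names, and the input values are placeholders, and n1 and n2 should be whatever sample sizes you decide are appropriate for your weighted data.

```sas
/* Sketch only: asymptotic two-sample KS p-value from D, n1 and n2.   */
/* All names and input values here are placeholders for illustration. */
data ks_p;
  d  = 0.1;     /* KS D statistic, e.g. as reported by NPAR1WAY */
  n1 = 5000;    /* size of sample 1 (assumed value)             */
  n2 = 500;     /* size of sample 2 (assumed value)             */

  z = d * sqrt(n1 * n2 / (n1 + n2));

  if z < 0.2 then p = 1;   /* series converges slowly near 0; p is 1 to within about 1e-12 */
  else do;
    p = 0;
    /* Alternating series; the terms shrink very fast, so stop once negligible */
    do i = 1 to 100 until (term < 1e-300);
      term = 2 * exp(-2 * (i * z)**2);
      p = p + (-1)**(i - 1) * term;
    end;
    p = min(max(p, 0), 1);  /* guard against rounding outside [0, 1] */
  end;

  put d= z= p= e12.;        /* write the p-value to the log in scientific notation */
run;
```

Because the terms fall off like exp(-2*i^2*z^2), only the first few matter once z is above about 1, and the result underflows to exactly 0 in double precision when z is very large. That may be why a heavily weighted run reports 0.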