## Calculating p-value for two-sample Kolmogorov-Smirnov test

Is there a function I can call directly that calculates the p-value from the K-S D statistic and the number of observations in each of the two samples? I need the function that NPAR1WAY uses behind the scenes. The problem is that I can't give NPAR1WAY what it needs (unweighted data; mine is weighted), so I use NPAR1WAY to get the D statistic and need to calculate the p-value myself. I could scale the reported p-value, except that for my weighted data NPAR1WAY always reports p < 0.0001, i.e., it doesn't return a value I can scale!

I need something like: `myPvalue = KS_pValue(dStatistic, n1, n2);`

In my case, the sample sizes range over 1,000 < n1 < 100,000 and 50 < n2 < 2,000.
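A minimal DATA step sketch of the requested calculation, assuming the standard large-sample (asymptotic) Kolmogorov distribution. This is the textbook approximation, not necessarily the exact method or small-sample corrections NPAR1WAY uses internally. D is scaled by the effective sample size n1·n2/(n1+n2) to give λ, and the alternating series is truncated once its terms become negligible. The data set name `ks_p`, the variable names, and the two input rows are hypothetical.

```sas
/* Sketch only: asymptotic two-sample K-S p-value from D, n1 and n2.   */
/* Assumes the large-sample Kolmogorov distribution                    */
/*   p = 2 * sum over j>=1 of (-1)**(j-1) * exp(-2 * j**2 * lambda**2)  */
/* with lambda = D * sqrt(n1*n2/(n1+n2)). Input rows are hypothetical. */
data ks_p;
   input d n1 n2;
   lambda = d * sqrt(n1 * n2 / (n1 + n2));
   if lambda < 0.05 then p_asym = 1;   /* series converges too slowly here; p is 1 to many digits */
   else do;
      p_asym = 0;
      sign = 1;
      do j = 1 to 100;
         x = 2 * (j * lambda)**2;
         if x > 700 then leave;         /* exp(-x) would be below ~1e-304: stop before underflow */
         term = exp(-x);
         p_asym = p_asym + 2 * sign * term;
         if term < 1e-17 then leave;    /* all later terms are smaller still */
         sign = -sign;
      end;
      p_asym = min(1, max(0, p_asym));  /* guard against rounding just outside [0,1] */
   end;
   drop sign j x term;
   datalines;
0.05 10000 500
0.02 100000 2000
;
run;

proc print data=ks_p;
   format p_asym e12.;
run;
```

For the first hypothetical row, λ ≈ 1.09 and p ≈ 0.185; for the second, λ ≈ 0.886 and p ≈ 0.413. Deciding which effective n1 and n2 to plug in for weighted data is a separate modelling question that this sketch does not settle.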

## Re: Calculating p-value for two-sample Kolmogorov-Smirnov test

You can get the actual p-value from the output data set. The value you are seeing is the stored value displayed with the PVALUE format.

```sas
data Arthritis;
   input Treatment $ Response Freq @@;
   datalines;
Active 5 5 Active 4 11 Active 3 5 Active 2 1 Active 1 5
Placebo 5 2 Placebo 4 4 Placebo 3 7 Placebo 2 7 Placebo 1 12
;

proc npar1way data=Arthritis KS;
   class Treatment;
   var Response;
   freq Freq;
   output out=ks;
run;

proc print data=ks;
run;
```
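A small self-contained illustration of the formatting point: the same stored number rendered with the PVALUE6.4 format and with the E12. scientific-notation format. The value 1.67e-8 is the example p-value used later in this thread; the data set name `show_p` is hypothetical.

```sas
/* PVALUE6.4 displays small p-values as <.0001;              */
/* E12. shows the same stored number in scientific notation. */
data show_p;
   p = 1.67e-8;               /* example p-value from the thread */
   put p= pvalue6.4 p= e12.;  /* writes both renderings to the log */
run;
```

With the OUT= data set from the code above, the same idea is to add `format _numeric_ e12.;` inside PROC PRINT so no numeric column is shown through the PVALUE format.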

## Re: Calculating p-value for two-sample Kolmogorov-Smirnov test

I looked in the output data set; all the p-values are 0. The problem is that my weights are huge, as high as 200,000. But the true p-value, calculated properly, should not be 0.

Might EXACT help? I fiddled with EXACT, MC, MAXTIME, etc., but it just ran for long periods and I had to kill it. I still have to try N= as well.

Anyway, if I could just access the function NPAR1WAY uses, I'd be fine...

## Re: Calculating p-value for two-sample Kolmogorov-Smirnov test

The SAS Help can take you to the details of the KS test, or you can check the web page:

SAS/STAT(R) 9.2 User's Guide, Second Edition

Newer versions contain the same information.

With the sample sizes you have, even very small differences between the two distributions will be significant.
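To make this concrete, here is a worked check using the usual large-sample 5% critical value of the Kolmogorov distribution (about 1.358), the largest sample sizes from the question, and no weighting:

$$
n_{\text{eff}} = \frac{n_1 n_2}{n_1 + n_2} = \frac{100\,000 \times 2\,000}{102\,000} \approx 1961,
\qquad
D_{0.05} \approx \frac{1.358}{\sqrt{n_{\text{eff}}}} \approx \frac{1.358}{44.3} \approx 0.031
$$

So any D above roughly 0.031 would already be significant at the 5% level. Treating weights as large as 200,000 as frequencies inflates $n_{\text{eff}}$ further, and the threshold shrinks in proportion to $1/\sqrt{n_{\text{eff}}}$.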

## Re: Calculating p-value for two-sample Kolmogorov-Smirnov test

Yes, I saw the equations; I was hoping a function exists to do the work. I understand that in practice no one sums from zero to infinity: only a few terms are kept and special corrections are applied. So I'd much rather find a working function written by experts than have to research this on my own.
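For reference, here is a sketch of the standard large-sample result and why only a few terms are needed. These are general facts about the Kolmogorov distribution, not a statement of the exact corrections NPAR1WAY applies; $Q(\lambda)$ is notation introduced here for the tail probability.

$$
\lambda = D\sqrt{\frac{n_1 n_2}{n_1 + n_2}},
\qquad
p \approx Q(\lambda) = 2\sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^2 \lambda^2}
$$

Because the terms alternate in sign and shrink in magnitude as $j$ grows, stopping after $J$ terms leaves an error no larger than the first omitted term, $2e^{-2(J+1)^2\lambda^2}$. For $\lambda = 1$ that bound is $2e^{-8} \approx 6.7\times10^{-4}$ after one term and $2e^{-18} \approx 3\times10^{-8}$ after two.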

## Re: Calculating p-value for two-sample Kolmogorov-Smirnov test

Part of your message suggests you are concerned about getting p < 0.0001 instead of the actual small value, say p = 1.67 × 10^-8. You can get a more exact printout by storing the relevant statistics with an ODS OUTPUT statement and then printing the stored data set. Here is a simple example (without frequencies) where the printed output shows <.0001. The last column of the KS2 data set has two rows: the first row is D and the second row is the p-value (in scientific notation). The variable is called nValue2 (for some strange reason).

```sas
data a;
   do group = 1 to 2;
      do rep = 1 to 10000;
         y = group*.1 + rannor(1);
         output;
      end;
   end;
run;

proc npar1way data=a edf ks;
   class group;
   var y;
   ods output KolSmir2Stats=ks2;
run;

proc print data=ks2;
run;
```

## Re: Calculating p-value for two-sample Kolmogorov-Smirnov test

I tried that previously; it just printed exactly 0 for all the p-values (see the columns of zeros in the output below). Perhaps my weights are so large that it's hopeless to get a non-zero p-value that I can then correct for the weights. It seems I'll just have to calculate the p-value myself.

I sure would like to find a function in SAS that does the calculation...

| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Male | (<20) | ApoB | 0.00223333 | 6.2103 | 0.22041 | 0 | 2.242e-06 | 17.335 | 0.22293 | 6.2813 | 0 |
| Male | (20-29) | ApoB | 0.00247362 | 7.3677 | 0.12183 | 0 | 3.057e-06 | 27.119 | 0.12511 | 7.566 | 0 |
| Male | (30-39) | ApoB | 0.00519398 | 15.9733 | 0.16763 | 0 | 1.1119e-05 | 105.162 | 0.16785 | 15.9943 | 0 |
| Male | (40-49) | ApoB | 0.00300711 | 9.7514 | 0.07357 | 0 | 2.399e-06 | 25.23 | 0.09987 | 13.238 | 0 |
| Male | (50-59) | ApoB | 0.00481144 | 14.8999 | 0.09631 | 0 | 6.126e-06 | 58.747 | 0.12704 | 19.6538 | 0 |
| Male | (60-69) | ApoB | 0.00729394 | 19.9366 | 0.13297 | 0 | 9.974e-06 | 74.515 | 0.19752 | 29.6162 | 0 |
| Male | (70-79) | ApoB | 0.00560221 | 11.4631 | 0.09621 | 0 | 4.495e-06 | 18.821 | 0.15906 | 18.9521 | 0 |
| Male | (>=80) | ApoB | 0.00572916 | 7.4794 | 0.09921 | 0 | 1.1073e-05 | 18.872 | 0.13626 | 10.2727 | 0 |
| Female | (<20) | ApoB | 0.0025121 | 6.9934 | 0.22836 | 0 | 2.821e-06 | 21.866 | 0.22836 | 6.9934 | 0 |
| Female | (20-29) | ApoB | 0.00239719 | 7.4033 | 0.09843 | 0 | 2.358e-06 | 22.488 | 0.09896 | 7.4432 | 0 |
| Female | (30-39) | ApoB | 0.0057473 | 17.9301 | 0.16782 | 0 | 1.4166e-05 | 137.876 | 0.18721 | 20.0008 | 0 |
| Female | (40-49) | ApoB | 0.00400914 | 12.3046 | 0.08878 | 0 | 7.887e-06 | 74.293 | 0.10704 | 14.8363 | 0 |
| Female | (50-59) | ApoB | 0.00525079 | 17.1014 | 0.10402 | 0 | 6.173e-06 | 65.481 | 0.15556 | 25.5739 | 0 |
| Female | (60-69) | ApoB | 0.00587056 | 16.3198 | 0.10203 | 0 | 1.1481e-05 | 88.728 | 0.1589 | 25.4182 | 0 |
| Female | (70-79) | ApoB | 0.00418343 | 9.3058 | 0.07472 | 0 | 3.132e-06 | 15.496 | 0.14135 | 17.6047 | 0 |
| Female | (>=80) | ApoB | 0.00870609 | 14.2755 | 0.1629 | 0 | 1.5396e-05 | 41.394 | 0.19818 | 17.368 | 0 |
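One general reason a computed p-value can come out as exactly 0: 2·exp(-2λ²) underflows double precision once 2λ² goes beyond roughly 745 (λ around 19), even though the true value is positive. A minimal sketch, assuming the asymptotic formula p ≈ 2·exp(-2λ²) and that λ = D·sqrt(n1·n2/(n1+n2)) is already known, is to report log10 of the p-value instead. The leading term dominates because the next term is smaller by a factor of exp(-6λ²). The data set name `ks_log` and the λ values are hypothetical.

```sas
/* Sketch: log10 of the asymptotic K-S p-value from the leading term */
/* only, p ~ 2*exp(-2*lambda**2). Relative error is about            */
/* exp(-6*lambda**2), negligible for lambda >= 2.                    */
/* The lambda values are hypothetical inputs.                        */
data ks_log;
   input lambda;
   log10_p = log10(2) - 2 * lambda**2 / log(10);  /* never underflows */
   /* back-transform only when the result is representable as a double */
   if log10_p > -300 then p = 10**log10_p;
   datalines;
2
6.2
30
300
;
run;

proc print data=ks_log;
   format p e12.;
run;
```

For λ = 2 this gives log10 p ≈ -3.17 (p ≈ 6.7e-4); for λ = 30 and 300, p is left missing, but log10_p still carries the magnitude.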

## Re: Calculating p-value for two-sample Kolmogorov-Smirnov test

Not my area. But any formula other than the exact one will be an approximation. I doubt that the p-value scales linearly with n, so there would be no simple upscaling.
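A quick way to see why simple upscaling fails, under the large-sample approximation and only where its leading term dominates: multiplying every frequency by a factor $k$ leaves the empirical CDFs, and hence D, unchanged, but multiplies $n_1 n_2/(n_1+n_2)$ by $k$, so $\lambda^2$ becomes $k\lambda^2$ and

$$
p_1 \approx 2e^{-2\lambda^2}
\quad\Longrightarrow\quad
p_k \approx 2e^{-2k\lambda^2} = 2\left(\frac{p_1}{2}\right)^{k}
$$

The p-value therefore behaves like a power of the unweighted one rather than a multiple of it. It is $\log p$, not $p$, that changes roughly linearly with the effective sample size.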

