Programming the statistical procedures from SAS

Calculating p-value for two-sample Kolmogorov-Smirnov test

Occasional Contributor
Posts: 14

Is there a function I can call directly that will calculate the p-value given the K-S d-statistic and the number of entries in the two distributions?  I need the function that NPAR1WAY uses behind the scenes.  The problem is that I can't give NPAR1WAY what it needs: it expects unweighted data, and mine is weighted.  So I use NPAR1WAY to get the d-statistic, and I need to calculate the p-value myself.  I could scale up the reported p-value, except that for my weighted data NPAR1WAY always reports p<0.0001, i.e. it doesn't return a value I can scale!

I need something like:   myPvalue = KS_pValue(dStatistic,n1,n2);

Ranges for n1,n2 in my case are:  1000 < n1 < 100,000, 50 < n2 < 2000.

Respected Advisor
Posts: 3,780

Re: Calculating p-value for two-sample Kolmogorov-Smirnov test

You can get the actual p-value in the output data set.  The value you are seeing is the stored value displayed with the PVALUE format.

data Arthritis;
   input Treatment $ Response Freq @@;
   datalines;
Active 5 5 Active 4 11 Active 3 5 Active 2 1 Active 1 5
Placebo 5 2 Placebo 4 4 Placebo 3 7 Placebo 2 7 Placebo 1 12
;

proc npar1way data=Arthritis KS;
   class Treatment;
   var Response;
   freq Freq;
   output out=ks;
run;

proc print data=ks;
run;
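
To see the effect of the format in isolation, here is a minimal sketch. The value 1.67e-8 is made up for illustration, and the choice of E12. for the second PUT is just one way to show the stored number. The first PUT applies the PVALUE format, which is expected to display very small values as an inequality rather than as a number; the second shows the same stored value in scientific notation.

```sas
/* Sketch only: p is an illustrative value, not a result from the data above. */
data _null_;
   p = 1.67e-8;
   put p= pvalue6.4;   /* formatted the way p-values are normally displayed */
   put p= e12.;        /* the same stored value in scientific notation */
run;
```

Both lines are written to the SAS log, so the two displays can be compared side by side.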

Occasional Contributor
Posts: 14

Re: Calculating p-value for two-sample Kolmogorov-Smirnov test

I looked in the output data set; all the p-values are 0.  The problem is that my weights are huge, as high as 200,000.  But the true p-value, calculated properly, should not be 0.

Might EXACT help?  I fiddled with EXACT, MC, MAXTIME, etc., but it just sat there for long periods of time and I had to kill it.  I still have to try N= as well.

Anyway, if I could just access the function NPAR1WAY uses I'd be fine...

Valued Guide
Posts: 684

Re: Calculating p-value for two-sample Kolmogorov-Smirnov test

The help function in SAS can take you to the details of the KS test. Or just check out the web page

SAS/STAT(R) 9.2 User's Guide, Second Edition

New versions have the same information.

With the sample sizes you have, even the smallest difference will be significant.
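
For reference, the asymptotic formula given in that documentation can be wrapped in a user-defined function with PROC FCMP. This is only a sketch under my reading of the documented formula, Pr(KSa > z) = 2 * sum over i of (-1)^(i-1) * exp(-2 i^2 z^2), with KSa = D * sqrt(n1*n2/(n1+n2)). The function name ks_pvalue, the library path work.funcs.stat, the 100-term cutoff, and the small-z shortcut are all my own choices, not anything NPAR1WAY exposes, and I have not checked the result against NPAR1WAY output.

```sas
/* Sketch only: ks_pvalue is a hypothetical user-defined function, not a SAS built-in. */
proc fcmp outlib=work.funcs.stat;
   function ks_pvalue(d, n1, n2);
      /* asymptotic statistic computed from D and the two sample sizes */
      z = d * sqrt(n1 * n2 / (n1 + n2));
      /* for very small z the p-value is 1 to double precision,
         and the alternating series converges too slowly to be useful */
      if z < 0.2 then return (1);
      p = 0;
      sign = 1;
      do i = 1 to 100;   /* 100 terms is an arbitrary, generous cutoff */
         p = p + sign * 2 * exp(-2 * i**2 * z**2);
         sign = -sign;
      end;
      /* clamp against rounding error in the alternating sum */
      return (min(max(p, 0), 1));
   endsub;
run;

options cmplib=work.funcs;

data _null_;
   p = ks_pvalue(0.1, 5000, 500);   /* illustrative inputs only */
   put p= e12.;
run;
```

Because the terms shrink like exp(-2 i^2 z^2), only the first few matter unless z is small, which is why a fixed cutoff is enough for a sketch like this.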

Occasional Contributor
Posts: 14

Re: Calculating p-value for two-sample Kolmogorov-Smirnov test

Yes, I saw the equations; I was hoping a function exists to do the work.  I understand that in practice no one sums from zero to infinity: only a few terms are kept and special corrections are applied.  So I'd much rather find a working function written by experts than have to research this on my own.

Valued Guide
Posts: 684

Re: Calculating p-value for two-sample Kolmogorov-Smirnov test

Part of your message suggests you are concerned about getting p<0.0001 instead of the actual small value, say p = 1.67*10^(-8). You can get a more exact printout by storing the relevant statistics with an ODS OUTPUT statement and then printing the stored data set. Here is a simple example (without frequencies) where the printout gives <.0001. The last column of the KS2 data set has two rows: the first row is D and the second row is p (in scientific notation). The variable is called nValue2 (for some strange reason).

data a;
   do group = 1 to 2;
      do rep = 1 to 10000;
         y = group*.1 + rannor(1);
         output;
      end;
   end;
run;

proc npar1way data=a edf ks;
   class group;
   var y;
   ods output KolSmir2Stats=ks2;
run;

proc print data=ks2;
run;

Occasional Contributor
Posts: 14

Re: Calculating p-value for two-sample Kolmogorov-Smirnov test

I tried that previously; it just printed exactly 0 for all the p-values (see column of zeros below).  Perhaps my weights are so large that it's hopeless to get a non-zero p-value that I can correct for the weights.  It seems I'll just have to calculate the p-value myself.

I sure would like to find a function in SAS that does the calculation...


 1  Male(<20)      ApoB  .002233328   6.2103  0.22041  0  .000002242   17.335  0.22293   6.2813  0
 2  Male(20-29)    ApoB  .002473620   7.3677  0.12183  0  .000003057   27.119  0.12511   7.5660  0
 3  Male(30-39)    ApoB  .005193976  15.9733  0.16763  0  .000011119  105.162  0.16785  15.9943  0
 4  Male(40-49)    ApoB  .003007115   9.7514  0.07357  0  .000002399   25.230  0.09987  13.2380  0
 5  Male(50-59)    ApoB  .004811441  14.8999  0.09631  0  .000006126   58.747  0.12704  19.6538  0
 6  Male(60-69)    ApoB  .007293936  19.9366  0.13297  0  .000009974   74.515  0.19752  29.6162  0
 7  Male(70-79)    ApoB  .005602209  11.4631  0.09621  0  .000004495   18.821  0.15906  18.9521  0
 8  Male(>=80)     ApoB  .005729162   7.4794  0.09921  0  .000011073   18.872  0.13626  10.2727  0
 9  Female(<20)    ApoB  .002512096   6.9934  0.22836  0  .000002821   21.866  0.22836   6.9934  0
10  Female(20-29)  ApoB  .002397185   7.4033  0.09843  0  .000002358   22.488  0.09896   7.4432  0
11  Female(30-39)  ApoB  .005747304  17.9301  0.16782  0  .000014166  137.876  0.18721  20.0008  0
12  Female(40-49)  ApoB  .004009143  12.3046  0.08878  0  .000007887   74.293  0.10704  14.8363  0
13  Female(50-59)  ApoB  .005250789  17.1014  0.10402  0  .000006173   65.481  0.15556  25.5739  0
14  Female(60-69)  ApoB  .005870556  16.3198  0.10203  0  .000011481   88.728  0.15890  25.4182  0
15  Female(70-79)  ApoB  .004183428   9.3058  0.07472  0  .000003132   15.496  0.14135  17.6047  0
16  Female(>=80)   ApoB  .008706090  14.2755  0.16290  0  .000015396   41.394  0.19818  17.3680  0

Valued Guide
Posts: 684

Re: Calculating p-value for two-sample Kolmogorov-Smirnov test

Not my area. But any formula other than the exact one will be an approximation. I doubt the p-value will scale linearly with n, so there would be no simple upscaling.
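
The dependence on sample size can be made concrete with the leading term of the asymptotic series, p ~ 2*exp(-2*z**2) with z = D*sqrt(n1*n2/(n1+n2)). Under that approximation log(p) falls in proportion to n1*n2/(n1+n2), so p itself does not scale linearly with n. Working with log10(p) also stays finite where p would underflow to 0. This is a rough sketch, not the NPAR1WAY calculation: it keeps only the first term, which should be reasonable once z is above roughly 1.5, and the values of d, n1 and n2 below are made up for illustration.

```sas
/* Sketch only: d, n1 and n2 are illustrative values, not taken from the thread. */
data ks_logp;
   input d n1 n2;
   /* asymptotic statistic from D and the two (unweighted) sample sizes */
   z = d * sqrt(n1 * n2 / (n1 + n2));
   /* leading term only: p ~ 2*exp(-2*z**2), computed directly on the log10 scale */
   log10_p = log10(2) - 2 * z**2 / log(10);
   datalines;
0.10 5000 500
0.22 50000 1000
;
run;

proc print data=ks_logp;
run;
```

A value such as log10_p = -40 would correspond to p of about 1e-40, which a fixed-decimal display would show as 0.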
