ewolin
Calcite | Level 5

Is there a function I can call directly that calculates the p-value given the K-S D statistic and the number of entries in the two distributions?  I need the function that NPAR1WAY uses behind the scenes.  The problem is that I can't give NPAR1WAY what it needs (unweighted data; mine are weighted), so I use NPAR1WAY to get the D statistic and need to calculate the p-value myself.  I could scale the reported p-value, except that for my weighted data NPAR1WAY always says p<0.0001, i.e. it doesn't return a value I can scale!

I need something like:   myPvalue = KS_pValue(dStatistic,n1,n2);

Ranges for n1,n2 in my case are:  1000 < n1 < 100,000, 50 < n2 < 2000.

data_null__
Jade | Level 19

You can get the actual p-value in the output data set.  The value you are seeing is the formatted value using PVALUE format.

data Arthritis;
   input Treatment $ Response Freq @@;
   datalines;
Active 5 5 Active 4 11 Active 3 5 Active 2 1 Active 1 5
Placebo 5 2 Placebo 4 4 Placebo 3 7 Placebo 2 7 Placebo 1 12
;

proc npar1way data=Arthritis KS;
   class Treatment;
   var Response;
   freq Freq;
   output out=ks;   /* unformatted statistics go to data set KS */
run;

proc print data=ks;
run;

ewolin
Calcite | Level 5

I looked in the output data, all the p-values are 0.  The problem is my weights are huge, as high as 200,000.  But the true p-value calculated properly should not be 0.

Might EXACT help?  I fiddled with EXACT, MC, MAXTIME, etc. but it just sat there for long periods of time and I had to kill it.  Have to try N= as well.

Anyway, if I could just access the function NPAR1WAY uses I'd be fine...

lvm
Rhodochrosite | Level 12

The help function in SAS can take you to the details of the KS test. Or just check out the web page

SAS/STAT(R) 9.2 User's Guide, Second Edition

New versions have the same information.

With the sample sizes you have, even the smallest difference will be significant.

ewolin
Calcite | Level 5

Yes, I saw the equations, I was hoping a function exists to do the work.  I understand in practice no one sums from zero to infinity, only a few terms are kept and special corrections are applied, so I'd much rather find a working function written by experts than have to research this on my own.
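For reference, the asymptotic formula in the NPAR1WAY documentation can be wrapped in a user-defined function with PROC FCMP. The following is only a minimal sketch, not the routine NPAR1WAY runs internally: the function name ks_pvalue and the library work.funcs.stat are made up for illustration, the series is simply truncated at 100 terms, and n1 and n2 are assumed to be the actual (unweighted) sample sizes. It computes KSa = D*sqrt(n1*n2/(n1+n2)) and then p = 2 * sum over i of (-1)^(i-1) * exp(-2*i^2*KSa^2).

```sas
/* Sketch only: asymptotic two-sample K-S p-value from D, n1, n2.      */
/* ks_pvalue and work.funcs.stat are hypothetical names.               */
proc fcmp outlib=work.funcs.stat;
   function ks_pvalue(d, n1, n2);
      /* Asymptotic statistic KSa = D * sqrt(n1*n2 / (n1+n2)) */
      z = d * sqrt(n1 * n2 / (n1 + n2));
      if z <= 0 then return(1);

      /* Alternating series 2 * sum (-1)**(i-1) * exp(-2*i*i*z*z),     */
      /* truncated at 100 terms (an assumption, not a SAS default).    */
      p   = 0;
      sgn = 1;
      do i = 1 to 100;
         p   = p + 2 * sgn * exp(-2 * i * i * z * z);
         sgn = -sgn;
      end;

      /* Clamp, because the truncated series is inaccurate for tiny z */
      return(min(max(p, 0), 1));
   endsub;
run;

options cmplib=work.funcs;

/* Usage with made-up inputs */
data _null_;
   p = ks_pvalue(0.05, 5000, 500);
   put p= e12.;
run;
```

The series converges within a few terms once KSa is moderately large, so the fixed cutoff only matters for very small KSa, where the p-value is close to 1 anyway. For very large KSa the exponential underflows and the function returns exactly 0.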

lvm
Rhodochrosite | Level 12

Part of your message suggests you are concerned about getting p<0.0001 instead of the actual (small) value, say p = 1.67*10^(-8). You can get a more exact printout by storing the relevant statistics with an ODS OUTPUT statement and then printing the stored data set. Here is a simple example (without frequencies) where the default printout gives <.0001. The last column of the KS2 data set has two rows: the first row is D and the second row is p (in scientific notation). The variable is called nValue2 (for some strange reason).

data a;
   do group = 1 to 2;
      do rep = 1 to 10000;
         y = group*.1 + rannor(1);   /* group means differ by 0.1 */
         output;
      end;
   end;
run;

proc npar1way data=a edf ks;
   class group;
   var y;
   ods output KolSmir2Stats=ks2;   /* store the K-S table as a data set */
run;

proc print data=ks2;
run;

ewolin
Calcite | Level 5

I tried that previously, it just printed exactly 0 for all the p-values (see column of zeros below).  Perhaps my weights are so large that it's hopeless to get a non-zero p-value that I can correct for the weights.  Seems like I'll just have to calculate the p-value myself.

I sure would like to find a function in SAS that does the calculation...

Obs  Group          Variable  _KS_        _KSA_    _D_      P_KSA  _CM_        _CMA_    _K_      _KA_     P_KA
  1  Male(<20)      ApoB      .002233328   6.2103  0.22041  0      .000002242   17.335  0.22293   6.2813  0
  2  Male(20-29)    ApoB      .002473620   7.3677  0.12183  0      .000003057   27.119  0.12511   7.5660  0
  3  Male(30-39)    ApoB      .005193976  15.9733  0.16763  0      .000011119  105.162  0.16785  15.9943  0
  4  Male(40-49)    ApoB      .003007115   9.7514  0.07357  0      .000002399   25.230  0.09987  13.2380  0
  5  Male(50-59)    ApoB      .004811441  14.8999  0.09631  0      .000006126   58.747  0.12704  19.6538  0
  6  Male(60-69)    ApoB      .007293936  19.9366  0.13297  0      .000009974   74.515  0.19752  29.6162  0
  7  Male(70-79)    ApoB      .005602209  11.4631  0.09621  0      .000004495   18.821  0.15906  18.9521  0
  8  Male(>=80)     ApoB      .005729162   7.4794  0.09921  0      .000011073   18.872  0.13626  10.2727  0
  9  Female(<20)    ApoB      .002512096   6.9934  0.22836  0      .000002821   21.866  0.22836   6.9934  0
 10  Female(20-29)  ApoB      .002397185   7.4033  0.09843  0      .000002358   22.488  0.09896   7.4432  0
 11  Female(30-39)  ApoB      .005747304  17.9301  0.16782  0      .000014166  137.876  0.18721  20.0008  0
 12  Female(40-49)  ApoB      .004009143  12.3046  0.08878  0      .000007887   74.293  0.10704  14.8363  0
 13  Female(50-59)  ApoB      .005250789  17.1014  0.10402  0      .000006173   65.481  0.15556  25.5739  0
 14  Female(60-69)  ApoB      .005870556  16.3198  0.10203  0      .000011481   88.728  0.15890  25.4182  0
 15  Female(70-79)  ApoB      .004183428   9.3058  0.07472  0      .000003132   15.496  0.14135  17.6047  0
 16  Female(>=80)   ApoB      .008706090  14.2755  0.16290  0      .000015396   41.394  0.19818  17.3680  0

lvm
Rhodochrosite | Level 12

Not my area. But any formula other than the exact one will be an approximation. I doubt the p-value scales linearly with n, so there would be no simple upscaling.
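The point about scaling can be checked numerically. The sketch below assumes the asymptotic K-S formula from the NPAR1WAY documentation and uses made-up values for D, n1 and n2; it multiplies both sample sizes by a factor k (which is what large frequency weights effectively do) while holding D fixed. The data set name scaling and all variable names are illustrative only.

```sas
/* Sketch with hypothetical inputs: same D, both sample sizes inflated by k */
data scaling;
   d = 0.05; n1 = 5000; n2 = 500;        /* made-up values */
   do k = 1, 2, 4, 8;
      /* KSa grows like sqrt(k) when both sizes are multiplied by k */
      z = d * sqrt((k*n1) * (k*n2) / (k*n1 + k*n2));

      /* Asymptotic series 2 * sum (-1)**(i-1) * exp(-2*i*i*z*z), 100 terms */
      p   = 0;
      sgn = 1;
      do i = 1 to 100;
         p   = p + 2 * sgn * exp(-2 * i * i * z * z);
         sgn = -sgn;
      end;
      output;
   end;
   keep k z p;
run;

proc print data=scaling;
   format p e12.;
run;
```

Because the leading term of the series is 2*exp(-2*KSa^2) and KSa^2 is proportional to k, the p-value falls off roughly exponentially in k rather than linearly, so a p-value computed from weighted counts cannot simply be multiplied back up by the weight.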
