Is there a function I can call directly that will calculate the p-value given the K-S d-statistic and the number of entries in the two distributions? I need the function that NPAR1WAY uses behind the scenes. The problem is that I can't give NPAR1WAY what it needs (unweighted data, mine is weighted) so I use NPAR1WAY to get the d-statistic and I need to calculate the p-value myself. I could scale up the reported p-value except for my weighted data NPAR1WAY always says p<0.0001, i.e. it doesn't return a value I can scale!
I need something like: myPvalue = KS_pValue(dStatistic,n1,n2);
Ranges for n1,n2 in my case are: 1000 < n1 < 100,000, 50 < n2 < 2000.
You can get the actual p-value in the output data set. The value you are seeing is the formatted value using PVALUE format.
I looked in the output data, all the p-values are 0. The problem is my weights are huge, as high as 200,000. But the true p-value calculated properly should not be 0.
Might EXACT help? I fiddled with EXACT, MC, MAXTIME, etc. but it just sat there for long periods of time and I had to kill it. Have to try N= as well.
Anyway, if I could just access the function NPAR1WAY uses I'd be fine...
The help function in SAS can take you to the details of the KS test. Or just check out the web page for the SAS/STAT(R) 9.2 User's Guide, Second Edition. Newer versions have the same information.
With the sample sizes you have, even the smallest difference will be significant.
Yes, I saw the equations, I was hoping a function exists to do the work. I understand in practice no one sums from zero to infinity, only a few terms are kept and special corrections are applied, so I'd much rather find a working function written by experts than have to research this on my own.
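As far as I know there is no built-in function that takes D, n1 and n2 directly, but the asymptotic formula on that documentation page is short enough to wrap yourself. Below is a minimal sketch with PROC FCMP, assuming the documented two-sample form p = 2 * sum over i >= 1 of (-1)**(i-1) * exp(-2 * i**2 * z**2), with z = D * sqrt(n1*n2/(n1+n2)). The name ks_pvalue, the library work.funcs, the 100-term cutoff, the -700 underflow guard and the small-z shortcut are my own choices, not anything NPAR1WAY exposes, and I can't confirm it matches the internal code term for term.

```sas
/* Sketch only: user-defined asymptotic two-sample KS p-value.  */
/* ks_pvalue is a hypothetical name, not a built-in function.   */
proc fcmp outlib=work.funcs.ks;
  function ks_pvalue(d, n1, n2);
    /* z is the asymptotic statistic, D scaled by the sample sizes */
    z = d * sqrt(n1 * n2 / (n1 + n2));
    /* the alternating series converges slowly near z = 0, where p is essentially 1 */
    if z < 0.2 then return (1);
    p = 0;
    sgn = 1;
    do i = 1 to 100;
      arg = -2 * i * i * z * z;
      /* skip terms too small to represent instead of relying on underflow */
      if arg > -700 then p = p + 2 * sgn * exp(arg);
      sgn = -sgn;
    end;
    /* guard against tiny rounding excursions outside [0, 1] */
    return (min(max(p, 0), 1));
  endsub;
run;

/* make the compiled function visible to the DATA step */
options cmplib=work.funcs;

data _null_;
  myPvalue = ks_pvalue(0.05, 10000, 10000);  /* illustrative inputs */
  put myPvalue= e12.;
run;
```

With those illustrative inputs z**2 is 12.5, so the result should be close to 2*exp(-25), roughly 2.8E-11. For weighted data you would pass the unweighted sample sizes as n1 and n2 rather than the sums of the weights.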
Part of your message suggests you are concerned about getting p<0.0001 instead of the actual (small) value, say p = 1.67*10^(-8). You can get a more exact printout by storing the relevant statistics with an ODS OUTPUT statement and then printing the stored data set. Here is a simple example (without frequencies) where the printout gives <.0001. The last column of the KS2 data set has two rows: the first is D and the second is p (in scientific notation). The variable is called nValue2 (for some strange reason).
data a;
  do group = 1 to 2;
    do rep = 1 to 10000;
      y = group*.1 + rannor(1);  /* group means differ by 0.1 */
      output;
    end;
  end;
run;

proc npar1way data=a edf ks;
  class group;
  var y;
  ods output KolSmir2Stats=ks2;  /* store the two-sample KS statistics */
run;

proc print data=ks2;
run;
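If the stored value is nonzero but small, the default print format can still hide it. A small variation on the PROC PRINT step applies a scientific-notation format. It assumes the numeric column really is called nValue2 in your release (PROC CONTENTS on ks2 will confirm), and it will not help if the stored value is an exact 0.

```sas
/* Assumes ks2 was created by the ODS OUTPUT statement above  */
/* and that the numeric column is named nValue2.              */
proc print data=ks2;
  format nValue2 e16.;  /* show D and p in scientific notation */
run;
```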
I tried that previously, it just printed exactly 0 for all the p-values (see column of zeros below). Perhaps my weights are so large that it's hopeless to get a non-zero p-value that I can correct for the weights. Seems like I'll just have to calculate the p-value myself.
I sure would like to find a function in SAS that does the calculation...
1 | Male | (<20) | ApoB | .002233328 | 6.2103 | 0.22041 | 0 | .000002242 | 17.335 | 0.22293 | 6.2813 | 0 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | Male | (20-29) | ApoB | .002473620 | 7.3677 | 0.12183 | 0 | .000003057 | 27.119 | 0.12511 | 7.5660 | 0 |
3 | Male | (30-39) | ApoB | .005193976 | 15.9733 | 0.16763 | 0 | .000011119 | 105.162 | 0.16785 | 15.9943 | 0 |
4 | Male | (40-49) | ApoB | .003007115 | 9.7514 | 0.07357 | 0 | .000002399 | 25.230 | 0.09987 | 13.2380 | 0 |
5 | Male | (50-59) | ApoB | .004811441 | 14.8999 | 0.09631 | 0 | .000006126 | 58.747 | 0.12704 | 19.6538 | 0 |
6 | Male | (60-69) | ApoB | .007293936 | 19.9366 | 0.13297 | 0 | .000009974 | 74.515 | 0.19752 | 29.6162 | 0 |
7 | Male | (70-79) | ApoB | .005602209 | 11.4631 | 0.09621 | 0 | .000004495 | 18.821 | 0.15906 | 18.9521 | 0 |
8 | Male | (>=80) | ApoB | .005729162 | 7.4794 | 0.09921 | 0 | .000011073 | 18.872 | 0.13626 | 10.2727 | 0 |
9 | Female | (<20) | ApoB | .002512096 | 6.9934 | 0.22836 | 0 | .000002821 | 21.866 | 0.22836 | 6.9934 | 0 |
10 | Female | (20-29) | ApoB | .002397185 | 7.4033 | 0.09843 | 0 | .000002358 | 22.488 | 0.09896 | 7.4432 | 0 |
11 | Female | (30-39) | ApoB | .005747304 | 17.9301 | 0.16782 | 0 | .000014166 | 137.876 | 0.18721 | 20.0008 | 0 |
12 | Female | (40-49) | ApoB | .004009143 | 12.3046 | 0.08878 | 0 | .000007887 | 74.293 | 0.10704 | 14.8363 | 0 |
13 | Female | (50-59) | ApoB | .005250789 | 17.1014 | 0.10402 | 0 | .000006173 | 65.481 | 0.15556 | 25.5739 | 0 |
14 | Female | (60-69) | ApoB | .005870556 | 16.3198 | 0.10203 | 0 | .000011481 | 88.728 | 0.15890 | 25.4182 | 0 |
15 | Female | (70-79) | ApoB | .004183428 | 9.3058 | 0.07472 | 0 | .000003132 | 15.496 | 0.14135 | 17.6047 | 0 |
16 | Female | (>=80) | ApoB | .008706090 | 14.2755 | 0.16290 | 0 | .000015396 | 41.394 | 0.19818 | 17.3680 | 0 |
Not my area. But any formula other than the exact one will be an approximation. I doubt the p-value will scale linearly with n, so there would be no simple upscaling.
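To make the non-linearity concrete: keeping only the leading term of the asymptotic series, p is about 2*exp(-2 * D**2 * n1*n2/(n1+n2)), so multiplying both sample sizes by a common factor w multiplies the exponent by w rather than the p-value. The sketch below uses made-up values of D, n1, n2 and w purely for illustration, and the leading-term shortcut is my simplification, only reasonable when p is small.

```sas
/* Illustration only: leading-term approximation with hypothetical inputs. */
data scaling;
  d = 0.05; n1 = 5000; n2 = 500;
  do w = 1, 10, 100, 1000;
    /* inflating both sample sizes by w scales the exponent by w */
    exponent = -2 * d**2 * (w * n1) * (w * n2) / (w * n1 + w * n2);
    /* report log10(p) so very small values do not underflow to 0 */
    log10_p = (log(2) + exponent) / log(10);
    output;
  end;
run;

proc print data=scaling;
  var w exponent log10_p;
run;
```

Once the exponent drops below roughly -745, the p-value itself is smaller than the smallest positive double-precision number, which would be one way to end up with an exact 0 in the output data set. Working on the log scale sidesteps that, and the same relationship could in principle be used to undo a common inflation factor, but only under the assumptions stated here.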