Statistical Procedures

ewolin · Posted 08-25-2014 01:27 PM

Is there a function I can call directly that calculates the p-value given the K-S d-statistic and the number of entries in the two distributions? I need the function that NPAR1WAY uses behind the scenes. The problem is that I can't give NPAR1WAY what it needs (unweighted data; mine is weighted), so I use NPAR1WAY to get the d-statistic and need to calculate the p-value myself. I could scale up the reported p-value, except that for my weighted data NPAR1WAY always says p<0.0001, i.e. it doesn't return a value I can scale!

I need something like: myPvalue = KS_pValue(dStatistic,n1,n2);

Ranges for n1,n2 in my case are: 1000 < n1 < 100,000, 50 < n2 < 2000.

data_null__ · Posted 08-25-2014 01:42 PM

You can get the actual p-value in the output data set. The value you are seeing is the formatted value using PVALUE format.

data Arthritis;
   input Treatment $ Response Freq @@;
   datalines; 
Active 5 5 Active 4 11 Active 3 5 Active 2 1 Active 1 5 
Placebo 5 2 Placebo 4 4 Placebo 3 7 Placebo 2 7 Placebo 1 12 
; 

proc npar1way data=Arthritis KS;
   class Treatment;
   var Response;
   freq Freq;
   output out=ks;
   run; 
proc print; 
   run;

ewolin · Posted 08-25-2014 02:44 PM

I looked in the output data set; all the p-values are 0. The problem is that my weights are huge, as high as 200,000. But the true p-value, calculated properly, should not be 0.

Might EXACT help? I fiddled with EXACT, MC, MAXTIME, etc. but it just sat there for long periods of time and I had to kill it. Have to try N= as well.

Anyway, if I could just access the function NPAR1WAY uses I'd be fine...

lvm · Posted 08-25-2014 03:26 PM

The help function in SAS can take you to the details of the KS test. Or just check out the web page

SAS/STAT(R) 9.2 User's Guide, Second Edition

New versions have the same information.

With the sample sizes you have, even the smallest difference will be significant.

ewolin · Posted 08-25-2014 03:32 PM

Yes, I saw the equations; I was hoping a function exists to do the work. I understand that in practice no one sums from zero to infinity: only a few terms are kept and special corrections are applied. So I'd much rather find a working function written by experts than have to research this on my own.
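For reference, here is the asymptotic form being discussed, written out as I recall it from the NPAR1WAY documentation. It should be checked against the guide before relying on it. The symbols $F_1$, $F_2$ (the two empirical distribution functions) and $z$ are notation chosen here for illustration.

```latex
D = \max_j \left| F_1(x_j) - F_2(x_j) \right| ,
\qquad
z = d \, \sqrt{\frac{n_1 n_2}{n_1 + n_2}} ,
\qquad
\Pr(D > d) \;=\; 2 \sum_{i=1}^{\infty} (-1)^{\,i-1} e^{-2 i^2 z^2}
```

The terms shrink like $e^{-2 i^2 z^2}$, so for large $z$ the first term dominates and $\Pr(D > d) \approx 2 e^{-2 z^2}$. This is also why a reported p-value can come out as exactly 0: once $2z^2$ exceeds roughly 745, $e^{-2z^2}$ underflows in double precision.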

lvm · Posted 08-25-2014 03:50 PM

Part of your message looks like you are concerned about getting p<0.0001 instead of the actual (small) value, say p = 1.67*10^(-8). You can get a more exact printout by storing the relevant statistics with an ODS OUTPUT statement and then printing the stored file. Here is a simple example (without frequencies) where the printout gives <.0001. The last column of the KS2 file has two rows: the first row is D and the second row is p (in scientific notation). The variable is called nValue2 (for some strange reason).

data a;
   do group = 1 to 2;
      do rep = 1 to 10000;
         y = group*.1 + rannor(1);
         output;
      end;
   end;
run;

proc npar1way data=a edf ks;
   class group;
   var y;
   ods output KolSmir2Stats=ks2;
run;

proc print data=ks2;
run;

ewolin · Posted 08-25-2014 04:37 PM

I tried that previously, it just printed exactly 0 for all the p-values (see column of zeros below). Perhaps my weights are so large that it's hopeless to get a non-zero p-value that I can correct for the weights. Seems like I'll just have to calculate the p-value myself.

I sure would like to find a function in SAS that does the calculation...

1	Male	(<20)	ApoB	.002233328	6.2103	0.22041	.000002242	17.335	0.22293	6.2813
2	Male	(20-29)	ApoB	.002473620	7.3677	0.12183	.000003057	27.119	0.12511	7.5660
3	Male	(30-39)	ApoB	.005193976	15.9733	0.16763	.000011119	105.162	0.16785	15.9943
4	Male	(40-49)	ApoB	.003007115	9.7514	0.07357	.000002399	25.230	0.09987	13.2380
5	Male	(50-59)	ApoB	.004811441	14.8999	0.09631	.000006126	58.747	0.12704	19.6538
6	Male	(60-69)	ApoB	.007293936	19.9366	0.13297	.000009974	74.515	0.19752	29.6162
7	Male	(70-79)	ApoB	.005602209	11.4631	0.09621	.000004495	18.821	0.15906	18.9521
8	Male	(>=80)	ApoB	.005729162	7.4794	0.09921	.000011073	18.872	0.13626	10.2727
9	Female	(<20)	ApoB	.002512096	6.9934	0.22836	.000002821	21.866	0.22836	6.9934
10	Female	(20-29)	ApoB	.002397185	7.4033	0.09843	.000002358	22.488	0.09896	7.4432
11	Female	(30-39)	ApoB	.005747304	17.9301	0.16782	.000014166	137.876	0.18721	20.0008
12	Female	(40-49)	ApoB	.004009143	12.3046	0.08878	.000007887	74.293	0.10704	14.8363
13	Female	(50-59)	ApoB	.005250789	17.1014	0.10402	.000006173	65.481	0.15556	25.5739
14	Female	(60-69)	ApoB	.005870556	16.3198	0.10203	.000011481	88.728	0.15890	25.4182
15	Female	(70-79)	ApoB	.004183428	9.3058	0.07472	.000003132	15.496	0.14135	17.6047
16	Female	(>=80)	ApoB	.008706090	14.2755	0.16290	.000015396	41.394	0.19818	17.3680

lvm · Posted 08-25-2014 04:55 PM

Not my area. But any formula other than the exact one will be an approximation. I doubt that the p-value will scale linearly with n, so there would be no simple upscaling.
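As a rough sketch of the kind of function requested earlier in the thread, the asymptotic two-sample series Pr(D > d) = 2 * sum over i >= 1 of (-1)^(i-1) * exp(-2 * i^2 * z^2), with z = d * sqrt(n1*n2/(n1+n2)), can be wrapped in PROC FCMP. This is not the routine NPAR1WAY uses internally. The names ks_pvalue and ks_log10p, the library work.funcs.stat, the cutoffs 0.2 and 2, and the input values are all assumptions made for illustration. Which n1 and n2 are appropriate for weighted data is a separate question that this sketch does not settle.

```sas
/* Hypothetical helpers: neither function is built into SAS. */
proc fcmp outlib=work.funcs.stat;

   /* Asymptotic two-sample K-S p-value from d, n1, n2. */
   function ks_pvalue(d, n1, n2);
      z = d * sqrt(n1 * n2 / (n1 + n2));
      /* Assumed cutoff: below z = 0.2 the p-value is 1 to about 12 digits, */
      /* and the alternating series converges slowly there.                 */
      if z < 0.2 then return (1);
      p = 0;
      sign = 1;
      term = 1;
      i = 0;
      /* Sum terms until they no longer change p in double precision. */
      do while (i < 100 and abs(term) > 1e-17);
         i = i + 1;
         term = sign * 2 * exp(-2 * i * i * z * z);
         p = p + term;
         sign = -sign;
      end;
      return (min(max(p, 0), 1));
   endsub;

   /* log10 of the same p-value, usable when p itself underflows to 0. */
   function ks_log10p(d, n1, n2);
      z = d * sqrt(n1 * n2 / (n1 + n2));
      /* Assumed cutoff: for z > 2 the second term is smaller than the     */
      /* first by a factor of exp(-6*z*z), so only the first term is kept. */
      if z > 2 then return (log10(2) - 2 * z * z / log(10));
      return (log10(ks_pvalue(d, n1, n2)));
   endsub;
run;

options cmplib=work.funcs;

data _null_;
   d = 0.22; n1 = 5000; n2 = 200;   /* illustrative values only */
   p      = ks_pvalue(d, n1, n2);
   log10p = ks_log10p(d, n1, n2);
   put p=e12. log10p=;
run;
```

Working on the log10 scale is a design choice aimed at the all-zeros problem: with very large n1 and n2 the p-value can be too small to represent as a double, but its logarithm is still an ordinary number.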

Calculating p-value for two-sample Kolmogorov-Smirnov test
