Quantopic
Obsidian | Level 7

Hello SAS user,

 

I have to run the Kolmogorov-Smirnov test on Poisson-distributed data by quantifying the distance between the empirical distribution function of the loss data set and the cumulative Poisson distribution function; in my case, the Poisson distribution is the reference parametric distribution.

 

Browsing the internet, so far I have found only the special case of the KS test in which the empirical distribution is compared against the Normal distribution.

 

What about the case where I need to compare the empirical distribution function against the Poisson distribution?

 

Thanks all in advance for your help!

 

 

6 REPLIES 6
Ksharp
Super User

The KS test is a nonparametric test, which means it does not matter what kind of distribution your variable follows; you can always use the KS test.

Quantopic
Obsidian | Level 7

Thanks for your answer @Ksharp!

 

Do you mean I could simply use the UNIVARIATE procedure to implement the KS test?

 

Particularly, I may use:

 

proc npar1way
        edf
        data = dataset;
        class x;
        var y;
        exact ks;
run;

 

where y is the observed data and x is a vector of simulated data drawn from a Poisson distribution?

 

Thanks!

Ksharp
Super User

I totally agree with @Rick_SAS, and I remember that Rick has written a blog post about this question.

Search for "Poisson" on Rick's blog and you will find it, or @Rick_SAS could point you to the URL.

 

Back to your question: yes, you can do this, but you need to change the data structure.

 

x  y
7  2
5  6
...

-->

name  value
x     7
x     5
...
y     2
y     6
...

 

After that, run the KS test.

 

proc npar1way
        edf
        data = dataset;
        class name;
        var value;
        exact ks;
run;
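
The restructuring described above can be sketched with a DATA step. This is only an illustration under assumptions: the input data set name WIDE and the output name LONG are hypothetical, x holds the simulated Poisson draws, and y holds the observed counts. Because count data contain many ties, the two-sample KS p-value should be read with caution.

```sas
/* Hypothetical input: WORK.WIDE with numeric columns X (simulated) and Y (observed) */
data long;
   set wide;
   length name $1;
   /* write one row per non-missing value, tagged with its source column */
   if not missing(x) then do; name = 'x'; value = x; output; end;
   if not missing(y) then do; name = 'y'; value = y; output; end;
   keep name value;
run;

/* Two-sample EDF tests (including KS) comparing the x and y groups */
proc npar1way edf data=long;
   class name;
   var value;
run;
```

The rows come out interleaved (x, y, x, y, ...) rather than grouped, which does not matter because the CLASS statement does the grouping.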
Rick_SAS
SAS Super FREQ

Can you explain WHY you have to run a KS test for Poisson data?

PROC UNIVARIATE is not appropriate for discrete (count) data.

If you are trying to fit Poisson data, you can use PROC GENMOD, which provides goodness-of-fit statistics.
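
As a minimal sketch of that suggestion, an intercept-only Poisson model can be fit as follows. The data set name LOSSES and the variable name Y are assumptions, not names from this thread.

```sas
/* Hypothetical input: WORK.LOSSES with a count variable Y */
proc genmod data=losses;
   /* no regressors: exp(intercept) is the estimated Poisson mean */
   model y = / dist=poisson link=log;
run;
```

The "Criteria For Assessing Goodness Of Fit" table in the output reports the deviance and the Pearson chi-square, each with its value divided by the degrees of freedom.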

If you want a graphical representation of the fit (similar to a quantile-quantile plot), you can create a "Poissonness plot", although for small data sets it might not be very enlightening.
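
One way such a plot could be built is sketched below, assuming a hypothetical data set LOSSES with a nonnegative integer variable Y. For Poisson data, log(k! n_k / N) is approximately linear in k, with slope log(lambda) and intercept -lambda, where n_k is the number of observations equal to k and N is the sample size.

```sas
/* Hypothetical input: WORK.LOSSES with a count variable Y */
proc freq data=losses(where=(not missing(y))) noprint;
   tables y / out=freq(keep=y count);   /* COUNT = n_k for each observed k */
run;

proc sql noprint;
   select sum(count) into :ntotal from freq;   /* N = sample size */
quit;

data poissonness;
   set freq;
   /* log(k! * n_k / N); lgamma(k+1) equals log(k!) */
   phi = log(count / &ntotal) + lgamma(y + 1);
run;

proc sgplot data=poissonness;
   scatter x=y y=phi;
   reg x=y y=phi / nomarkers;   /* reference line: roughly straight if Poisson */
   xaxis label="Count k";
   yaxis label="log(k! n_k / N)";
run;
```

Points that bend away from the fitted line suggest a departure from the Poisson model.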

Quantopic
Obsidian | Level 7

Hi @Rick_SAS, and thanks for your answer.

 

I have to compute the KS statistic and the related p-value by comparing the theoretical Poisson distribution with the observed data.

 

I agree with you about the fact it does not make sense, but it is a request for reporting the validation results of internal models; the aim is to quantify a distance between the empirical distribution function and the cdf of the Poisson distribution.

 

Using PROC GENMOD, as you suggested above, I did not get the KS statistic or a p-value.

 

Could you suggest some way to run the KS test?

 

Thanks!

 

 

Rick_SAS
SAS Super FREQ

Quantopic wrote:

 

I agree with you about the fact it does not make sense, but it is a request for reporting the validation results of internal models; the aim is to quantify a distance between the empirical distribution function and the cdf of the Poisson distribution.

 

 


When something does not make sense, you should point that out to your supervisor.

 

I am familiar with a recent paper that shows how to compute KS statistics for discrete distributions, but the computation is much more difficult than for continuous data. If your company licenses SAS/IML and you are an experienced SAS/IML programmer with knowledge of numerical analysis, you might be able to implement the procedure in a few days or weeks. If you are less experienced or don't have SAS/IML, it will take longer. And to what purpose? The KS test is not more powerful than other GOF tests that are already provided.

 

Talk to your supervisor and explain that KS tests for discrete distributions are still at the research stage and have not made their way into SAS procedures.  To implement it yourself would require advanced IML programming.
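
For reference, the distance itself (as opposed to its p-value) can be sketched without IML. The following is an illustration only, not the procedure from the paper mentioned above. It assumes a hypothetical data set LOSSES with a nonnegative integer variable Y and estimates lambda by the sample mean. It computes D = max over k of |F_n(k) - F(k)|, checking each observed value just before and at its jump. It does not produce a p-value: the usual KS tables assume a continuous, fully specified reference distribution and do not apply here.

```sas
/* Hypothetical input: WORK.LOSSES with a count variable Y */
proc means data=losses noprint;
   var y;
   output out=stats(keep=lambda n) mean=lambda n=n;   /* lambda = sample mean */
run;

proc freq data=losses(where=(not missing(y))) noprint;
   tables y / out=freq(keep=y count);   /* sorted by y, one row per observed value */
run;

data ks(keep=lambda n D);
   if _n_ = 1 then set stats;           /* lambda and n are retained for all rows */
   set freq end=last;
   retain D 0;
   /* just below the jump at y: the ECDF is still at its previous level */
   if y >= 1 then D = max(D, abs(cum / n - cdf('POISSON', y - 1, lambda)));
   cum + count;                          /* running count of observations <= y */
   /* at the jump: compare the ECDF with the Poisson CDF at y */
   D = max(D, abs(cum / n - cdf('POISSON', y, lambda)));
   if last then output;
run;

proc print data=ks noobs;
run;
```

Checking the point just below each jump covers the gaps between observed values, where the ECDF is flat while the Poisson CDF keeps increasing.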
