Hello @FreelanceReinh,
Thank you so much for your detailed response.
To your first point: I looked at several papers from the 1960s for p-value tables, but I wasn't comfortable relying on any of them because I am not very familiar with nonparametric statistics. I also did not want to invest the time needed to read those papers carefully enough to understand exactly what they were doing.
To your second point about ties: you mention that my data set contains many ties because both samples share the same NAICS codes. If I understand correctly, a tie refers to the y-values. In my case, the NAICS codes are my x-values (categories) and the frequencies are my y-values. I followed Example 85.2 in the SAS documentation for PROC NPAR1WAY, where the responses (1 through 5, categorical) are the x-values and their frequencies are the y-values, and PROC NPAR1WAY compares the two samples (active vs. placebo).
SAS Help Center: Example 85.1 Two-Sample Location Tests and Plots
SAS Help Center: Example 85.2 EDF Statistics and EDF Plot
From this point of view, I should not have any ties in my data set. Am I missing something?
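For reference, here is a minimal Python sketch of how a frequency-weighted layout like the one in the SAS example is read. With a FREQ variable, PROC NPAR1WAY treats each row as `Freq` identical observations, so the sketch expands each {value: count} table into individual observations and then lists the values shared by more than one observation in the pooled sample. The codes and counts are made up for illustration, and `expand` and `tied_values` are hypothetical helper names.

```python
from collections import Counter


def expand(freq_table):
    """Turn {value: count} into a flat list in which each value appears `count` times,
    i.e. each row of the table stands for `count` identical observations."""
    return [value for value, count in sorted(freq_table.items()) for _ in range(count)]


def tied_values(sample_a, sample_b):
    """Return {value: number_of_observations} for every value that occurs more than once
    in the pooled sample; these are the tie groups a rank- or EDF-based test sees."""
    pooled = Counter(sample_a) + Counter(sample_b)
    return {v: n for v, n in sorted(pooled.items()) if n > 1}


# Hypothetical frequency tables keyed by NAICS-like codes (made-up counts).
group_1 = {"111110": 3, "236115": 2}
group_2 = {"111110": 1, "236115": 4}

a = expand(group_1)
b = expand(group_2)
print(a)                   # ['111110', '111110', '111110', '236115', '236115']
print(tied_values(a, b))   # {'111110': 4, '236115': 6}
```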
To your last point about the computational difficulties: I ended up using an R function that implements the K-S, C-vM, and Anderson-Darling tests and (if I understand correctly) produces exact p-values.
6.2 Comparison of two distributions | Notes for Nonparametric Statistics (bookdown.org)
The numerical values from SAS and R do not agree. Could I be using the wrong R function?
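To narrow down where SAS and R diverge, it may help to compute the two-sample K-S statistic D = max |F_1(x) - F_2(x)| directly from the expanded data and compare it with the D reported by each program, before comparing p-values. The sketch below assumes the values have a meaningful order (as the 1 to 5 responses in the SAS example do) and uses made-up counts. Its p-value comes from a Monte Carlo permutation test, which is a swapped-in method for illustration and not necessarily what either SAS or the R function computes.

```python
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic D = max |F_a(x) - F_b(x)|.
    The ECDFs only jump at observed values, so checking each distinct pooled value
    (with ties counted together) is enough to find the maximum."""
    n_a, n_b = len(sample_a), len(sample_b)
    sa, sb = sorted(sample_a), sorted(sample_b)
    d = 0.0
    i = j = 0
    for x in sorted(set(sa) | set(sb)):
        while i < n_a and sa[i] <= x:  # i = number of sample_a values <= x
            i += 1
        while j < n_b and sb[j] <= x:  # j = number of sample_b values <= x
            j += 1
        d = max(d, abs(i / n_a - j / n_b))
    return d


def permutation_p_value(sample_a, sample_b, n_perm=10000, seed=1):
    """Monte Carlo permutation p-value for D: shuffle the pooled observations,
    split them into groups of the original sizes, and count how often the permuted
    D is at least as large as the observed D. Ties stay in the pooled data."""
    rng = random.Random(seed)
    observed = ks_statistic(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:n_a], pooled[n_a:]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical ordered responses (1 to 5 scale), expanded from made-up frequencies.
sample_1 = [1] * 5 + [2] * 3 + [3] * 2
sample_2 = [1] * 1 + [2] * 3 + [3] * 6

d = ks_statistic(sample_1, sample_2)
print(round(d, 6))  # 0.4
p = permutation_p_value(sample_1, sample_2, n_perm=2000)
print(p)
```

If the D values from SAS, R, and a hand calculation like this agree but the p-values do not, the difference would then lie in how each program computes the p-value (asymptotic vs. exact, and how ties are treated) rather than in the data setup.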
Any insights would be greatly appreciated.
Sincerely,
Cuneyt