Comparing two independent samples for count data

Recep — Tue, 14 Sep 2021 23:57:49 GMT

Hello,

I'm trying to figure out what test would be appropriate to compare two independent samples for count data:

data test;
input year city1 city2;
datalines;
2016 220 130
2017 140 180
2018 120 202
2019 140 134
2020 135 166
;

I don't have any denominators for the cities by year (for instance, the frequencies in city1 and city2 represent the number of traffic accidents each year). If I want to test whether the number of traffic accidents differs between the cities, what test can I use?

Thanks a lot in advance!

Re: Comparing two independent samples for count data

Reeza — Wed, 15 Sep 2021 00:12:16 GMT

1. First, standardize for the population, the number of drivers, or the number of cars in each city.

2. Then look at PROC FREQ with either a chi-square test or a Cochran-Armitage test.

If you don't want to account for year, sum the counts over the years and use the chi-square test.

If you want to account for year, use the Cochran-Armitage test.

https://documentation.sas.com/doc/en/statug/15.2/statug_freq_details76.htm

Standardization is important here. If I compare a city of 1 million to a city of 5 million, the accident counts should not be expected to be the same.
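
One possible way to build that standardization into a model is a Poisson regression with a log-population offset, so the cities are compared on accidents per person instead of raw counts. This is only a sketch: the populations of 1,000,000 and 5,000,000 are placeholders taken from the example above, not real data, and the data set name rates and the variable logpop are made up for illustration.

```sas
/* Hypothetical populations: placeholders only, not real data */
data rates;
   input year city1 city2;
   /* one row per city-year, carrying the log of the assumed population */
   y = city1; city = 1; logpop = log(1000000); output;
   y = city2; city = 2; logpop = log(5000000); output;
   keep year city y logpop;
   datalines;
2016 220 130
2017 140 180
2018 120 202
2019 140 134
2020 135 166
;

proc genmod data=rates;
   class city;
   /* offset=logpop makes this a model for the rate (count per person) */
   model y = city / dist=poisson link=log offset=logpop;
run;
```

With real denominators, the city effect in this model would describe a difference in rates, not in raw counts.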

Re: Comparing two independent samples for count data

Recep — Wed, 15 Sep 2021 01:35:11 GMT

Hi Reeza,
Thanks a lot for your response, but as I mentioned in my question, I do not have any denominator information. The example I provided was fictitious. You can assume that, instead of accidents, the counts are the number of meteorites that fell on each city, and I want to know whether more meteorites fell on one city than on the other.
Cheers....

Re: Comparing two independent samples for count data

Reeza — Wed, 15 Sep 2021 02:51:21 GMT

Then go find the spatial area of your city. That’s likely constant over time at least so just two values to look up. Otherwise, you’re comparing apples and oranges.

Re: Comparing two independent samples for count data

Ksharp — Wed, 15 Sep 2021 12:58:42 GMT

You could try the Kolmogorov-Smirnov (K-S) test.

data test;
input year city1 city2;
datalines;
2016 220 130
2017 140 180
2018 120 202
2019 140 134
2020 135 166
;
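/* reshape to long format: one row per city-year count */
data have;
   set test;
   city='city1'; count=city1; output;
   city='city2'; count=city2; output;
   keep city count;
run;

/* EDF requests empirical distribution function tests, including K-S */
proc npar1way data=have plots=edfplot edf;
   class city;
   var count;
run;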

But your case is special because of the YEAR variable.

Maybe @Rick_SAS or @StatDave has a good idea.

Re: Comparing two independent samples for count data

Rick_SAS — Wed, 15 Sep 2021 13:17:57 GMT

You could try a paired t test. The procedure includes graphical output to help you assess whether the data might satisfy the assumptions of the test:

ods graphics on;
proc ttest data=test;
   paired city1*city2;
run;

Re: Comparing two independent samples for count data

StatDave — Wed, 15 Sep 2021 13:38:38 GMT

Count data are typically modeled using the Poisson or negative binomial distribution. Such models are easily fit in procedures like GENMOD, GLIMMIX, and HPGENSELECT. For example, the following fits a model using the negative binomial distribution, which accommodates overdispersion in the data.

data test;
input year city1 city2;
y=city1; city=1; output;
y=city2; city=2; output;
datalines;
2016 220 130
2017 140 180
2018 120 202
2019 140 134
2020 135 166
;
proc genmod data=test;
class city;
model y=city / dist=negbin;
run;
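
To report the city effect as a ratio of mean counts with confidence limits, and not only as a p-value, one option is to add an LSMEANS statement to the same model. This is a sketch that assumes the long-format data set test created by the DATA step above.

```sas
proc genmod data=test;
   class city;
   model y = city / dist=negbin;
   /* diff: city 1 versus city 2 on the log scale        */
   /* exp:  back-transform the difference to a ratio     */
   /* cl:   confidence limits for the estimates          */
   lsmeans city / diff exp cl;
run;
```

The exponentiated difference can be read as the estimated ratio of the mean yearly count in city 1 to that in city 2.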

Re: Comparing two independent samples for count data

Ksharp — Wed, 15 Sep 2021 14:07:48 GMT

Rick,
I like your idea, but the t test is a parametric method, NOT a nonparametric method like the K-S test.
I think PROC TTEST is usually suited to NORMAL data, not count data!
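
For a nonparametric analysis that still keeps the pairing by year, one possibility is the Wilcoxon signed-rank test on the yearly differences. PROC UNIVARIATE reports it in its "Tests for Location" table next to the Student's t and sign tests. This is a sketch only; the data set names wide and diffs and the variable diff are made up for illustration.

```sas
/* wide-format data as in the original post */
data wide;
   input year city1 city2;
   datalines;
2016 220 130
2017 140 180
2018 120 202
2019 140 134
2020 135 166
;

/* yearly difference between the two cities */
data diffs;
   set wide;
   diff = city1 - city2;
run;

/* "Tests for Location" includes the signed-rank test of location = 0 */
proc univariate data=diffs;
   var diff;
run;
```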

Re: Comparing two independent samples for count data

Rick_SAS — Wed, 15 Sep 2021 14:35:05 GMT

@Ksharp: Thanks for your criticism. I am aware of the assumptions of the three procedures that were suggested. In many cases, count data are well approximated by a normal distribution, but you are certainly entitled to your opinion. If there were more data, we could debate the issue, but a debate seems pointless when the OP's data contain only 5 observations. For the posted data, I doubt it matters which method is used.

Re: Comparing two independent samples for count data

Recep — Wed, 15 Sep 2021 17:10:18 GMT

Thanks a lot, Dave! Then I'm assuming that the p-value (0.5548 in this example) tells us whether the two cities differ significantly from each other (or, more technically, that in this example we have no reason to reject the null hypothesis of no difference between the two cities).

Re: Comparing two independent samples for count data

Reeza — Wed, 15 Sep 2021 17:43:53 GMT

If this is for homework, go with that. If this is for decision making, then what I said earlier still applies: you cannot compare the raw numbers.

Re: Comparing two independent samples for count data

StatDave — Wed, 15 Sep 2021 18:04:29 GMT

Yes, that's correct.