Recep
Quartz | Level 8

Hello,

 

I'm trying to figure out what test would be appropriate to compare two independent samples for count data:

 

data test;
input year city1 city2;
datalines;
2016 220 130
2017 140 180
2018 120 202
2019 140 134
2020 135 166
;

 

I don't have any denominators for the cities by year (for instance, the frequencies in city1 and city2 represent the number of traffic accidents in each year). If I want to test whether the number of traffic accidents differs between the cities, what test can I use?

 

Thanks a lot in advance!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ

Count data are typically modeled using the Poisson or negative binomial distribution. Such models are easily fit in procedures like GENMOD, GLIMMIX, and HPGENSELECT. For example, the following fits a model using the negative binomial distribution, which accommodates overdispersion in the data.

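data test;
input year city1 city2;
/* reshape to one row per city per year */
y=city1; city=1; output;
y=city2; city=2; output;
datalines;
2016 220 130
2017 140 180
2018 120 202
2019 140 134
2020 135 166
;
proc genmod data=test;
class city;
/* negative binomial model comparing the mean count between cities */
model y=city / dist=negbin;
run;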
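
To see how large the difference between the cities is, and not only whether it is significant, one option is to add an LSMEANS statement to the same model. This is a sketch rather than part of the reply above. It assumes the long-format TEST data set created by the DATA step above (variables y and city).

```sas
/* Sketch: same negative binomial model, plus estimated means per city */
proc genmod data=test;
   class city;
   /* TYPE3 requests a likelihood ratio test for the city effect */
   model y = city / dist=negbin type3;
   /* ILINK reports means on the count scale; DIFF with EXP reports the ratio of means */
   lsmeans city / ilink diff exp cl;
run;
```

A ratio of means close to 1, with a confidence interval that covers 1, would point the same way as a non-significant test for city.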

Reeza
Super User

1. First standardize for the population or # of drivers or # of cars in each city

2. Then look at PROC FREQ with either a chi-square test or a Cochran-Armitage test.

 

If you don't want to account for year, sum them up and use ChiSquare. 

 

If you want to account for year, use Cochran-Armitage

https://documentation.sas.com/doc/en/statug/15.2/statug_freq_details76.htm
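
The two PROC FREQ options above might look like the following on the posted data. This is only a sketch: the data set name LONG and the variable COUNT are illustrative, and because the raw counts are used without any standardization, both tests treat the two cities as equally exposed.

```sas
/* Reshape the posted data to one row per city per year */
data long;
   input year city1 city2;
   city = 'city1'; count = city1; output;
   city = 'city2'; count = city2; output;
   keep year city count;
   datalines;
2016 220 130
2017 140 180
2018 120 202
2019 140 134
2020 135 166
;

/* Ignoring year: one-way chi-square test that the two city totals are equal */
proc freq data=long;
   weight count;
   tables city / chisq;
run;

/* Accounting for year: Cochran-Armitage trend test on the year-by-city table */
proc freq data=long;
   weight count;
   tables year*city / trend;
run;
```

With the WEIGHT statement, the one-way table for city holds the five-year totals (755 for city1 and 812 for city2).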

 

Standardization is important here. If I compare a city of 1 million to a city of 5 million the accident counts should not be expected to be the same. 
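
If a denominator were available, one common way to standardize inside a count model is an offset. The sketch below uses made-up populations of 1 million and 5 million purely for illustration; they are not part of the posted data, and the names RATES, POP, and LOGPOP are likewise assumptions.

```sas
/* Hypothetical populations, invented for illustration only */
data rates;
   input year city1 city2;
   y = city1; city = 1; pop = 1000000; logpop = log(pop); output;
   y = city2; city = 2; pop = 5000000; logpop = log(pop); output;
   datalines;
2016 220 130
2017 140 180
2018 120 202
2019 140 134
2020 135 166
;

proc genmod data=rates;
   class city;
   /* The offset turns the model into one for accidents per person */
   model y = city / dist=negbin offset=logpop;
run;
```

With the offset in place, the city effect compares rates rather than raw counts.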

 


Recep
Quartz | Level 8
Hi Reeza,
Thanks a lot for your response, but as I mentioned in my question, I do not have any sort of denominator information. The example I provided was fictitious. You can assume that, instead of numbers of accidents, those are the numbers of meteorites that fell on each city, and I want to know whether more meteorites fell on one city than on the other.
Cheers....
Reeza
Super User

Then go find the spatial area of your city. That’s likely constant over time at least so just two values to look up. Otherwise, you’re comparing apples and oranges.

 




 

Ksharp
Super User

You could try the Kolmogorov-Smirnov (K-S) test.

data test;
input year city1 city2;
datalines;
2016 220 130
2017 140 180
2018 120 202
2019 140 134
2020 135 166
;
data have;
 set test;
 city='city1';count=city1;output;
 city='city2';count=city2;output;
 keep city count;
run;

proc npar1way data=have plots=edfplot edf ;
class city;
var count;
run;

But your case is special because it has a YEAR variable.

Maybe @Rick_SAS or @StatDave has a good idea.

Rick_SAS
SAS Super FREQ

You could try a paired t test. The procedure includes graphical output to help you assess whether the data might satisfy the assumptions of the test:

ods graphics on;
proc ttest data=test;
   paired city1*city2;
run;
Ksharp
Super User
Rick,
I like your idea. But the t test is a parametric method, not a nonparametric method like the K-S test.
I think PROC TTEST is usually suited to normal data, not count data!
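
If normality of the paired differences is in doubt, a nonparametric counterpart to the paired t test can be sketched with PROC UNIVARIATE. This assumes TEST is the wide data set from the original post (year, city1, city2); the names DIFFS and DIFF are illustrative.

```sas
/* Year-by-year difference between the two cities */
data diffs;
   set test;
   diff = city1 - city2;
run;

/* The "Tests for Location" table reports Student's t, the sign test,
   and the Wilcoxon signed rank test for the differences */
proc univariate data=diffs;
   var diff;
run;
```

With only five pairs, none of these tests has much power, so the results are best read as a rough check.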
Rick_SAS
SAS Super FREQ

@Ksharp : Thanks for your criticism. I am aware of the assumptions of the three procedures that were suggested. In many cases, count data are well-approximated by a normal distribution, but you are certainly entitled to your opinion. If there were more data, we could debate the issue, but a debate seems pointless when the OP's data contains 5 observations. For the posted data, I doubt it matters which method is used.

Recep
Quartz | Level 8

Thanks a lot Dave! Then I'm assuming that the p-value (0.5548 in this example) tells whether the two cities are statistically significantly different from each other (or, more technically, in this example we have no reason to reject the null hypothesis that there is no difference between the two cities).

Reeza
Super User
If this is for homework, go with that. If this is for decision making, then what I said earlier still applies and you cannot compare the raw numbers.
StatDave
SAS Super FREQ

Yes, that's correct.
