04-13-2017 01:20 PM
I have a problem with the database I'm working on. I want to compare the mean values of two variables, but I can't use a t-test because both variables are highly zero-inflated. The variables are the numbers of security breaches in different time blocks, recorded over a period of 4 years (e.g., X1 and X2 are the numbers of security breaches that occurred in the 7pm-9pm and 10pm-12am blocks). I don't think I can use a chi-square test of proportions either, since most of the recorded counts are below 5. In addition, these variables are overdispersed (e.g., mean = 8.36 and variance = 331.66). What should I do in this case? Thank you!
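The overdispersion mentioned above can be put in one number: the index of dispersion (variance divided by mean), which is about 1 for Poisson counts. With the reported mean of 8.36 and variance of 331.66 it is roughly 39.7. The small Python sketch below computes it along with the share of zeros; the function name `dispersion_summary` and the sample counts are hypothetical.

```python
from statistics import mean, variance

def dispersion_summary(counts):
    """Return (mean, sample variance, variance/mean ratio, share of zeros).

    A ratio near 1 is consistent with a Poisson distribution; values far
    above 1 indicate overdispersion. Assumes at least two counts and a
    positive mean.
    """
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    zero_share = sum(1 for c in counts if c == 0) / len(counts)
    return m, v, v / m, zero_share

# Hypothetical daily counts for one time block: mostly zeros, a few spikes
counts = [0, 0, 0, 1, 0, 0, 13, 0, 2, 0]
m, v, ratio, zeros = dispersion_summary(counts)
print(f"mean={m:.2f} variance={v:.2f} ratio={ratio:.2f} zeros={zeros:.0%}")

# Ratio implied by the figures reported in the question
print(round(331.66 / 8.36, 2))
```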
04-17-2017 10:32 AM
Thank you for your response. I also have another problem. Please consider my example: I have a table of the number of incidents that occurred in two-hour blocks (e.g., 0-2, 2-4, etc.). The data for each block were recorded over 4 years and are heavily inflated with zeros. So I have a table like:
Day        [0-2]  [2-4]  [4-6]  ...  [10-0]
March 1      0      1      1    ...    13
March 2      1      2      0    ...     2
March 30     0     10      0    ...     2
How can I compare the proportions of attacks that occurred in the different time blocks in March? I'd also appreciate help with the code. Thank you!
04-19-2017 10:33 PM
Maybe you could use a GEE model.
First, reshape your data into long format, with one row per day and time block:
date   count  range
Mar1     0      2    <-- [0-2]
Mar1     1      2    <-- [2-4]
Mar1    13     10    <-- [10-0]
Mar2     1      2    <-- [0-2]
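The reshaping step above can be sketched in Python (standard library only; this is an illustration, not SAS code). The dictionary layout, the block labels and the helper name `to_long` are hypothetical; the counts are the March 1 and March 2 values from the question, and the block lengths copy the RANGE column of the example. The sketch also computes log(RANGE), which is the form a log-link count model uses as its offset.

```python
import math

# Hypothetical block lengths in hours, copied from the RANGE column above
BLOCK_HOURS = {"0-2": 2, "2-4": 2, "10-0": 10}

# Wide layout: one row per day, one column per time block (subset of the table)
wide = {
    "Mar1": {"0-2": 0, "2-4": 1, "10-0": 13},
    "Mar2": {"0-2": 1, "2-4": 2, "10-0": 2},
}

def to_long(wide, block_hours):
    """Turn {day: {block: count}} into (day, block, count, range, log_range) rows."""
    rows = []
    for day, counts in wide.items():
        for block, count in counts.items():
            hours = block_hours[block]
            # log(range) is what goes into OFFSET= for a log-link model
            rows.append((day, block, count, hours, math.log(hours)))
    return rows

for row in to_long(wide, BLOCK_HOURS):
    print(row)
```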
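Use RANGE as the offset (with the log link the offset enters the model as log(RANGE), so create that variable first), and fit the GEE with PROC GENMOD (via the REPEATED statement) or PROC GEE. Check: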
Example 44.7: Log-Linear Model for Count Data
Example 43.2: Log-Linear Model for Count Data
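For reference, here is the log-linear model behind this suggestion, written for day i and time block j with one block taken as the reference (its β fixed at 0). Because the link is log, the offset appears as log(RANGE), and moving it to the left side turns a model for counts into a model for rates:

```latex
\log \mathrm{E}[\mathrm{COUNT}_{ij}] = \log(\mathrm{RANGE}_{ij}) + \beta_0 + \beta_j
\quad\Longleftrightarrow\quad
\log\!\left(\frac{\mathrm{E}[\mathrm{COUNT}_{ij}]}{\mathrm{RANGE}_{ij}}\right) = \beta_0 + \beta_j

\frac{\mathrm{E}[\mathrm{COUNT}_{ij}]/\mathrm{RANGE}_{ij}}{\mathrm{E}[\mathrm{COUNT}_{ik}]/\mathrm{RANGE}_{ik}} = e^{\beta_j - \beta_k}
```

So comparing two blocks j and k amounts to testing β_j − β_k = 0, i.e. a rate ratio of 1.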
Also consider the LSMEANS statement for comparing the time blocks, and the ZEROMODEL statement for the excess zeros.
To compare the difference between two proportions, move the OFFSET variable to the left side of the model (i.e., model the rate COUNT/RANGE instead of COUNT).
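Comparing two time blocks can be sketched outside SAS for the simplest case: a Poisson GEE with a log link, log(RANGE) as offset, one parameter per block, days as clusters and an independence working correlation. Under those assumptions the estimated rate for block k is just total count divided by total exposure, and the robust (sandwich) variance of the log rate ratio has a closed form, which the Python sketch below uses. The function name `compare_blocks` and the row layout are hypothetical. It has no analogue of ZEROMODEL, but the sandwich variance does not rely on the Poisson variance assumption, so it remains usable under overdispersion. It should correspond roughly to a GENMOD fit with a REPEATED SUBJECT= statement and an LSMEANS difference for this model; SAS output is not reproduced here.

```python
import math

def compare_blocks(rows, block_a, block_b):
    """Rate ratio of block_a vs block_b with a cluster-robust (sandwich) Wald test.

    rows: iterable of (day, block, count, exposure); days are the clusters.
    Closed form of a Poisson GEE (log link, independence working correlation,
    log(exposure) offset, one parameter per block).
    """
    # Total counts Y_k and exposures E_k for the two compared blocks
    totals = {block_a: [0, 0.0], block_b: [0, 0.0]}
    for _, block, count, exposure in rows:
        if block in totals:
            totals[block][0] += count
            totals[block][1] += exposure
    (y_a, e_a), (y_b, e_b) = totals[block_a], totals[block_b]
    if y_a == 0 or y_b == 0:
        raise ValueError("each compared block needs at least one event")
    rate_a, rate_b = y_a / e_a, y_b / e_b  # GEE solution: exp(beta_k) = Y_k / E_k

    # Per-day contribution to log(rate_a) - log(rate_b): residual / total count
    contrib = {}
    for day, block, count, exposure in rows:
        if block == block_a:
            contrib[day] = contrib.get(day, 0.0) + (count - exposure * rate_a) / y_a
        elif block == block_b:
            contrib[day] = contrib.get(day, 0.0) - (count - exposure * rate_b) / y_b
    var = sum(c * c for c in contrib.values())  # sandwich variance of the log ratio
    if var == 0:
        raise ValueError("zero estimated variance")

    log_rr = math.log(rate_a / rate_b)
    z = log_rr / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(log_rr), z, p

# March 1, 2 and 30 counts for blocks [0-2] and [2-4] from the question
rows = [
    ("Mar1", "0-2", 0, 2), ("Mar1", "2-4", 1, 2),
    ("Mar2", "0-2", 1, 2), ("Mar2", "2-4", 2, 2),
    ("Mar30", "0-2", 0, 2), ("Mar30", "2-4", 10, 2),
]
rr, z, p = compare_blocks(rows, "2-4", "0-2")
print(f"rate ratio={rr:.2f} z={z:.2f} p={p:.3f}")
```

With only three days the sandwich variance is very imprecise, so treat the example as showing the mechanics rather than a usable test; the full month (or all 4 years) would be needed in practice.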