Yazdan
Fluorite | Level 6

Hi all,

 

I have a problem with the database I'm working on. I want to compare the mean values of two variables, but I can't use a t-test because the variables are highly zero-inflated. The variables represent the number of security breaches in different time blocks, recorded over a period of 4 years (e.g., variables X1 and X2 represent the number of security breaches that occurred in the 7pm-9pm and 10pm-12am blocks). I don't think I can use a chi-square test of proportions either, since most of the recorded values are below 5. In addition, the variables are overdispersed (e.g., mean = 8.36 and variance = 331.66). What should I do in this case? Thank you!

 

Regards,

Yazdan 

Yazdan
Fluorite | Level 6

Thank you for your response. I also have another problem. Consider this example: I have a table of the number of incidents that occurred in two-hour blocks (e.g., 0-2, 2-4, etc.). The data for each block were recorded over 4 years and are heavily inflated with zeros. So I have a table like:

 

Day         [0-2]   [2-4]   [4-6]   ...   [10-0]
March 1       0       1       1     ...     13
March 2       1       2       0     ...      2
...
March 30      0      10       0     ...      2

 

 

How can I compare the proportions of attacks that occurred during the different time blocks in March? Please give me a hand with the coding as well. Thank you!

 

 

Yazdan

Ksharp
Super User
In my opinion, you may need a GLMM (generalized linear mixed model).
Check PROC GLIMMIX.

I am not an expert on GLMMs, so I can't help you any further.
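
For readers who want a starting point, here is a minimal sketch of what a GLMM for this kind of data might look like in PROC GLIMMIX. It is not a tested solution for the poster's data. The dataset name LONG and the variables date, block, count and range are assumptions: one row per day and time block, with range holding the block length. A negative binomial distribution is used here only as one common choice for overdispersed counts.

```sas
/* Assumed input: dataset LONG (hypothetical name), one row per day and
   time block, with variables date, block, count and range.           */
data glmm_in;
   set long;
   lrange = log(range);   /* offset on the log scale for the log link */
run;

proc glimmix data=glmm_in method=quad;
   class date block;
   /* negative binomial counts, log link, block as the fixed effect */
   model count = block / dist=negbin link=log offset=lrange solution;
   /* random intercept per day: blocks within a day share a level */
   random intercept / subject=date;
   /* pairwise comparisons between blocks on the log scale */
   lsmeans block / diff cl;
run;
```

Whether this converges and fits well depends on the data, so the output should be checked before relying on it.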


Ksharp
Super User

Maybe you could use a GEE model.

Arrange your data like this:

 

date   count   range
Mar1     0       2     <-- [0-2]
Mar1     1       2     <-- [2-4]
...
Mar1    13      10     <-- [10-0]
Mar2     1       2     <-- [0-2]
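
One way to get from the wide table to this layout is a DATA step with an array. This is only a sketch: the dataset name WIDE, the count variables b1-b12 (one per two-hour block) and the fixed block length of 2 are all assumptions, so adjust them to your own variable names.

```sas
/* Assumed input: dataset WIDE (hypothetical name) with one row per day,
   a variable date, and counts b1-b12 for the twelve two-hour blocks.   */
data long;
   set wide;
   array b{12} b1-b12;
   do block = 1 to 12;       /* block 1 = [0-2], block 2 = [2-4], ... */
      count = b{block};
      range = 2;             /* block length in hours (assumed constant) */
      output;                /* one output row per day and block */
   end;
   keep date block count range;
run;
```

PROC TRANSPOSE with a BY statement on date would be an alternative if you prefer not to use arrays.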

 

 

Use RANGE as the offset variable (on the log scale when the model uses a log link), and fit a GEE model with PROC GENMOD or PROC GEE. See:

PROC GENMOD

Example 44.7: Log-Linear Model for Count Data

 

PROC GEE

Example 43.2: Log-Linear Model for Count Data
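
As a rough sketch of what such a GEE could look like in PROC GENMOD (not copied from the documentation examples): the dataset name LONG and the variable block are assumptions, with one row per day and time block as in the layout above. The Poisson distribution and the exchangeable working correlation are illustrative choices, not the only reasonable ones.

```sas
/* Assumed input: dataset LONG (hypothetical name) with variables
   date, block (time-block label), count and range.              */
data gee_in;
   set long;
   lrange = log(range);   /* offset must be on the log scale for a log link */
run;

proc genmod data=gee_in;
   class date block;
   model count = block / dist=poisson link=log offset=lrange type3;
   /* the REPEATED statement requests GEE; days are the clusters */
   repeated subject=date / type=exch;
   /* pairwise block differences, exponentiated to rate ratios */
   lsmeans block / diff exp cl;
run;
```

The empirical (robust) standard errors that GEE reports do not rely on the Poisson variance being correct, which is helpful when the counts are overdispersed.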

 

Also consider using the LSMEANS statement, and the ZEROMODEL statement for the excess zeros.
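
A minimal sketch of the ZEROMODEL idea in PROC GENMOD, assuming the same hypothetical long dataset LONG with date, block, count and range. The zero-inflated negative binomial distribution is one possible choice for counts that are both zero-inflated and overdispersed. Note that this sketch has no REPEATED statement, so it treats the rows as independent.

```sas
/* Assumed input: dataset LONG (hypothetical name) with variables
   block (time-block label), count and range.                    */
data zi_in;
   set long;
   lrange = log(range);   /* offset on the log scale */
run;

proc genmod data=zi_in;
   class block;
   /* count part: zero-inflated negative binomial with a log link */
   model count = block / dist=zinb link=log offset=lrange;
   /* zero part: probability of an excess zero, allowed to vary by block */
   zeromodel block / link=logit;
run;
```

If many blocks have almost no non-zero counts, the zero part may be hard to estimate, in which case a simpler ZEROMODEL specification is worth trying.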

Here is how to compare the difference between two proportions (i.e., move the OFFSET variable to the left side of the model):

 

http://support.sas.com/kb/24/188.html 

 
