Yazdan
Fluorite | Level 6

Hi all,

 

I have a problem with the database I'm working on. I want to compare the mean values of two variables, but I can't use a t-test because the variables are highly zero-inflated. The variables represent the number of security breaches in different time blocks, recorded over a period of 4 years (e.g., X1 and X2 are the numbers of security breaches that occurred in the 7pm-9pm and 10pm-12pm blocks). I don't think I can use a chi-square test of proportions either, since most of the recorded values are below 5. In addition, the variables are overdispersed (e.g., Mean=8.36 and Variance=331.66). What should I do in this case? Thank you!

 

Regards,

Yazdan 

Yazdan
Fluorite | Level 6

Thank you for your response. I also have another problem. Consider this example: I have a table of the number of incidents that occurred in two-hour blocks (e.g., 0-2, 2-4, etc.). The data for each block were recorded over 4 years and are heavily inflated with zeros, so the table looks like this:

 

Day         [0-2]   [2-4]   [4-6]   ...   [10-0]
March 1       0       1       1     ...     13
March 2       1       2       0     ...      2
...
March 30      0      10       0     ...      2

 

 

How can I compare the proportions of attacks that occurred during the different time blocks in March? I would also appreciate help with the code. Thank you!

 

 

Yazdan

Ksharp
Super User
In my opinion, you may need a GLMM (generalized linear mixed model).
Check PROC GLIMMIX.

I am not an expert on GLMMs, so I can't help you any further.
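
For anyone who wants a starting point, a GLMM of this kind might be sketched as follows. This is only an illustration under stated assumptions, not a vetted analysis: it assumes a long-format data set named LONG (hypothetical) with one row per day and time block, and variables DAY, BLOCK, COUNT, and LOGRANGE (the log of the block length). The negative binomial distribution is one possible way to allow for the overdispersion mentioned in the question.

```sas
/* Sketch only. LONG, DAY, BLOCK, COUNT and LOGRANGE are assumed names. */
proc glimmix data=long method=laplace;
  class day block;
  /* Negative binomial counts, log link, log(block length) as offset */
  model count = block / dist=negbin link=log offset=logrange solution;
  /* Random intercept per day: blocks within the same day are correlated */
  random intercept / subject=day;
  /* Pairwise block differences (link scale) and block means (data scale) */
  lsmeans block / diff ilink;
run;
```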


Ksharp
Super User

Maybe you could use a GEE model.

Arrange your data like this:

 

date   count   range
Mar1     0       2     <-- [0-2]
Mar1     1       2     <-- [2-4]
..
Mar1    13      10     <-- [10-0]
Mar2     1       2     <-- [0-2]
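
One way to get from the wide table in the question to this layout is a DATA step with an array. The sketch below is illustrative: the data set names WIDE and LONG, the column names b00_02, b02_04, b04_06, and the numeric DAY variable are assumptions, only the first three blocks of the first two days are shown, and every block is assumed to be 2 hours long.

```sas
/* Hypothetical wide table: one row per day, one count column per block. */
data wide;
  input day b00_02 b02_04 b04_06;
  datalines;
1 0 1 1
2 1 2 0
;
run;

/* Stack the block columns into one row per day and block. */
data long;
  set wide;
  array blk{*} b00_02 b02_04 b04_06;
  length block $8;
  do i = 1 to dim(blk);
    block    = vname(blk{i});  /* block label taken from the column name */
    count    = blk{i};         /* number of incidents in that block */
    range    = 2;              /* block length in hours (assumed equal) */
    logrange = log(range);     /* offset for a log-link count model */
    output;
  end;
  keep day block count range logrange;
run;
```

The log of RANGE is stored as well because a log-link count model takes its offset on the log scale.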

 

 

Use RANGE as the offset variable (with a log link, the offset is normally entered as the log of RANGE), and use PROC GENMOD or PROC GEE to fit a GEE model. Check:

PROC GENMOD

Example 44.7: Log-Linear Model for Count Data

 

PROC GEE

Example 43.2: Log-Linear Model for Count Data

 

Also consider using the LSMEANS statement, and the ZEROMODEL statement for the excess zeros.
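
Put together, the suggestions above might look roughly like the sketch below. It is not taken from the SAS examples cited. It assumes a long data set named LONG (hypothetical) with variables DAY, BLOCK, COUNT, and LOGRANGE = log(RANGE). The first step is a Poisson GEE with days as clusters; the second is a separate zero-inflated negative binomial fit without the REPEATED statement, shown only to illustrate the ZEROMODEL statement.

```sas
/* Sketch only. LONG, DAY, BLOCK, COUNT and LOGRANGE are assumed names. */

/* GEE: Poisson log-linear model, blocks within a day treated as correlated */
proc genmod data=long;
  class day block;
  model count = block / dist=poisson link=log offset=logrange;
  repeated subject=day / type=exch;
  /* Pairwise block comparisons; EXP reports them as rate ratios */
  lsmeans block / diff exp cl;
run;

/* Zero-inflated negative binomial fit (no REPEATED statement here) */
proc genmod data=long;
  class block;
  model count = block / dist=zinb link=log offset=logrange;
  zeromodel / link=logit;  /* intercept-only model for the excess zeros */
run;
```

Which of the two is more appropriate depends on the data. The exchangeable working correlation (TYPE=EXCH) is an assumption that can be swapped for another structure.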

Here is how to compare the difference between two proportions

(i.e., move the OFFSET variable to the left-hand side of the model):

 

http://support.sas.com/kb/24/188.html 

 
