I have data on the number of persons killed in road accidents, collected in one state (all 10 administrative divisions) every single day from November 1st to March 31st, for 16 years (2005 to 2020). I am running a population-averaged model (Poisson or negative binomial, for count data).
My interest is to model the impact of low temperature (I have the daily average temperature from November 1st, 2005 to March 31st, 2020) on the number of persons killed.
Primarily, I would like to get an overall effect (Incidence Rate Ratio) at the state level, but I am also interested in the effect within each division. Intuitively, I considered running GEE models with year nested in division as the cluster. To get these effects, I run two separate models, as below.
My question: is it correct to have a clustering variable in both parts, i.e. in both the REPEATED statement and the MODEL statement?
Thanks in advance
proc genmod data=claims;
  class x1 year division;
  model claims = Temp x1 x2 year / offset=log_workforce dist=nb link=log;
  estimate "State" Temp 1 / exp;
  repeated subject=year(division) / type=ar;
run;
proc genmod data=claims;
  class x1 year division;
  model claims = Temp|division x1 x2 year / offset=log_workforce dist=nb link=log;
  repeated subject=year(division) / type=ar;
  estimate "div 1" Temp 1 Temp*division 1 0 0 0 0 0 0 0 0 0 / exp;
  estimate "div 2" Temp 1 Temp*division 0 1 0 0 0 0 0 0 0 0 / exp;
  estimate "div 10" Temp 1 Temp*division 0 0 0 0 0 0 0 0 0 1 / exp;
run;
In your description, you say that your data is from one state. If that is true, then it isn't possible to get a "state IRR", since there isn't a second state whose rate could be compared in a ratio. Concerning the clustering, see this note. As discussed there, the SUBJECT= specification only determines which observations are in a correlated cluster. It isn't part of the model specification. If the observations in your data set are individual accidents, the response is the number of people killed in each of those accidents, and you consider these numbers to be correlated within the levels of some variable (division? state, if there really are multiple states?), then specify that variable in SUBJECT=. The note above describes when a nested effect would be needed, which it rarely is. See this note on estimating rates and rate ratios, which might help. As shown there, you don't need ESTIMATE statements, and it is recommended that you avoid the ESTIMATE statement, since properly determining the coefficients to use is an error-prone process. Usually, the LSMEANS, SLICE, or LSMESTIMATE statement is easier to use.
Hi StatDave_sas,
Thanks for your response. Much appreciated for the links.
I think I stated "overall state IRR" incorrectly. Indeed, there is one state (which does not appear in the model), with 10 administrative divisions (which I treated as clusters). Over the specified period, for each day and each administrative division, we have the count of persons killed in traffic and the daily mean temperature. So I am modeling the mean daily count of persons killed as a function of mean daily temperature.
By "overall state IRR", I mean the effect of a 1-unit increase in the mean daily temperature for the entire state, as depicted in the first GENMOD step.
Likewise, the IRR at division level is the effect of a 1-unit increase in the mean daily temperature within each division, as depicted in the second GENMOD step. To get these IRRs, I included the temperature-by-division interaction in the model, while division also appears in the REPEATED statement, where it defines the cluster.
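For reference, the two quantities described above can be written out as follows. This is only a notational sketch: the symbols (Y for the daily count in division d on day t, beta_T for the Temp coefficient, alpha_d and gamma_d for the division main effect and the Temp*division interaction) are introduced here for illustration, the dots stand for the x1, x2 and year terms, and gamma for the reference division is assumed to be fixed at zero as in the usual GLM parameterization.

```latex
\begin{align*}
\text{Model 1:}\quad
  \log \mathrm{E}[Y_{dt}] &= \log(\mathrm{workforce}_{dt}) + \beta_0 + \beta_T\,\mathrm{Temp}_{dt} + \dots
  &&\Rightarrow\quad \mathrm{IRR}_{\text{state}} = e^{\beta_T} \\
\text{Model 2:}\quad
  \log \mathrm{E}[Y_{dt}] &= \log(\mathrm{workforce}_{dt}) + \beta_0 + \alpha_d
      + (\beta_T + \gamma_d)\,\mathrm{Temp}_{dt} + \dots
  &&\Rightarrow\quad \mathrm{IRR}_{d} = e^{\beta_T + \gamma_d}
\end{align*}
```

Under this notation, each "div" ESTIMATE statement puts a coefficient of 1 on Temp and a 1 in the matching position of Temp*division, which is the linear combination beta_T + gamma_d before exponentiation.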
Now, I am not sure I can get an effect with LSMEANS for a single continuous variable, as in model 1, unless it is involved in an interaction with a categorical variable.
Thanks again for your help, I will appreciate any input,
If you observe a response measure each day and consider those measures to be correlated within clusters defined by division, then you would specify SUBJECT=DIVISION in the REPEATED statement as described in detail in the first note I referred to earlier. You can then use the MODEL and ESTIMATE statements in your first GENMOD step to get the overall state RR. You can use the same REPEATED statement in your second GENMOD step, and an LSMEANS statement as shown in the second note I referred to earlier, to get the separate division RRs.
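A minimal sketch of the first step with that change might look like the following. The data set and variable names are the ones from the question; the only edit is SUBJECT=DIVISION in place of the nested effect. It assumes the rows are in date order within each division, and whether an AR working correlation suits clusters that span 16 winter seasons is a modeling judgment this sketch does not settle.

```sas
/* Sketch only: names (claims, Temp, x1, x2, log_workforce) come from the question */
proc genmod data=claims;
  class x1 year division;
  model claims = Temp x1 x2 year / offset=log_workforce dist=nb link=log;
  /* division only defines the correlated clusters; it is not a model effect here */
  repeated subject=division / type=ar;
  /* exp(Temp coefficient) = rate ratio for a 1-unit increase in temperature */
  estimate "State" Temp 1 / exp;
run;
```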
Sorry... I was thinking of comparing division rates in your second model which motivated using an LSMEANS statement. That won't work for the task of estimating the division rate ratios for the temp change. You will need to stick with ESTIMATE statements for that purpose.
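Put together, the second step could be sketched as below. This reuses the names and ESTIMATE coefficients from the question and changes only the SUBJECT= effect; it is an illustration under those assumptions, not verified output, and the statements for divisions 3 through 9 are left out just as they were in the original post.

```sas
/* Sketch only: division-specific rate ratios for a 1-unit increase in Temp */
proc genmod data=claims;
  class x1 year division;
  model claims = Temp|division x1 x2 year / offset=log_workforce dist=nb link=log;
  repeated subject=division / type=ar;
  /* each statement adds the Temp slope and that division's interaction term */
  estimate "div 1"  Temp 1 Temp*division 1 0 0 0 0 0 0 0 0 0 / exp;
  estimate "div 2"  Temp 1 Temp*division 0 1 0 0 0 0 0 0 0 0 / exp;
  /* divisions 3 to 9 follow the same pattern */
  estimate "div 10" Temp 1 Temp*division 0 0 0 0 0 0 0 0 0 1 / exp;
run;
```

The position of the 1 in the Temp*division coefficients has to match the order of the division levels shown in the Class Level Information table, which is where such statements most easily go wrong.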