02-28-2018 05:09 PM - edited 02-28-2018 06:55 PM
I have data in which the outcome is a count of events. The events have multiple, let's say three, subtypes, where you might see something like:
Y_total = 4
The counts are mostly small, and there are many zeros.
I have run separate count models for type1, type2, and type3 using negative binomial regression with PROC GENMOD; each model has the same three covariates, x1, x2, and x3. The estimated coefficients for x1 are nominally different across the three models. Using Pearson correlations, I see that the residuals for two of the three pairs of models are weakly but significantly correlated (I'm not sure this is the right way of looking at them).
I am interested in whether the coefficients for x1 are statistically significantly different from one another across the pairs of models. Andrew Wheeler explains (https://andrewpwheeler.wordpress.com/2016/10/19/testing-the-equality-of-two-regression-coefficients/) that you can compute the variance of the difference of two coefficients and thus construct a significance test. To do this, you need the covariance of the coefficients (which I would rather not assume to be zero).
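Written out, the test described there is a Wald-type z statistic. The notation below is an assumption for illustration: b_j and b_k stand for the estimated x1 coefficients from two of the three models, and the statistic is compared with a standard normal distribution.

```latex
z = \frac{\hat{b}_j - \hat{b}_k}
         {\sqrt{\widehat{\mathrm{Var}}(\hat{b}_j) + \widehat{\mathrm{Var}}(\hat{b}_k)
                - 2\,\widehat{\mathrm{Cov}}(\hat{b}_j, \hat{b}_k)}}
```

Setting the covariance term to zero gives the version that treats the two models as independent; a positive covariance makes the denominator smaller, so ignoring it would make the test conservative.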
For linear regressions, this covariance can be obtained from the results of a seemingly unrelated regression (e.g. with PROC SYSLIN). However, I can't find a way to do this with a count model. (I'm also a bit anxious about the zeros I am introducing by running separate regressions on the counts of the event subtypes; I'm not sure whether this is, or causes, a problem.)
Can seemingly unrelated negative binomial regression be done in SAS? Other thoughts and suggestions about this situation?
EDIT: Just for fun, I ran these as linear regressions using PROC SYSLIN by transforming the outcomes:
ln(y1+1) = x1 x2 x3
ln(y2+1) = x1 x2 x3
ln(y3+1) = x1 x2 x3
using the COVOUT option for the OUTEST dataset. These equations fit the data hideously, but the estimates are otherwise similar to those from the "independent" count models. However, the COVOUT option appears to give the variance-covariance matrix for the parameters *within* each equation, not across them, so maybe I misunderstood what Andrew Wheeler was saying. Heavy sigh.
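For reference, the setup described in this edit could be written roughly as follows. This is a sketch, not the code that was actually run: the dataset names WORK.EVENTS, WORK.LOGCOUNTS, and WORK.EST, the variable names ly1-ly3, and the equation labels type1-type3 are all hypothetical. The STEST statement is not part of what was described above; it is PROC SYSLIN's statement for testing hypotheses that span equations, and it is included on the assumption that it is the relevant tool when the cross-equation covariance does not appear in the OUTEST dataset.

```sas
/* Sketch only: dataset names, ly1-ly3, and equation labels are hypothetical. */
data work.logcounts;
  set work.events;
  ly1 = log(y1 + 1);   /* natural log of (count + 1) */
  ly2 = log(y2 + 1);
  ly3 = log(y3 + 1);
run;

proc syslin data=work.logcounts sur outest=work.est covout;
  type1: model ly1 = x1 x2 x3;
  type2: model ly2 = x1 x2 x3;
  type3: model ly3 = x1 x2 x3;
  /* Cross-equation test: is the x1 coefficient equal in equations type1 and type2? */
  stest type1.x1 = type2.x1;
run;
```

The other two pairs would be tested the same way by changing the equation labels. This applies only to the log-transformed linear models, so it does not answer the question for the negative binomial fits.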
03-01-2018 09:42 AM
This is an interesting question. It is outside my areas of expertise, but let me share a few thoughts and maybe those with more experience in this area can correct or add to my comments.
I think you should consider whether it is more appropriate to do ONE analysis with a multinomial response, rather than THREE analyses with a count model for each analysis.
I think you should read the papers by Robin High @rhigh on modeling count data by using NLMIXED, NLIN, COUNTREG, FMM, and more. I don't know the best approach for your data, but I would start by looking at the following:
In addition to Robin High's papers, you might be interested in the work of Jorge Morel. He gave a wonderful SAS Global Forum 2014 talk on overdispersion in count models and has written a book, Overdispersion Models in SAS. Chapter 7 deals with multinomial responses.