02-17-2016 10:37 AM
We are trying to determine whether the number of parental deployments (or cumulative length of deployment) is associated with an increase in child mental health related doctor visits. This is a retrospective study in which we've identified a cohort of adolescents who have had at least one mental health related doctor visit and whose active duty parent had not yet been deployed. We've collected 10 years of data on this cohort to see if the number of child mental health related doctor visits is related to cumulative deployment length. The analysis will include the following covariates: adolescent age, gender, and presence of a mental health diagnosis; and parent age, gender, rank, number of deployments, and branch of service.
We are considering using Proc Genmod to analyze the data using the code below. We chose a negative binomial model as over-dispersion appears to be significant. We chose to do GEE analysis using "repeated subject = adol_id(spon_id)" and an exchangeable correlation structure.
proc genmod data=&inpds;
  /* Variables used in SUBJECT= and WITHINSUBJECT= must also be listed in CLASS */
  class adol_id spon_id deploy_status
        adol_MH_status (ref='0' param=ref)
        adol_gender (ref='0' param=ref)
        spon_gender (ref='0' param=ref);
  model adol_SS_vis_ct = cumul_length_of_deploy adol_MH_status adol_gender adol_age spon_age spon_gender
        / dist=nb link=log offset=ln_obs_time;
  repeated subject=adol_id(spon_id) / type=exch withinsubject=deploy_status;
run;
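For reference, the marginal model that the GENMOD call above is meant to fit can be sketched as follows. The symbols are notation introduced here, not part of the original code: Y_ij is the visit count for adolescent i in deployment-status period j, mu_ij is its mean, t_ij is the observation time (so ln_obs_time = log t_ij), x_ij is the covariate vector, k is the negative binomial dispersion, and alpha is the exchangeable working correlation.

```latex
\[
\log \mu_{ij} = \log t_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta},
\qquad
\operatorname{Var}(Y_{ij}) = \mu_{ij} + k\,\mu_{ij}^{2},
\qquad
\operatorname{Corr}(Y_{ij}, Y_{ij'}) = \alpha \quad (j \neq j')
\]
```

Because the offset enters with a coefficient fixed at 1, exp(beta) for cumul_length_of_deploy reads as a rate ratio for visits per unit of observation time.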
A potential problem with this is that over 70% of the cohort has no mental health related doctor visit in the 10-year period. Is a GEE analysis problematic in light of this?
02-17-2016 04:00 PM
This looks like a reasonable first approach. The problem is that zero-inflated models and repeated measures don't really play well together in any software I know, let alone in SAS. You might consider PROC GEE, and consider the 70% as missing rather than as zeroes. At least, that would be my first attempt beyond this.
02-17-2016 04:28 PM
Thanks for the reply. Your suggestions are always greatly appreciated. I'm not very familiar with analyzing count data. The cohort we selected was composed of adolescents who had had a mental health related visit prior to 2008 and whose parent had not been previously deployed. So we want to determine if mental health related visits increase over time as parents deploy. Do we really want to treat all those that did not have any more mental health related visits as missing?
02-18-2016 07:50 AM
Given your description, missing is probably not the right way to go--those are likely to be true zeroes under the hypothesis you are testing.
Now I would worry about separation, both complete and quasi-, although this is not nearly so much a problem with count data. And with a small mean, a large proportion of zeroes may not be unusual. If it still presents a problem, you may need to switch to GLIMMIX, where you can specify an additional overdispersion for the error term, and from there to NLMIXED, with all of the fun of coding likelihoods. You may want to google "NLMIXED repeated measures zero-inflated" to see if anyone has boldly gone into this area.
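A rough sketch of what the GLIMMIX and NLMIXED steps could look like, assuming one row per adolescent per period and reusing the variable names from the GENMOD code. Several things here are assumptions made only for illustration: the covariate list is cut down to a single predictor, the GLIMMIX step uses a random intercept per adolescent, the starting values are arbitrary, zip_in is a hypothetical work data set, and the NLMIXED step uses a zero-inflated Poisson rather than a negative binomial to keep the hand-coded likelihood short. None of this has been run against the actual data.

```sas
/* Step 1: negative binomial GLMM with a random intercept per adolescent */
proc glimmix data=&inpds method=laplace;
  class adol_id;
  model adol_SS_vis_ct = cumul_length_of_deploy
        / dist=negbin link=log offset=ln_obs_time solution;
  random intercept / subject=adol_id;
run;

/* Step 2: zero-inflated Poisson with a random intercept, likelihood coded by hand.
   NLMIXED expects the data to be sorted by the SUBJECT= variable. */
proc sort data=&inpds out=zip_in;   /* zip_in: hypothetical name */
  by adol_id;
run;

proc nlmixed data=zip_in;
  parms b0=0 b1=0 a0=0 s2u=1;       /* arbitrary starting values */
  bounds s2u >= 0;                  /* keep the variance non-negative */
  /* conditional mean of the count part, including the offset */
  mu = exp(b0 + b1*cumul_length_of_deploy + u + ln_obs_time);
  /* probability of a structural zero, on the logit scale */
  p0 = 1 / (1 + exp(-a0));
  if adol_SS_vis_ct = 0 then
    ll = log(p0 + (1 - p0)*exp(-mu));
  else
    ll = log(1 - p0) + adol_SS_vis_ct*log(mu) - mu - lgamma(adol_SS_vis_ct + 1);
  model adol_SS_vis_ct ~ general(ll);
  random u ~ normal(0, s2u) subject=adol_id;
run;
```

If the estimated p0 comes out near zero, that would suggest the plain count model is already accounting for the zeroes and the zero-inflation part is not buying much.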
02-18-2016 09:25 AM
Thanks again for your valued guidance. This is looking like more fun than I was expecting. Do you see it as problematic to do GEE repeated measures analysis when there are so many zero values?
02-18-2016 09:38 AM
Not so much when it is a Poisson or negative binomial distribution.
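To put a number on that, take a negative binomial with mean mu and dispersion k, so that the variance is mu + k mu^2 (the parameterization GENMOD uses). The values mu = 1 and k = 5 below are purely illustrative and are not estimates from this data.

```latex
\[
P(Y = 0) = (1 + k\mu)^{-1/k},
\qquad
\text{e.g. } \mu = 1,\ k = 5:\quad (1 + 5)^{-1/5} = 6^{-0.2} \approx 0.70
\]
```

So a strongly over-dispersed negative binomial can produce about 70% zeroes with a mean of one visit. A Poisson, where P(Y = 0) = exp(-mu), would need a mean of roughly 0.36 to do the same.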
I think this is about the only way to handle the repeated nature of the data. Just be prepared for all the fun that might come with it (convergence problems, problems estimating standard errors, arguing with referees/sponsors, etc.).
Have you looked at a non-repeated analysis, where you look at totals over all times?
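A minimal sketch of that totals analysis, assuming the input has one row per adolescent per period with the same variable names as the GENMOD code. The names adol_totals, total_vis_ct, ln_total_time and final_cumul_deploy are hypothetical, and the other covariates are left out for brevity.

```sas
/* Collapse to one row per adolescent: total visits and total observation time */
proc sql;
  create table adol_totals as                 /* hypothetical data set name */
  select adol_id, spon_id,
         sum(adol_SS_vis_ct)         as total_vis_ct,
         log(sum(exp(ln_obs_time)))  as ln_total_time,      /* log of summed time */
         max(cumul_length_of_deploy) as final_cumul_deploy  /* value at end of follow-up */
  from &inpds
  group by adol_id, spon_id;
quit;

/* Negative binomial regression on the totals, no repeated measures */
proc genmod data=adol_totals;
  model total_vis_ct = final_cumul_deploy
        / dist=nb link=log offset=ln_total_time;
run;
```

This ignores any correlation between siblings who share a parent. If that matters, spon_id could go in the CLASS statement with a REPEATED statement using subject=spon_id.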