I am afraid that I have never seen a SAS/STAT procedure that addresses all three of your concerns (ZINB distribution, random effect, survey weights), but there may be a way if you have a license for SAS/ETS. The COUNTREG procedure can fit zero-inflated count distributions and both fixed and random effects, but I am not sure it can do both at once, and the random effect, at least according to the Details section, looks like it only applies to panel data (measures over time). So, you might consider the following plan. See if it makes sense.
Your final analysis would be done in GENMOD with a zero-inflated negative binomial distribution (see the documentation on how to do this). GLIMMIX doesn't do a good job on this unless you program the link and deviance functions yourself to include the logit that accounts for the zero inflation probability.
Getting adequate weights will probably require some programming statements along the way. To get started on the weights, look through the documentation for PROC SURVEYMEANS. There is code in there that could generate survey weights (provided you have population numbers and the number of observations in each category of the survey).
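As a rough illustration of the idea (not the code from the SURVEYMEANS documentation), a basic stratum weight is the population count divided by the sample count in each category. The data set names `work.sample` and `work.pop_totals` and the variables `stratum`, `pop_n`, and `svy_wt` are all hypothetical.

```sas
/* Sketch only: all data set and variable names here are hypothetical.       */
/* work.sample     : one row per sampled record, with a stratum variable     */
/* work.pop_totals : one row per stratum, with the population count pop_n    */
proc sql;
   create table work.weighted as
   select s.*,
          p.pop_n / c.samp_n as svy_wt   /* weight = population N / sample n */
   from work.sample as s
        inner join work.pop_totals as p
           on s.stratum = p.stratum
        inner join (select stratum, count(*) as samp_n
                    from work.sample
                    group by stratum) as c
           on s.stratum = c.stratum;
quit;
```

If the data already come with a weight variable from the survey provider, this step is unnecessary.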
That leaves the random effect. You say your sample of hospitals is about 10,000. To someone used to small sample size analyses, that seems like a lot. How closely does it approximate the population size? If it is fairly close (and 'fairly close' is poorly defined) to the population size, you won't hurt the analysis much by not considering the random effect, and just fitting HOSP_NIS as a fixed effect.
Now you are in a position to use GENMOD. If the fit still fails along the lines of "model too large", consider which of the variables are "noise absorbers" and which are not. If they are not, then delete them from the MODEL statement. I will grant that you may not know this before the analysis. A good way to find out might be to look at boxplots for the levels of the candidate variables. If they are relatively uniform with respect to the other variables, then they aren't adding much information to the model.
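One way to draw such boxplots is PROC SGPLOT with a VBOX statement, shown here as a minimal sketch. The data set `work.analysis`, the response `y`, and the candidate variable `candidate_var` are hypothetical names; with a very large data set it may be more practical to plot a random subsample.

```sas
/* Sketch only: work.analysis, y, and candidate_var are hypothetical names. */
/* One box per level of the candidate class variable. Boxes that look       */
/* nearly identical suggest the variable carries little information.        */
proc sgplot data=work.analysis;
   vbox y / category=candidate_var;
run;
```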
I realize this is kind of all over the place, but this is the first mash-up of techniques that cover most of your concerns.
SteveDenham
Thank you very much, @SteveDenham, for a detailed response.
I have tried both COUNTREG and GENMOD, but neither of them let me incorporate random effects with the ZINB distribution (the data are not panel data). The good thing is that I already have a weighting variable (DISCWT) provided with the dataset, so there is no need to generate it.
The number of hospitals is large because this is 13 years of pooled data (fewer than 1,000 hospital records each year), which represents around 20% of U.S. community hospitals. Do you think this is large enough to be considered 'fairly close' to the population size? I am hesitant to remove the hospital ID (HOSP_NIS) from the random effects, since discharge records vary widely across the types and sizes of hospitals. Moreover, there are multiple hospitalizations by the same patients (unfortunately, the data do not have patient identifiers). By the way, what did you mean by using HOSP_NIS in the fixed effects model? Using it as a covariate, or just removing it from the model?
Besides, I have run another model for a continuous (normally distributed) dependent variable (logged costs) using PROC GENMOD. The model worked fine with a 1% sample, but it kept running for 40 hours (then I canceled it) when I used the full sample (around 70 million records). The code was as follows:
proc genmod data=nis2.nis_2003_15N02; /* final costs model */
   class HOSP_NIS YEAR(ref=first) FEMALE(ref=first) RACEcat(ref="white")
         PAYER1(ref="Private_") PL_UR4(ref="Large Metro") ZIPINC_QRTL(ref=first)
         AWEEKEND(ref=first) ELECTIVE(ref=first) HOSP_BEDSIZE(ref=first)
         HOSP_LOCTEACH(ref=first) HOSP_REGION(ref=first);
   model COSTS02_log = HIV|Age_c10 YEAR FEMALE RACEcat PAYER1 PL_UR4 ZIPINC_QRTL
         AWEEKEND ELECTIVE HOSP_BEDSIZE HOSP_LOCTEACH HOSP_REGION
         / dist=gamma link=log ALPHA=0.01;
   repeated subject=HOSP_NIS / type=exch;
   weight DISCWT;
run;
Is the issue with the data size only, or do I need to use some kind of optimization technique when running the model with the full sample?
Can you explain a little more how I can identify "noise absorber" variables (given that all independent variables except age are categorical)? And why do I need to do this?
Going back to the ZINB mixed model, the REPEATED (or RANDOM) statements do not work in GENMOD with 'dist=zinb'. So, would you suggest anything else? I am not an expert in all this, so writing programs is not an option for me! 😞
I need at least an NB mixed-effects model for my other two dependent variables, which do not have zero inflation (number of diagnoses, length of stay).
Thanks again for your help.
Judging from your replies to several issues, I think I would recommend:
ZINB using PROC GENMOD with the weighting you mention, but including HOSP_NIS as an effect in the MODEL statement, and removing the repeated/random approach.
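A minimal sketch of that recommendation might look like the following. The count outcome `n_events` is a hypothetical name, the covariate list is shortened from the cost model posted above, and the choice of covariates in the ZEROMODEL statement is an assumption made only for illustration.

```sas
/* Sketch only. n_events is a hypothetical count outcome; the covariate    */
/* list is shortened from the cost model posted earlier in the thread.     */
proc genmod data=nis2.nis_2003_15N02;
   class HOSP_NIS YEAR(ref=first) FEMALE(ref=first);
   /* HOSP_NIS enters as an ordinary fixed effect; no REPEATED statement   */
   model n_events = Age_c10 YEAR FEMALE HOSP_NIS / dist=zinb link=log;
   /* Covariates for the zero-inflation part (assumed for illustration)    */
   zeromodel Age_c10 / link=logit;
   weight DISCWT;
run;
```

With thousands of hospital levels, HOSP_NIS adds a large number of parameters, so it is probably worth trying this on a subsample first.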
The other thing is the size of the dataset and the complexity of the model. Have you tried running a model on a random subset of the data, and seeing how long that takes? If it helps a lot to fit 1% of your data, then consider model averaging as an approach. Get 100 or so random samples and fit them separately. Save the results for the parameters, then resample those to get an averaged value and standard error.
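The subsample-and-average idea could be sketched roughly as below. This is a simplified version under stated assumptions: the covariate list is shortened, the REPEATED statement is dropped, and the final step simply takes the mean and standard deviation of each estimate across replicates rather than resampling them. The data set names `work.subsamples` and `work.pe` and the seed value are hypothetical.

```sas
/* Sketch only: work.subsamples, work.pe, and the seed are hypothetical.   */
/* Draw 100 independent 1% simple random samples. SURVEYSELECT adds a      */
/* Replicate variable identifying each sample.                             */
proc surveyselect data=nis2.nis_2003_15N02 out=work.subsamples
                  method=srs samprate=0.01 reps=100 seed=20240101;
run;

/* Fit the same (shortened) model in each replicate and save the estimates */
ods exclude all;                        /* suppress 100 sets of printed output */
proc genmod data=work.subsamples;
   by Replicate;
   class YEAR(ref=first) FEMALE(ref=first);
   model COSTS02_log = Age_c10 YEAR FEMALE / dist=gamma link=log;
   weight DISCWT;
   ods output ParameterEstimates=work.pe;
run;
ods exclude none;

/* Summarize each parameter across replicates. The MISSING option keeps    */
/* rows with a blank Level1 (intercept, continuous effects, scale).        */
proc means data=work.pe mean std n missing;
   class Parameter Level1;
   var Estimate;
run;
```

Note that 100 samples at 1% each produce an output data set about as large as the original, so fewer or smaller replicates may be needed in practice.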
SteveDenham
There seems to be a solution now. You can check this paper, which is an introduction to the %SURVEYGENMOD macro, a macro that is capable of building zero-inflated Poisson and zero-inflated negative binomial models. However, I am not sure if random effects can be included in the models. I have not read this paper carefully.