Thank you very much, @SteveDenham, for the detailed response. I have tried both COUNTREG and GENMOD, but neither of them let me incorporate random effects with the ZINB distribution (the data are not panel data). The good thing is that a weighting variable (DISCWT) is already provided with the dataset, so there is no need to generate one.

The number of hospitals is large because this is 13 years of pooled data (less than 1,000 hospital records each year), which represents around 20% of U.S. community hospitals. Do you think this is large enough to be considered 'fairly close' to the population size? I am not sure I can remove the hospital ID (HOSP_NIS) from the random effects, since discharge records vary widely across hospital types and sizes. Moreover, there are multiple hospitalizations by the same patients (unfortunately, the data do not have patient identifiers). By the way, what did you mean by using HOSP_NIS in the fixed effects model? Using it as a covariate, or just removing it from the model?

Besides, I have run another model for a continuous (normally distributed) dependent variable (logged costs) using PROC GENMOD. The model worked fine with a 1% sample, but it ran for 40 hours (then I canceled it) when I used the full sample (around 70 million). The code was as follows:

proc genmod data=nis2.nis_2003_15N02; /* final costs model */
  class HOSP_NIS YEAR(ref=first) FEMALE(ref=first) RACEcat(ref="white")
        PAYER1(ref="Private_") PL_UR4(ref="Large Metro") ZIPINC_QRTL(ref=first)
        AWEEKEND(ref=first) ELECTIVE(ref=first) HOSP_BEDSIZE(ref=first)
        HOSP_LOCTEACH(ref=first) HOSP_REGION(ref=first);
  model COSTS02_log = HIV|Age_c10 YEAR FEMALE RACEcat PAYER1 PL_UR4 ZIPINC_QRTL
        AWEEKEND ELECTIVE HOSP_BEDSIZE HOSP_LOCTEACH HOSP_REGION
        / dist=gamma link=log ALPHA=0.01;
  repeated subject=HOSP_NIS / type=exch;
  weight DISCWT;
run;

Is the issue only the data size, or do I need to use some type of optimization technique when running the model with the full sample?
Could you explain a little more about how I can identify "noise absorber" variables (given that all independent variables except age are categorical), and why I need to do this?

Going back to the ZINB mixed model, the REPEATED (or RANDOM) statement does not work in GENMOD when 'dist=zinb' is used. So, would you suggest anything else? I am not an expert in all of this, so writing programs is not an option for me! 😞 I need at least an NB mixed-effects model for my other two dependent variables, which do not have zero inflation (number of diagnoses, length of stay). Thanks again for your help.