11-10-2014 02:04 PM
I have data that I modeled with PROC GENMOD using a zero-inflated negative binomial model. The original data are 85% zeros. However, when I apply PROC PLM to score the original dataset, it appears to use only the negative binomial model and to ignore the logistic model that selects the 0's. The predicted values are all nonzero. Am I applying these procs incorrectly?
PROC GENMOD DATA = TESTING.CLAIMS;
MODEL TTD_DAYS_PAID = ODCUM TRAUMA AGE / DIST = ZINB;
ZEROMODEL ODCUM TRAUMA AGE / LINK = LOGIT;
STORE TESTING.ZINBMOD; /* item store read by PROC PLM below */
RUN;
PROC PLM SOURCE=TESTING.ZINBMOD;
SCORE DATA=TESTING.CLAIMS OUT=TESTING.CLAIMSPRED / ILINK;
RUN;
11-11-2014 03:17 PM
That is the correct specification. Equivalently, but more simply, just use the PRED= option in the OUTPUT statement of PROC GENMOD to get the predicted values. Compare these predicted values to those you get from fitting just a negative binomial model to see the effect of the zero-inflated model. See this note that gives more information.
11-11-2014 03:32 PM
Thanks. I tried this, and I get smaller numbers for the predictions on the same rows, but still no 0's. I just looked at the item store and saw the note that it only contains the negative binomial portion of the model.
How would I use Proc PLM to evaluate the zero-inflated (binomial) portion of the model, if its definition is not in the store?
Also, using the PRED= option, is this utilizing both models? If so, is there a reason that I still don't get 0's?
Thanks very much!
11-12-2014 01:43 PM
The predicted values from the SCORE statement in PROC PLM, as well as those from the PRED= option in the OUTPUT statement in PROC GENMOD, are the correct predicted values for the fitted zero-inflated negative binomial model. Apparently none of the predictor settings in your SCORE DATA= data set result in predicted values of zero.
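It may help to spell out what "predicted value" means for this model. As a sketch, assuming the usual zero-inflated parameterization with a log link for the count part and the logit link from the ZEROMODEL statement (the symbols are illustrative, not taken from the output): for observation i with count-model mean \mu_i and zero-inflation probability \omega_i,

\[
E[Y_i] = (1 - \omega_i)\,\mu_i, \qquad \log \mu_i = x_i^\top \beta, \qquad \operatorname{logit}(\omega_i) = z_i^\top \gamma .
\]

Because 0 < \omega_i < 1 and \mu_i > 0, this mean is always strictly positive. The zero-inflation part shrinks the predicted mean relative to the plain negative binomial fit; it does not switch individual predictions to 0, so a column of predicted means contains no exact zeros even when most observed counts are zero.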
11-12-2014 02:45 PM
PROC PLM uses the item store, which houses the parameter estimates. When you view this store, it explicitly states that it contains only the negative binomial portion, and when I checked the parameters, they did not include the logit parameters.
11-13-2014 04:26 PM
Unfortunately, the SHOW ALL information produced by PROC PLM does not reflect the fact that the PLM SCORE statement works fully with zero-inflated models. Since only the SCORE statement currently supports the ZI models fully, this information has not been updated. But the predicted values from the SCORE statement are correct, as evidenced by their agreement with the predicted values from PROC GENMOD.
11-13-2014 04:43 PM
It actually doesn't. The predictions from PROC GENMOD take both models into account. The predictions from PROC PLM apply only the Poisson or negative binomial model, not the logit model, so the two sets of predictions differ. Consider this self-contained code. It generates Poisson data, forces about 50% of the values to 0, fits and predicts with GENMOD, and then predicts with PLM. The predictions are different. I calculated the results by hand, and PLM matched when I used the Poisson coefficients only.
DATA DIST (KEEP=X0-X2);
BETA0 = 1; BETA1 = .2;
DO I = 1 TO 1000;
X0 = RAND("Bernoulli", .5); /* X0 = 0 forces a structural zero */
X1 = RAND("Uniform");
X2 = RAND("Poisson", EXP((BETA1*X1)+BETA0)) * X0;
OUTPUT; /* write one row per iteration */
END;
RUN;
PROC UNIVARIATE DATA=DIST NOPRINT;
RUN;
PROC GENMOD DATA = DIST; /*ZERO INFLATED POISSON*/
MODEL X2 = X1 / DIST = ZIP;
ZEROMODEL X1 / LINK = LOGIT;
OUTPUT OUT=PRED PRED=ESTIMATE;
STORE MOD; /* item store read by PROC PLM below */
*ODS OUTPUT PARAMETERESTIMATES=PE MODELFIT=MFIT ZEROPARAMETERESTIMATES=ZEROPE;
RUN;
PROC PLM SOURCE=MOD;
SCORE DATA=DIST OUT=PRED2 / ILINK;
RUN;
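To put the two sets of predictions side by side, something like the following could be run afterwards. This is only a sketch: the data set name CHECK and the variable DIFF are made up for illustration, it assumes PRED and PRED2 are still in the same row order as DIST, and it assumes the SCORE statement wrote its predictions to the default variable name PREDICTED.

```sas
/* One-to-one merge (no BY statement): relies on PRED and PRED2 */
/* having the same number of rows in the same order.            */
DATA CHECK;
MERGE PRED (KEEP=X1 X2 ESTIMATE) PRED2 (KEEP=PREDICTED);
DIFF = ABS(ESTIMATE - PREDICTED); /* GENMOD prediction vs PLM prediction */
RUN;
/* A maximum difference near 0 means the two procedures agree. */
PROC MEANS DATA=CHECK N MAX;
VAR DIFF;
RUN;
```

If the maximum of DIFF is clearly above rounding error, the two procedures are not scoring the same model on this release.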
11-13-2014 09:56 PM
Before we get too far into this, it is possible that this was corrected in an update that my company doesn't have. We run an enterprise edition, so it isn't possible to update frequently. We are running EG 4.2 and SAS 9.3. I know these aren't the latest versions, but I thought the zero-inflated models were added in 4.2. If you can run my code above and get matching results from both methods, then the difference I am seeing is probably down to our release.
11-14-2014 01:26 PM
You'll need SAS 9.4 TS1M1 or the current SAS 9.4 TS1M2. The release of EG is immaterial. Prior releases should issue an error from PROC PLM saying that scoring of zero-inflated models is not available. In the two releases above, predicted results from a zero-inflated model in GENMOD and PLM are identical.