cjohnson
Obsidian | Level 7

I have data that has been modeled with PROC GENMOD using a zero-inflated negative binomial model. The original data had 85% zeros. However, when I apply PROC PLM to score the original dataset, it appears to use only the negative binomial model and to ignore the logistic model that selects the zeros. The predicted values are all nonzero. Am I applying these procedures incorrectly?

Thanks,

Chris

PROC GENMOD DATA = TESTING.CLAIMS;
  MODEL TTD_DAYS_PAID = ODCUM TRAUMA AGE / DIST = ZINB;
  ZEROMODEL ODCUM TRAUMA AGE / LINK = LOGIT;
  STORE OUT=TESTING.ZINBMOD;
RUN;

PROC PLM SOURCE=TESTING.ZINBMOD;
  SCORE DATA=TESTING.CLAIMS OUT=TESTING.CLAIMSPRED / ILINK;
RUN;

Christopher Johnson
www.codeitmagazine.com
StatDave
SAS Super FREQ

That is the correct specification.  Equivalently, but more simply, just use the PRED= option in the OUTPUT statement of PROC GENMOD to get the predicted values.  Compare these predicted values to those you get from fitting just a negative binomial model to see the effect of the zero-inflated model. See this note that gives more information.

cjohnson
Obsidian | Level 7

Thanks. I tried this, and I get smaller predictions on the same rows, but still no zeros. I just looked at the item store and saw the note that it contains only the negative binomial portion of the model.

How would I use Proc PLM to evaluate the zero-inflated (binomial) portion of the model, if its definition is not in the store?

Also, using the PRED= option, is this utilizing both models?  If so, is there a reason that I still don't get 0's?

Thanks very much!

StatDave
SAS Super FREQ

The predicted values from the SCORE statement in PROC PLM, as well as those from the PRED= option in the OUTPUT statement in PROC GENMOD, are the correct predicted values for the fitted zero-inflated negative binomial model.  Apparently none of the predictor settings in your SCORE DATA= data set result in predicted values of zero.
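
To spell out why none of the predictions is exactly zero: the predicted value for a zero-inflated model is the estimated mean of the response, not a simulated outcome. In the notation below (the symbols are chosen here for illustration and do not come from the SAS output), the count-model mean is scaled down by the estimated probability of a structural zero:

```latex
\hat{\mu}_i = (1 - \hat{\omega}_i)\,\hat{\lambda}_i,
\qquad
\hat{\lambda}_i = \exp\!\left(x_i^{\top}\hat{\beta}\right),
\qquad
\hat{\omega}_i = \frac{1}{1 + \exp\!\left(-z_i^{\top}\hat{\gamma}\right)}
```

Here x_i holds the MODEL statement effects and z_i the ZEROMODEL statement effects. Because the exponential is strictly positive and the logistic probability is strictly less than 1, the predicted mean is always greater than zero, even when most observed responses are zero.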

cjohnson
Obsidian | Level 7

PROC PLM uses the item store, which houses the parameter estimates. When you view the store, it explicitly states that it contains only the negative binomial portion, and when I checked the parameters, they did not include the logit parameters.

StatDave
SAS Super FREQ

Unfortunately, the SHOW ALL output produced by PROC PLM does not reflect the fact that the PLM SCORE statement works fully with zero-inflated models. Since only the SCORE statement currently supports the ZI models fully, this information has not been updated. But the predicted values from the SCORE statement are correct, as evidenced by their agreement with the predicted values from PROC GENMOD.

cjohnson
Obsidian | Level 7

Actually, they don't agree. The predicted values from PROC GENMOD take both models into account. The results from PROC PLM apply only the Poisson or negative binomial model, not the logit model, so the predictions are different. Consider this self-contained code. It generates Poisson data, forces 50% of the values to 0, fits and predicts with GENMOD, and then predicts with PLM. I calculated the results by hand, and the PLM predictions matched the values obtained from the Poisson coefficients alone.

DATA DIST (KEEP=X0-X2);
  CALL STREAMINIT(4321);
  BETA0 = 1; BETA1 = .2;
  DO I = 1 TO 1000;
    X0 = RAND("Bernoulli", .5);  /* 1 = keep the count, 0 = force a zero */
    X1 = RAND("Uniform");
    X2 = RAND("Poisson", EXP((BETA1*X1)+BETA0)) * X0;
    OUTPUT;
  END;
RUN;

PROC UNIVARIATE DATA=DIST NOPRINT;
  HISTOGRAM X1-X2;
RUN;

PROC GENMOD DATA = DIST; /*ZERO INFLATED POISSON*/
  MODEL X2 = X1 / DIST = ZIP;
  ZEROMODEL X1 / LINK = LOGIT;
  STORE OUT=MOD;
  OUTPUT OUT=PRED PRED=ESTIMATE;
  *ODS OUTPUT PARAMETERESTIMATES=PE MODELFIT=MFIT ZEROPARAMETERESTIMATES=ZEROPE;
RUN;

PROC PLM SOURCE=MOD;
  SCORE DATA=DIST OUT=PRED2 / ILINK;
RUN;
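
For anyone who wants to repeat the by-hand check outside SAS, here is a minimal Python sketch of the two candidate predictions for one row. The function name `zip_means` and its argument names are hypothetical. The coefficients are the design values of the simulation above (BETA0 = 1, BETA1 = .2, and a 50% chance of a forced zero, which corresponds to a zero-model intercept of 0 on the logit scale), not fitted estimates, so substitute the estimates from your own GENMOD output.

```python
import math

def zip_means(x1, beta0, beta1, gamma0, gamma1):
    """Return (count_only_mean, zero_inflated_mean) for one row.

    beta0/beta1 are the Poisson (log link) coefficients and
    gamma0/gamma1 are the zero-model (logit link) coefficients.
    All four are assumptions supplied by the caller.
    """
    lam = math.exp(beta0 + beta1 * x1)                        # Poisson mean
    omega = 1.0 / (1.0 + math.exp(-(gamma0 + gamma1 * x1)))   # P(structural zero)
    return lam, (1.0 - omega) * lam

# Design values from the simulation, used here only for illustration.
count_only, zero_inflated = zip_means(0.5, 1.0, 0.2, 0.0, 0.0)
print("count-only mean:   ", count_only)
print("zero-inflated mean:", zero_inflated)
```

If a scored value matches the first number, only the count model was applied. If it matches the second, the zero model was applied as well. Either way the value is a mean and will never be exactly zero.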

cjohnson
Obsidian | Level 7

Before we get too far into this, it is possible that this was corrected in an update that my company doesn't have. We run an enterprise edition, so it isn't possible to update frequently. We are running SAS 9.3 and EG 4.2. I know these aren't the latest versions, but I thought the zero-inflated models were added in 4.2. If you can run my code above and get matching results through either method, then that may not be the case.

StatDave
SAS Super FREQ

You'll need SAS 9.4 TS1M1 or the current SAS 9.4 TS1M2.  The release of EG is immaterial.  Prior releases should issue an error from PROC PLM saying that scoring of zero-inflated models is not available.  In the two releases above, predicted results from a zero-inflated model in GENMOD and PLM are identical.

cjohnson
Obsidian | Level 7

Thanks. I was not getting any error or warning message, so I was not aware. I will work toward getting the upgrade.

