I'm experimenting with proc mcmc in SAS EG to build a zero truncated negative binomial (ZINB) regression for an insurance model that predicts annual health expenses = y based on the prior year's data.
I've referenced the following paper: http://support.sas.com/resources/papers/proceedings13/450-2013.pdf.
Unfortunately SAS EG doesn't support proc fcmp so I can't use the ZINB MCMC code as written in the article, but that shouldn't be a huge obstacle.
My code is below:
ods graphics on;
proc mcmc data=dataset seed=123 nmc=10000 nbi=1000 thin=3 outpost=post_zinb;
parms lbeta0 1.9 lbeta1 -2.6 lbeta2 -4.8 lbeta3 -3.7
pbeta0 6.5 pbeta1 0.8 pbeta2 1.9 pbeta3 1.1 alpha 1;
prior pbeta: ~ normal(0,var=10);
prior lbeta: ~ normal(0,var=10);
prior alpha ~ gamma(2, iscale=2);
link1 = lbeta0 + lbeta1*x1 + lbeta2*x2 + lbeta3*x3; * Logistic regression formula ;
pi = ( 1/(1+exp(-link1))); * Logit transformation of link1 ;
mu = exp(pbeta0 + pbeta1*x1 + pbeta2*x2 + pbeta3*x3); * Negative binomial regression formula in loglink function ;
num_failures = y;
prob_success = mu/(y + mu);
num_successes = 1/alpha;
llike=log(pi*(y eq 0) + (1-pi)*pdf("negbin", num_failures, prob_success, num_successes)); * ZINB log-likelihood function ;
model general(llike);
run;
ods graphics off;
Running the above code yields:
ERROR: Observation 1 yields an invalid log-likelihood value.
NOTE: The SAS System stopped processing this step because of errors.
There are several different ways to parameterize the negative binomial prob. density function - have I incorrectly specified the parameters for the SAS negative binomial pdf function?
Many thanks!
Robert
Figured it out - my log-likelihood and model statements were incorrect. Corrected code is below.
I ended up rewriting programming statements for the log-likelihood, cribbing from the proc nlmixed zero-inflated negative binomial regression code on the UCLA SAS site (http://www.ats.ucla.edu/stat/sas/code/zip_zinb.htm).
ods graphics on;
proc mcmc data=work.ICM_RISK_NYC_FINAL_INTERACTIONS seed=123 nmc=20000 nbi=5000 thin=3 outpost=post_zinb;
parms lbeta0 1.9 lbeta1 -2.6 lbeta2 -4.8 lbeta3 -3.7 lbeta4 0.0
pbeta0 6.5 pbeta1 0.8 pbeta2 1.9 pbeta3 1.1 pbeta4 0.1 alpha 1;
prior pbeta: ~ normal(0,var=5);
prior lbeta: ~ normal(0,var=5);
prior alpha ~ gamma(2, iscale=2);
link1 = lbeta0 + lbeta1*x1 + lbeta2*x2 + lbeta3*x3 + lbeta4*x4; * Logistic regression formula ;
pi = 1/(1+exp(-link1)); * Logit transformation of link1 ;
mu = exp(pbeta0 + pbeta1*x1 + pbeta2*x2 + pbeta3*x3 + pbeta4*x4); * Neg bin regression formula w/ log link ;
disp_factor = 1/alpha;
inv_alpha_mu = 1/(1+alpha*mu);
if y_ceiling = 0 then llike = log(pi + (1-pi)*(inv_alpha_mu**disp_factor));
else if y_ceiling > 0 then llike = log(1-pi) + lgamma(disp_factor + y_ceiling) - lgamma(y_ceiling+1)
- lgamma(disp_factor) + disp_factor*log(inv_alpha_mu) + y_ceiling*log(1-inv_alpha_mu);
* ZINB log-likelihood function ;
model y_ceiling ~ general(llike);
run;
ods graphics off;
Figured it out - my log-likelihood and model statements were incorrect. Corrected code is below.
I ended up rewriting programming statements for the log-likelihood, cribbing from the proc nlmixed zero-inflated negative binomial regression code on the UCLA SAS site (http://www.ats.ucla.edu/stat/sas/code/zip_zinb.htm).
ods graphics on;
proc mcmc data=work.ICM_RISK_NYC_FINAL_INTERACTIONS seed=123 nmc=20000 nbi=5000 thin=3 outpost=post_zinb;
parms lbeta0 1.9 lbeta1 -2.6 lbeta2 -4.8 lbeta3 -3.7 lbeta4 0.0
pbeta0 6.5 pbeta1 0.8 pbeta2 1.9 pbeta3 1.1 pbeta4 0.1 alpha 1;
prior pbeta: ~ normal(0,var=5);
prior lbeta: ~ normal(0,var=5);
prior alpha ~ gamma(2, iscale=2);
link1 = lbeta0 + lbeta1*x1 + lbeta2*x2 + lbeta3*x3 + lbeta4*x4; * Logistic regression formula ;
pi = 1/(1+exp(-link1)); * Logit transformation of link1 ;
mu = exp(pbeta0 + pbeta1*x1 + pbeta2*x2 + pbeta3*x3 + pbeta4*x4); * Neg bin regression formula w/ log link ;
disp_factor = 1/alpha;
inv_alpha_mu = 1/(1+alpha*mu);
if y_ceiling = 0 then llike = log(pi + (1-pi)*(inv_alpha_mu**disp_factor));
else if y_ceiling > 0 then llike = log(1-pi) + lgamma(disp_factor + y_ceiling) - lgamma(y_ceiling+1)
- lgamma(disp_factor) + disp_factor*log(inv_alpha_mu) + y_ceiling*log(1-inv_alpha_mu);
* ZINB log-likelihood function ;
model y_ceiling ~ general(llike);
run;
ods graphics off;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.