I'm experimenting with proc mcmc in SAS EG to build a zero truncated negative binomial (ZINB) regression for an insurance model that predicts annual health expenses = y based on the prior year's data.
I've referenced the following paper: http://support.sas.com/resources/papers/proceedings13/450-2013.pdf.
Unfortunately SAS EG doesn't support proc fcmp so I can't use the ZINB MCMC code as written in the article, but that shouldn't be a huge obstacle.
My code is below:
ods graphics on;
proc mcmc data=dataset seed=123 nmc=10000 nbi=1000 thin=3 outpost=post_zinb;
parms lbeta0 1.9 lbeta1 -2.6 lbeta2 -4.8 lbeta3 -3.7
pbeta0 6.5 pbeta1 0.8 pbeta2 1.9 pbeta3 1.1 alpha 1;
prior pbeta: ~ normal(0,var=10);
prior lbeta: ~ normal(0,var=10);
prior alpha ~ gamma(2, iscale=2);
link1 = lbeta0 + lbeta1*x1 + lbeta2*x2 + lbeta3*x3; * Logistic regression formula ;
pi = ( 1/(1+exp(-link1))); * Logit transformation of link1 ;
mu = exp(pbeta0 + pbeta1*x1 + pbeta2*x2 + pbeta3*x3); * Negative binomial regression formula in loglink function ;
num_failures = y;
prob_success = mu/(y + mu);
num_successes = 1/alpha;
llike=log(pi*(y eq 0) + (1-pi)*pdf("negbin", num_failures, prob_success, num_successes)); * ZINB log-likelihood function ;
model general(llike);
run;
ods graphics off;
Running the above code yields:
ERROR: Observation 1 yields an invalid log-likelihood value.
NOTE: The SAS System stopped processing this step because of errors.
There are several different ways to parameterize the negative binomial prob. density function - have I incorrectly specified the parameters for the SAS negative binomial pdf function?
Many thanks!
Robert
Figured it out - my log-likelihood and model statements were incorrect. Corrected code is below.
I ended up rewriting programming statements for the log-likelihood, cribbing from the proc nlmixed zero-inflated negative binomial regression code on the UCLA SAS site (http://www.ats.ucla.edu/stat/sas/code/zip_zinb.htm).
ods graphics on;
proc mcmc data=work.ICM_RISK_NYC_FINAL_INTERACTIONS seed=123 nmc=20000 nbi=5000 thin=3 outpost=post_zinb;
parms lbeta0 1.9 lbeta1 -2.6 lbeta2 -4.8 lbeta3 -3.7 lbeta4 0.0
pbeta0 6.5 pbeta1 0.8 pbeta2 1.9 pbeta3 1.1 pbeta4 0.1 alpha 1;
prior pbeta: ~ normal(0,var=5);
prior lbeta: ~ normal(0,var=5);
prior alpha ~ gamma(2, iscale=2);
link1 = lbeta0 + lbeta1*x1 + lbeta2*x2 + lbeta3*x3 + lbeta4*x4; * Logistic regression formula ;
pi = 1/(1+exp(-link1)); * Logit transformation of link1 ;
mu = exp(pbeta0 + pbeta1*x1 + pbeta2*x2 + pbeta3*x3 + pbeta4*x4); * Neg bin regression formula w/ log link ;
disp_factor = 1/alpha;
inv_alpha_mu = 1/(1+alpha*mu);
if y_ceiling = 0 then llike = log(pi + (1-pi)*(inv_alpha_mu**disp_factor));
else if y_ceiling > 0 then llike = log(1-pi) + lgamma(disp_factor + y_ceiling) - lgamma(y_ceiling+1)
- lgamma(disp_factor) + disp_factor*log(inv_alpha_mu) + y_ceiling*log(1-inv_alpha_mu);
* ZINB log-likelihood function ;
model y_ceiling ~ general(llike);
run;
ods graphics off;
Figured it out - my log-likelihood and model statements were incorrect. Corrected code is below.
I ended up rewriting programming statements for the log-likelihood, cribbing from the proc nlmixed zero-inflated negative binomial regression code on the UCLA SAS site (http://www.ats.ucla.edu/stat/sas/code/zip_zinb.htm).
ods graphics on;
proc mcmc data=work.ICM_RISK_NYC_FINAL_INTERACTIONS seed=123 nmc=20000 nbi=5000 thin=3 outpost=post_zinb;
parms lbeta0 1.9 lbeta1 -2.6 lbeta2 -4.8 lbeta3 -3.7 lbeta4 0.0
pbeta0 6.5 pbeta1 0.8 pbeta2 1.9 pbeta3 1.1 pbeta4 0.1 alpha 1;
prior pbeta: ~ normal(0,var=5);
prior lbeta: ~ normal(0,var=5);
prior alpha ~ gamma(2, iscale=2);
link1 = lbeta0 + lbeta1*x1 + lbeta2*x2 + lbeta3*x3 + lbeta4*x4; * Logistic regression formula ;
pi = 1/(1+exp(-link1)); * Logit transformation of link1 ;
mu = exp(pbeta0 + pbeta1*x1 + pbeta2*x2 + pbeta3*x3 + pbeta4*x4); * Neg bin regression formula w/ log link ;
disp_factor = 1/alpha;
inv_alpha_mu = 1/(1+alpha*mu);
if y_ceiling = 0 then llike = log(pi + (1-pi)*(inv_alpha_mu**disp_factor));
else if y_ceiling > 0 then llike = log(1-pi) + lgamma(disp_factor + y_ceiling) - lgamma(y_ceiling+1)
- lgamma(disp_factor) + disp_factor*log(inv_alpha_mu) + y_ceiling*log(1-inv_alpha_mu);
* ZINB log-likelihood function ;
model y_ceiling ~ general(llike);
run;
ods graphics off;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.