08-08-2017 03:52 PM - edited 08-08-2017 03:53 PM
I see that PROC TRANSREG can be used to estimate the optimal lambda for the Box-Cox transformation of a dependent variable. However, I'm wondering how I might extend that estimation to the two-parameter variant of Box-Cox (as seen here). My motivation is that the dependent variable contains a significant number of zeros, which rules out the simpler one-parameter Box-Cox transformation.
While PROC TRANSREG has the option of incorporating a lambda2 (i.e. shift) parameter, it takes the form of an arbitrary value hard-coded into the transformation; it cannot be estimated the way lambda1 is. Is there an alternative procedure that can solve for both lambdas?
I've attempted to do something of the sort using PROC MCMC, however I've encountered a bit of a snag when it comes to incorporating lambda2 into the model. I'm basing my code off of the Box-Cox example in SAS's PROC MCMC documentation, which can be found here. The one-parameter Box-Cox transformation is solved using the following code:
proc mcmc data=boxcox nmc=50000 thin=10 propcov=quanew seed=12567
          monitor=(lamda);
   parms beta0 0 beta1 0 lamda 1 s2 1;
   beginnodata;
   prior beta: ~ general(0);
   prior s2 ~ gamma(shape=3, scale=2);
   prior lamda ~ unif(-2,2);
   sd = sqrt(s2);
   endnodata;
   ys = (y**lamda-1)/lamda;
   mu = beta0+beta1*x;
   ll = (lamda-1)*log(y)+lpdfnorm(ys, mu, sd);
   model general(ll);
run;
Now, I feel like I should be able to solve for both lambdas in the two-parameter version of Box-Cox by declaring a prior for lambda2 and incorporating lambda2 into the equations at the bottom as follows:
proc mcmc data=boxcox nmc=50000 thin=10 propcov=quanew seed=12567
          monitor=(lamda lambda2);
   parms beta0 0 beta1 0 lamda 1 lambda2 1 s2 1;
   beginnodata;
   prior beta: ~ general(0);
   prior s2 ~ gamma(shape=3, scale=2);
   prior lamda ~ unif(-2,2);
   prior lambda2 ~ unif(-2,2);
   sd = sqrt(s2);
   endnodata;
   ys = ((y+lambda2)**lamda-1)/lamda;
   mu = beta0+beta1*x;
   ll = (lamda-1)*log(y+lambda2)+lpdfnorm(ys, mu, sd);
   model general(ll);
run;
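For reference, this is how I read the model that the code above is meant to fit. The notation is my own transcription rather than anything taken from the SAS documentation: \lambda_1 corresponds to lamda, \lambda_2 to lambda2, and \sigma to sd. The code only implements the \lambda_1 \neq 0 branch.

```latex
% Two-parameter Box-Cox transformation of the response
y_i^{(\lambda_1,\lambda_2)} =
\begin{cases}
  \dfrac{(y_i + \lambda_2)^{\lambda_1} - 1}{\lambda_1}, & \lambda_1 \neq 0, \\[2ex]
  \log(y_i + \lambda_2), & \lambda_1 = 0,
\end{cases}
\qquad y_i + \lambda_2 > 0 .

% Per-observation log-likelihood: log-Jacobian of the transformation
% plus the normal log-density of the transformed response
\ell_i = (\lambda_1 - 1)\,\log(y_i + \lambda_2)
       + \log \phi\!\left(y_i^{(\lambda_1,\lambda_2)};\ \beta_0 + \beta_1 x_i,\ \sigma\right)
```

Here \phi(\cdot;\ \mu, \sigma) is the normal density with mean \mu and standard deviation \sigma, which is what lpdfnorm evaluates on the log scale. Note that the log term is only defined when y_i + \lambda_2 > 0 for every observation.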
However, when I do this, the Markov chain does not converge, to the point that lambda2 appears to be completely irrelevant to the model. This prevents lambda1 from converging as well, and the solution is useless. If I replace lambda2 with a fixed value rather than treating it as a parameter to be estimated, lambda1 is solved splendidly, with a nice little normal curve histogram that agrees with the solution delivered by PROC TRANSREG under the same conditions. Can anyone explain what I'm doing wrong? Clearly there are values for lambda2 that support the Box-Cox transformation, but I can't figure out how to solve for both lambda1 and lambda2 simultaneously.
Thanks in advance.