02-13-2012 09:15 PM
I am running a linear regression. All of my independent variables are log-transformed, and the single dependent variable is Box-Cox transformed. When I transform the predicted values back to the original scale, they underestimate the observed values. The residuals meet the model assumptions. Is there a method that can correct the back-transformed predictions based on this model? I don't want to change my model.
02-14-2012 02:02 AM
You said you applied a log transformation to the independent variables but a Box-Cox transformation to the dependent variable.
Those are two different transformations.
I think you only need one transformation.
02-14-2012 06:33 AM
Hard to say without more information. However, in PROC TRANSREG you can transform both the right hand side and left hand side variables--specifying METHOD=MORALS or METHOD=REDUNDANCY may address the underestimation problem.
02-14-2012 08:41 AM
Steve may have the key for you, but I'll add one general observation.
If you log-transform a dependent variable and then do regression estimation, the anti-log of the estimate will be an estimate of the median in the original scale (not the mean, as you might expect), assuming the errors are roughly symmetric on the log scale. This is well known in the economics literature (where they pretty much always have to log-transform the cost dependent variable). See Duan, Naihua (1983), “Smearing Estimate: A Nonparametric Retransformation Method,” Journal of the American Statistical Association, 78, 605-10, or search for <smearing estimate for log data>. There are more modern ways of modeling costs, but the underlying issue is the same.
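To make the retransformation issue concrete, here is a minimal Python sketch of a smearing-type correction for a log-transformed outcome. It is an illustration under stated assumptions, not the poster's code or a SAS procedure: the function name `smearing_predict_log` and the simulated data are hypothetical, and the fit is plain ordinary least squares via NumPy.

```python
import numpy as np

def smearing_predict_log(X, y, X_new):
    """Fit OLS to log(y), then return naive and smearing-corrected predictions.

    Hypothetical helper for illustration only.
    """
    X1 = np.column_stack([np.ones(len(X)), X])          # add an intercept column
    log_y = np.log(y)
    beta, *_ = np.linalg.lstsq(X1, log_y, rcond=None)   # OLS on the log scale
    resid = log_y - X1 @ beta                           # log-scale residuals
    smear = np.mean(np.exp(resid))                      # smearing factor (>= 1)
    X1_new = np.column_stack([np.ones(len(X_new)), X_new])
    naive = np.exp(X1_new @ beta)                       # estimates the median
    return naive, naive * smear                         # second value targets the mean

# Simulated example (assumed data, not from the thread)
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=5000)
y = np.exp(0.5 + 0.3 * np.log(x) + rng.normal(0, 0.8, size=5000))

naive, smeared = smearing_predict_log(np.log(x), y, np.log(x))
print("mean of y:                ", y.mean())
print("mean of naive predictions:", naive.mean())
print("mean of smeared predictions:", smeared.mean())
```

In this simulation the naive back-transform should sit visibly below the observed mean, while the smeared predictions should track it much more closely. The correction assumes the residuals are homoscedastic on the log scale; if they are not, a single smearing factor is not appropriate.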
02-15-2012 08:01 AM
Thanks, Doc. I feel like I may have committed a Type 3 error--answered the wrong question correctly.
The underestimation of the mean by using a log transform followed by a simple exponentiation backtransform is rampant in the biological field as well. Getting a proper estimate of the standard error of predicted values (LSmeans) is even worse. At least the documentation for PROC GLIMMIX discusses what kind of backtransform is needed for lognormal distributions. Still, the geometric mean or median estimate that you get with a simple exponentiation may be more meaningful than the mean--after all, that is part of why the transformation is applied: the data are skewed and the expectation of the variable is "inflated".
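The size of the gap between the two backtransforms can be written down for the textbook lognormal case. The following is the standard result, stated under the assumption that the log-scale errors are exactly normal with constant variance; the symbols $\mu$ and $\sigma^2$ (log-scale mean and variance) are notation introduced here, not from the thread.

```latex
\log Y \sim N(\mu, \sigma^2)
\quad\Longrightarrow\quad
\operatorname{median}(Y) = e^{\mu},
\qquad
\operatorname{E}[Y] = e^{\mu + \sigma^2/2}
```

Simple exponentiation therefore falls short of the mean by a factor of $e^{\sigma^2/2}$, which grows quickly as the log-scale variance increases.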
02-15-2012 07:29 PM
The others who have given you suggestions are, by far, more familiar with statistics than I am. However, from a management perspective, when I receive such questions I have to ask: have you looked into the possibility that you have a suppressor variable, possibly as the result of multicollinearity?
02-16-2012 07:56 PM
Depending on the Box-Cox transformation, you can interpret your predicted values in a meaningful way.
First, log-transforming the regressors means each coefficient is interpreted as follows: a 1% change in a regressor results in a change of roughly \beta/100 in the (transformed) outcome variable.
Now it depends on the transformation that was ultimately applied to the outcome variable. Since Box-Cox finds the optimal power transformation, you must look at which power the outcome variable was transformed with. We just need to back-transform it.
For example, if the power transform was done at -1, and the confidence interval found for the beta value runs from, say, 0.1 to 0.2, then by this reasoning the corresponding change in y would be read as between 10 and 5 (the inverses of 0.1 and 0.2).
If the power transformation was done at 1/2, then a CI for beta of 2 to 3 would be read as a change in y of between 4 and 9 (the squares of 2 and 3).
And so on.
Or I have totally misunderstood your question.
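For the Box-Cox case in the original question, the back-transformation can be sketched in Python as follows. This is a rough illustration, assuming the standard Box-Cox form (y^lambda - 1)/lambda (with log y at lambda = 0); a particular procedure may use a different convention, so check which one your software applies. The function names `boxcox`, `inv_boxcox`, and `smeared_mean` are hypothetical, and the smearing step simply averages the inverse transform over the model residuals in the spirit of Duan (1983).

```python
import numpy as np

def boxcox(y, lam):
    """Standard Box-Cox transform (assumed convention)."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

def inv_boxcox(z, lam):
    """Inverse of boxcox(); requires lam * z + 1 > 0 when lam != 0."""
    z = np.asarray(z, dtype=float)
    return np.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)

def smeared_mean(fitted, resid, lam):
    """Average the back-transform of (fitted + each residual).

    Builds an n_fitted x n_resid array, so it is only practical
    for moderate sample sizes.
    """
    fitted = np.asarray(fitted, dtype=float)
    resid = np.asarray(resid, dtype=float)
    return inv_boxcox(fitted[:, None] + resid[None, :], lam).mean(axis=1)

# Toy numbers (assumed): one fitted value and two residuals on the transformed scale
lam = 0.5
naive = inv_boxcox([4.0], lam)                   # back-transform of the fitted value alone
smeared = smeared_mean([4.0], [-1.0, 1.0], lam)  # averages over the residuals
print(naive, smeared)
```

With these toy numbers the naive back-transform gives 9.0 and the smeared version gives 9.25, which shows the direction of the underestimation when the inverse transform is convex. As with the log case, this assumes the residuals have constant spread on the transformed scale.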