Programming the statistical procedures from SAS

linear regression underestimation

Posts: 1



I am doing a linear regression. All of my independent variables are log-transformed, and the dependent variable is Box-Cox transformed. When I transformed the predicted values back to the original scale, I found that they underestimate the observed values. The residuals meet the model assumptions. Is there a method that can correct the predicted values based on this model? I don't want to change my model.


Super User
Posts: 9,779


You said you applied a log transformation to the independent variables but a Box-Cox transformation to the dependent variable.

Those are two different transformations.

I think you only need one transformation.


Respected Advisor
Posts: 2,655


Hard to say without more information.  However, PROC TRANSREG can transform both the right-hand-side and left-hand-side variables--specifying METHOD=MORALS or METHOD=REDUNDANCY may address the underestimation problem.

Steve Denham

Super User
Posts: 10,875


Intercepts or no intercepts?

Trusted Advisor
Posts: 2,114


Steve may have the key for you, but I'll add one general observation.

If you log-transform a dependent variable and then fit a regression, the anti-log of the estimate is an estimate of the median on the original scale (not the mean, as you might expect). This is well known in the economics literature (where the cost dependent variable pretty much always has to be log-transformed).  See Duan, Naihua (1983), “Smearing Estimate: A Nonparametric Retransformation Method,” Journal of the American Statistical Association, 78, 605-10, or search for "smearing estimate for log data".  There are more modern ways of modeling costs, but the underlying issue is the same.
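To make the retransformation idea concrete, here is a minimal Python sketch of a smearing-type correction. It is only an illustration under stated assumptions: the residuals, the fitted value, and the Box-Cox power lam = 0.5 are made-up numbers, and inv_boxcox and smearing_prediction are hypothetical helper names, not functions from SAS or any library. The idea is to average the back-transformed (fitted value + residual) over all residuals instead of back-transforming the fitted value alone.

```python
import math

def inv_boxcox(z, lam):
    """Inverse of the Box-Cox transform z = (y**lam - 1) / lam (log when lam == 0).

    Assumes lam * z + 1 is positive for every value passed in.
    """
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

def smearing_prediction(fitted, residuals, inverse):
    """Smearing-type estimate of the original-scale mean for one fitted value.

    Averages inverse(fitted + e) over the residuals e from the transformed-scale fit.
    """
    return sum(inverse(fitted + e) for e in residuals) / len(residuals)

# Hypothetical numbers, for illustration only
residuals = [-0.8, -0.3, 0.0, 0.4, 0.7]  # transformed-scale residuals (they sum to zero)
fitted = 2.0                             # one fitted value on the transformed scale

# Log-transformed response: naive anti-log versus smeared prediction
naive_log = math.exp(fitted)
smeared_log = smearing_prediction(fitted, residuals, math.exp)

# Box-Cox response with an assumed power of 0.5
lam = 0.5
naive_bc = inv_boxcox(fitted, lam)
smeared_bc = smearing_prediction(fitted, residuals, lambda z: inv_boxcox(z, lam))

print("log:     naive", naive_log, "smeared", smeared_log)
print("Box-Cox: naive", naive_bc, "smeared", smeared_bc)
```

In both cases the smeared prediction is larger than the naive back-transform, because the inverse transformation is convex. The model itself is unchanged; only the back-transformation step differs.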

Doc Muhlbaier


Respected Advisor
Posts: 2,655


Thanks, Doc.  I feel like I may have committed a Type 3 error--answered the wrong question correctly. 

The underestimation of the mean by using a log transform followed by a simple exponentiation backtransform is rampant in the biological field as well.  Getting a proper estimate of the standard error of predicted values (LSmeans) is even worse.  At least the documentation for PROC GLIMMIX discusses what kind of backtransform is needed for lognormal distributions.  Still, the geometric mean or median estimate that you get with a simple exponentiation may be more meaningful than the mean--after all, that is part of why the transformation is applied: the data are skewed and the expectation of the variable is "inflated".
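As a rough illustration of the mean-versus-median point, the following Python sketch simulates lognormal data and compares the simple exponentiated mean of the logs with the lognormal back-transform exp(m + s^2/2). The parameters (mu = 0, sigma = 1), the sample size, and the seed are arbitrary assumptions, and the correction shown is appropriate only if the log-scale values really are normally distributed.

```python
import math
import random
import statistics

random.seed(1)                    # fixed seed so the sketch is reproducible
mu, sigma = 0.0, 1.0              # assumed log-scale mean and standard deviation
y = [random.lognormvariate(mu, sigma) for _ in range(100_000)]
log_y = [math.log(v) for v in y]

m = statistics.fmean(log_y)       # mean on the log scale
s2 = statistics.variance(log_y)   # variance on the log scale

naive = math.exp(m)               # geometric mean: tracks the median
corrected = math.exp(m + s2 / 2)  # lognormal back-transform: tracks the arithmetic mean

arith_mean = statistics.fmean(y)
sample_median = statistics.median(y)

print("naive exp(mean of logs):", naive)
print("sample median:          ", sample_median)
print("corrected back-transform:", corrected)
print("sample arithmetic mean: ", arith_mean)
```

The simple exponentiation lands near the sample median and well below the arithmetic mean, while the corrected value lands near the arithmetic mean. Which of the two is the more meaningful summary depends on the question being asked.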

Steve Denham

Posts: 7,417


The others who have given you suggestions are, by far, more familiar with statistics than I am.  However, from a management perspective, when I receive such questions I have to ask: have you looked into the possibility that you have a suppressor variable, possibly as the result of multicollinearity?

Posts: 53


Depending on the Box-Cox transformation, you can interpret your predicted values in a meaningful way.

First, log-transforming the regressors means each coefficient is read as follows: a 1% change in a regressor results in a change of about \beta/100 in the (transformed) outcome variable.

What that means for y itself depends on the transformation that was ultimately applied to the outcome variable. Box-Cox finds an optimal power transformation, so you must look at which power was chosen and back-transform accordingly. Because the back-transformation is nonlinear, it has to be applied to the predicted values, not to the coefficients or their confidence limits.

For example, if the power was -1 and the confidence interval for beta runs from 0.1 to 0.2, then a 1% change in the regressor changes the transformed y by roughly 0.001 to 0.002. The corresponding change in y is found by back-transforming the predictions before and after the change, so it depends on the starting value of y.

If the power was 1/2, a CI for beta of 2 to 3 means that a 1% change in the regressor changes the transformed y by roughly 0.02 to 0.03; again, the change in y comes from back-transforming the predictions.

And so on.
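One way to check an interpretation like this is to back-transform the predictions before and after a 1% change in the regressor. The Python sketch below does that for a single log-transformed regressor and a Box-Cox response. The coefficients b0 and b1, the power lam = 0.5, and the helper names are all hypothetical, chosen only for illustration.

```python
import math

def inv_boxcox(z, lam):
    """Inverse of the Box-Cox transform z = (y**lam - 1) / lam (log when lam == 0).

    Assumes lam * z + 1 is positive for every value passed in.
    """
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

def effect_of_one_percent(b0, b1, x, lam):
    """Change in the back-transformed prediction when x increases by 1%."""
    z_before = b0 + b1 * math.log(x)
    z_after = b0 + b1 * math.log(1.01 * x)
    return inv_boxcox(z_after, lam) - inv_boxcox(z_before, lam)

# Hypothetical fitted model: boxcox(y) = b0 + b1 * log(x)
b0, b1, lam = 1.0, 2.0, 0.5

# On the transformed scale the effect is the same at every x: about b1 / 100
transformed_change = b1 * math.log(1.01)

# On the original scale the effect depends on where x starts
low = effect_of_one_percent(b0, b1, x=2.0, lam=lam)
high = effect_of_one_percent(b0, b1, x=20.0, lam=lam)

print("change on transformed scale:", transformed_change)
print("change in y at x = 2: ", low)
print("change in y at x = 20:", high)
```

With these made-up numbers the transformed-scale change is constant, but the change in y at x = 20 is about twice the change at x = 2, so a single original-scale number cannot be read off the coefficient alone.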

OR I have totally misunderstood your question.
