yuping
Calcite | Level 5

Hi,

   I am fitting a linear regression. All of my independent variables are log-transformed, and the single dependent variable is Box-Cox transformed. When I transform the predicted values back to the original scale, I find that they underestimate the observed values. The residuals meet the model assumptions. Is there a method that can correct the predicted values based on this model? I don't want to change my model.

Thanks.

7 REPLIES
Ksharp
Super User

You said you applied a log transformation to the independent variables but a Box-Cox transformation to the dependent variable.

These are two different transformations.

I think you only need one transformation.

Ksharp

SteveDenham
Jade | Level 19

Hard to say without more information.  However, in PROC TRANSREG you can transform both the right hand side and left hand side variables--specifying METHOD=MORALS or METHOD=REDUNDANCY may address the underestimation problem.

Steve Denham

ballardw
Super User

Intercepts or no intercepts?

Doc_Duke
Rhodochrosite | Level 12

Steve may have the key for you, but I'll add one general observation.

If you log-transform a dependent variable and then do regression estimation, the anti-log of the estimate will be an estimate of the median in the original scale (not the mean as you might expect). This is well known in economics literature (where they pretty much always have to log-transform the cost dependent variable).  See Duan, Naihua (1983), “Smearing Estimate: A Nonparametric Retransformation Method,” Journal of the American Statistical Association, 78, 605-10, or search for <smearing estimate for log data>.  There are more modern ways for modeling costs, but the underlying issue is the same.
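As a rough illustration of the smearing idea for a log-transformed response, here is a minimal Python sketch. It is not SAS code, and the residuals and the prediction are made-up numbers used only for illustration. The naive anti-log is multiplied by the average of the exponentiated residuals from the log-scale fit, which moves the estimate from the median toward the mean.

```python
import math

def smearing_factor(log_residuals):
    """Average of exp(residual) over the residuals of the log-scale fit."""
    return sum(math.exp(r) for r in log_residuals) / len(log_residuals)

def retransform_mean(log_prediction, log_residuals):
    """Original-scale mean estimate: naive anti-log times the smearing factor."""
    return math.exp(log_prediction) * smearing_factor(log_residuals)

# Hypothetical residuals from a log-scale regression (they sum to zero,
# as least-squares residuals do when the model has an intercept).
residuals = [-0.6, -0.2, 0.0, 0.3, 0.5]
log_pred = 2.0  # hypothetical prediction on the log scale

naive = math.exp(log_pred)                       # estimates the median
smeared = retransform_mean(log_pred, residuals)  # estimates the mean

print(f"naive anti-log: {naive:.3f}")
print(f"smeared:        {smeared:.3f}")
```

Because the exponential is convex, the smearing factor is at least 1 whenever the residuals average to zero, so the smeared estimate is never below the naive one.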

Doc Muhlbaier

Duke

SteveDenham
Jade | Level 19

Thanks, Doc.  I feel like I may have committed a Type 3 error--answered the wrong question correctly. 

The underestimation of the mean by using a log transform followed by a simple exponentiation backtransform is rampant in the biological field as well.  Getting a proper estimate of the standard error of predicted values (LSmeans) is even worse.  At least the documentation for PROC GLIMMIX discusses what kind of backtransform is needed for lognormal distributions.  Still, the geometric mean or median estimate that you get with a simple exponentiation may be more meaningful than the mean--after all, that is part of why the transformation is applied: the data are skewed and the expectation of the variable is "inflated".
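To make the gap between the two backtransforms concrete, here is a small Python sketch (not SAS code) based on the standard lognormal results: if log(Y) is normal with mean mu and variance sigma2, the median of Y is exp(mu) and the mean is exp(mu + sigma2/2). The values of mu and sigma2 below are hypothetical; in practice sigma2 would be replaced by an estimate such as the residual mean square, and that plug-in estimate carries some bias of its own.

```python
import math

def lognormal_median(mu):
    """Median (and geometric mean) of Y when log(Y) ~ Normal(mu, sigma2)."""
    return math.exp(mu)

def lognormal_mean(mu, sigma2):
    """Arithmetic mean of Y when log(Y) ~ Normal(mu, sigma2)."""
    return math.exp(mu + sigma2 / 2.0)

mu = 2.0       # hypothetical log-scale prediction
sigma2 = 0.5   # hypothetical log-scale residual variance

median = lognormal_median(mu)
mean = lognormal_mean(mu, sigma2)

print(f"median (simple exponentiation): {median:.3f}")
print(f"mean (with variance term):      {mean:.3f}")
print(f"ratio mean / median:            {mean / median:.3f}")
```

The ratio depends only on sigma2, so the more variable the data are on the log scale, the further the simple exponentiation falls below the mean.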

Steve Denham

art297
Opal | Level 21

The others who have given you suggestions are, by far, more familiar with statistics than I am.  However, from a management perspective, when I receive such questions I have to ask:  have you looked into the possibility that you have a suppressor variable, possibly as the result of multicollinearity?

VX_Xc
Calcite | Level 5

Depending on the Box-Cox transformation, you can interpret your predicted values in a meaningful way.

First, the log transformation of the regressors means each coefficient is interpreted as follows: a 1% change in a regressor results in a change of about \beta/100 in the (transformed) outcome variable.

Now it depends on the transformation that was ultimately applied to the outcome variable. Since Box-Cox finds the optimal power transformation, you must look at which power the outcome variable was transformed with. We just need to back-transform it.

               For example, if the power transformation was done at -1, and the confidence interval found for beta runs from, say, .1 to .2, then the resulting interpretation would be that a 1% change in the regressor results in a change in the y value of between 10 and 5 (which are the inverses of .1 and .2).

               If the power transformation was done at 1/2, then a CI for beta of 2 to 3 can be interpreted as a 1% change in the regressor resulting in a change in the y value of between 4 and 9.

               and so on.  

OR I have totally misunderstood your question.
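For the back-transformation of predicted values from a Box-Cox model, the smearing idea from the Duan reference cited earlier in the thread can be sketched in Python (not SAS code). This assumes the transformation has the form z = (y^lambda - 1) / lambda, with log(y) when lambda is 0; the lambda, prediction, and residuals below are made-up values. Each residual is added to the transformed-scale prediction, the sum is back-transformed, and the results are averaged.

```python
import math

def boxcox(y, lam):
    """Box-Cox transform, assuming the form (y**lam - 1) / lam."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inv_boxcox(z, lam):
    """Inverse of the Box-Cox transform above."""
    if lam == 0:
        return math.exp(z)
    base = lam * z + 1.0
    if base <= 0:
        raise ValueError("value is outside the range of the Box-Cox transform")
    return base ** (1.0 / lam)

def smeared_prediction(z_hat, residuals, lam):
    """Average the back-transformed values of z_hat + each residual."""
    values = [inv_boxcox(z_hat + r, lam) for r in residuals]
    return sum(values) / len(values)

lam = 0.5                     # hypothetical Box-Cox power
z_hat = 3.0                   # hypothetical prediction on the transformed scale
residuals = [-0.5, 0.0, 0.5]  # hypothetical transformed-scale residuals

naive = inv_boxcox(z_hat, lam)
smeared = smeared_prediction(z_hat, residuals, lam)

print(f"naive back-transform: {naive:.4f}")
print(f"smeared estimate:     {smeared:.4f}")
```

With a power below 1 the inverse transform is convex, so the naive back-transform falls below the smeared estimate, which is consistent with the underestimation described in the original question. With a power above 1 the direction reverses.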
