Posted 10-05-2018 03:52 PM

Hi,

I have a question about variable transformation using Box-Cox.

I'm a student taking a Regression Analysis class, and here is the code for the example:

```
data plasma;
input age plevel;
datalines;
0 13.44
0 12.84
0 11.91
0 20.09
0 15.60
1.0 10.11
1.0 11.38
1.0 10.28
1.0 8.96
1.0 8.59
2.0 9.83
2.0 9.00
2.0 8.65
2.0 7.85
2.0 8.88
3.0 7.94
3.0 6.01
3.0 5.14
3.0 6.90
3.0 6.77
4.0 4.86
4.0 5.10
4.0 5.67
4.0 5.75
4.0 6.23
;
run;
proc reg data=plasma;
model plevel=age;
run;
ods output boxcox=bc details=details;
proc transreg data=plasma PBOXCOXTABLE detail;
model boxcox(plevel/ lambda= -1.2 to 1.2 by 0.1 convenient)
= identity(age);
output out = bc_plasma;
run;
proc print data=bc_plasma;
run;
proc reg data=bc_plasma;
model tplevel=age;
run;
```

The best lambda for the transformation is -0.50. I verified this and got the same lambda by calculating the Box-Cox formula manually in R.

However, I am wondering how the new transformed Y variable is actually calculated.

According to my textbook, once lambda=-0.50 is found, the transformed Y is Y^(-0.50).

So, taking the first observation, Y=13.44, the transformed value using lambda=-0.5 would be 13.44^(-0.5) = 0.2727.

But the value in the output is 2.59823, and it does not come from (Y^(-0.50) - 1)/(-0.5) either.

I am hesitant to use this output variable directly as the transformed Y when fitting the regression line, because I don't know how it is calculated.

Can anyone explain?

Thanks!

Thank you so much!!

Now I get the correct Y transformation output for the Best Lambda at lambda=-0.5 🙂

But regarding the convenient lambda=0 in this example, isn't the transformation supposed to be log(Y)=log(13.44)=1.128? That is not what the output shows either...

How is this transformation calculated, then?

What is the difference between the Best Lambda and the Convenient Lambda? Is there any reason to choose the convenient lambda when we know both the best and the convenient values?

Thanks!

Use the natural log (base e), not log base 10.
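
To see which formula reproduces the 2.59823 from the question, here is a minimal sketch that evaluates the candidate transformations for the first observation (Y=13.44). The data set name `check` and the variable names are made up for illustration; the SAS `log` function is the natural log and `log10` is base 10.

```
/* Sketch only: compare candidate transformations for Y = 13.44.   */
/* The data set and variable names here are illustrative.          */
data check;
   y = 13.44;
   power_only = y**(-0.5);                 /* textbook Y^lambda with lambda = -0.5 */
   boxcox_m05 = (y**(-0.5) - 1) / (-0.5);  /* Box-Cox form with lambda = -0.5      */
   ln_y       = log(y);                    /* natural log: Box-Cox at lambda = 0   */
   log10_y    = log10(y);                  /* base-10 log, for comparison          */
run;

proc print data=check;
run;
```

The four values should come out at roughly 0.273, 1.454, 2.598 and 1.128. Only the natural log is close to the reported 2.59823, which suggests the output variable was computed with the convenient lambda=0 (the `convenient` option is specified in the `boxcox` transform), while 1.128 is the base-10 log.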

