I conducted a Box-Cox analysis using the following code:
ods rtf file='/folders/myfolders/regr4BCoxNresuls/CLnrGFR1.rtf' style=journal;
ods trace on;
ods output FitStatistics=FitStat ParameterEstimates=param;

/* Box-Cox search over lambda for CLnr regressed on GFR1 */
proc transreg data=boot2 plots=all; /*plots(only)=FITPLOT(stats=none); */
   model boxcox(CLnr / convenient lambda=-3 to 3 by 0.125) = identity(GFR1) / cl; /*SPEC;*/
   output out=pred;
run;

/* Print the captured ODS tables and the output data set */
proc print data=fitstat; run;
proc print data=param; run;
proc print data=pred; run;

ods rtf close;
The best lambda was 0, which means that the log transformation of CLnr was best. The data set (Nstudyr4.csv) is posted below. If I log-transform CLnr, the transformed values are negative. It was suggested online that I can apply the log transformation after adding a constant to each value. My question: if I run the regression on the shifted, log-transformed response and then take the antilog, how do I interpret the intercept given the added constant?
I have attached the data set and the Box-Cox output.
Subject | CL | CLr | CLnr | GFR1 | GFR2 |
1 | 2.02 | 1.73 | 0.29 | 125.5 | 124.1 |
2 | 2.18 | 1.63 | 0.55 | 114.3 | 113.9 |
3 | 1.95 | 1.28 | 0.67 | 102.2 | 102 |
4 | 1.83 | 1.16 | 0.67 | 101.6 | 100.7 |
5 | 1.64 | 1.47 | 0.17 | 96.6 | 95.6 |
6 | 1.94 | 1.72 | 0.22 | 90.7 | 89.2 |
7 | 1.98 | 1.47 | 0.51 | 87.8 | 86.2 |
8 | 1.67 | 1.07 | 0.6 | 84.5 | 83.6 |
9 | 1.25 | 1.19 | 0.06 | 84.2 | 83.4 |
10 | 0.9 | 0.74 | 0.16 | 77.7 | 76.5 |
11 | 1.53 | 1.06 | 0.47 | 75.6 | 75.2 |
12 | 1.52 | 0.82 | 0.7 | 68 | 67.5 |
13 | 1.58 | 1.23 | 0.35 | 66.5 | 65.3 |
14 | 0.93 | 0.75 | 0.18 | 64.4 | 64.2 |
15 | 0.93 | 0.95 | . | 60 | 59.6 |
17 | 0.61 | 0.48 | 1.2 | 52.3 | 52.5 |
18 | 0.76 | 0.46 | 0.15 | 52 | 52.3 |
19 | 0.57 | 0.44 | 0.32 | 51.7 | 52.3 |
20 | 0.66 | 0.4 | 0.17 | 49.3 | 49.7 |
21 | 0.51 | 0.15 | 0.51 | 37.5 | 37.7 |
22 | 0.41 | 0.32 | 0.19 | 35.5 | 35.7 |
23 | 0.65 | 0.57 | . | 34.3 | 34.4 |
24 | 0.9 | 0.58 | 0.07 | 31 | 31.4 |
25 | 0.5 | 0.29 | 0.61 | 28.1 | 29.3 |
26 | 0.31 | 0.22 | 0.28 | 26.6 | 26.8 |
27 | 0.3 | 0.21 | 0.1 | 24.8 | 24.7 |
28 | 0.48 | 0.38 | . | 18 | 18.6 |
29 | 0.42 | 0.19 | 0.29 | 16.3 | 16.6 |
30 | 0.31 | 0.15 | 0.27 | 16 | 16.6 |
31 | 0.2 | 0.12 | 0.19 | 12.7 | 12.7 |
32 | 0.18 | 0.2 | . | 12.7 | 12.7 |
33 | 0.25 | 0.18 | . | 12.7 | 12.7 |
34 | 0.35 | 0.13 | 0.12 | 10.3 | 12.7 |
35 | 0.2 | 0.12 | 0.23 | 9.8 | 12.7 |
36 | 0.15 | 0.08 | 0.12 | 8 | 11.5 |
37 | 0.15 | 0.21 | . | 7.7 | 10.2 |
38 | 0.11 | 0.092 | 0.058 | 6.5 | 10.2 |
39 | 0.046 | 0.092 | 0.018 | 4.7 | 8.9 |
40 | 0.076 | 0.061 | . | 4.1 | 6.4 |
41 | 0.14 | 0.034 | 0.042 | 4.1 | 3.8 |
42 | 0.16 | 0.018 | 0.122 | 4.1 | 3.8 |
43 | 0.18 | 0.015 | 0.145 | 1.8 | 1.5 |
44 | 0.15 | 0.018 | 0.162 | 1.5 | 1.5 |
So what if ln(CLnr) is negative? There is no requirement that a dependent variable be positive. If you are truly worried, rescale the dependent variable--say by multiplying by 1000. Then the ln values will also be positive. Since I suppose that CLnr is clearance of some metabolite, it is like shifting the measurement from millimolar to micromolar.
Steve Denham
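To spell out why rescaling is harmless, here is a short worked derivation. The symbols a (intercept) and b (slope) are notation assumed here for the fitted log-scale model; the factor 1000 is the one suggested above.

```latex
\begin{align*}
\ln(Y) &= a + b\,x \\
\ln(1000\,Y) &= \ln(1000) + \ln(Y) = \bigl(a + \ln 1000\bigr) + b\,x,
\qquad \ln 1000 \approx 6.908
\end{align*}
```

Only the intercept moves (by the constant ln 1000); the slope b and the quality of the fit are unchanged, so the sign of ln(CLnr) carries no special meaning.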
Thanks for the response and it addresses my issue.
Because your response data are positive, the log transformed response model is
log(Y) = a + b*x
or
Y = exp(a + b*x) where a is the estimate for the intercept and b is the estimate for the explanatory coefficient.
If you define A=exp(a), you get Y = A*exp(b*x)
So the intercept term multiplies the model: a unit change in the intercept estimate results in a multiplicative change (a factor of e) in the predicted response.
You can do the same computation for other models. If you add a constant c to the response before you fit the model, the fitted model is log(Y + c) = a + b*x, so
Y = -c + A*exp(b*x)
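The shifted-log fit and its back-transformation can be sketched in SAS as follows. This is a minimal illustration, not the original poster's analysis: the data set name study, the constant c = 1, and the variable names logY, pred_log, and pred_CLnr are all assumptions made for the example. At x = 0 the back-transformed prediction is exp(a) - c, which is how the intercept reads on the original scale.

```sas
/* Hypothetical shift constant, chosen only for illustration */
%let c = 1;

data shifted;
   set study;                /* assumed to contain CLnr and GFR1 */
   logY = log(CLnr + &c);    /* natural log of the shifted response */
run;

/* Fit log(Y + c) = a + b*x and save predictions on the log scale */
proc reg data=shifted;
   model logY = GFR1;
   output out=fit p=pred_log;
run;
quit;

/* Back-transform: Y = -c + exp(a + b*x) = -c + A*exp(b*x) */
data fit;
   set fit;
   pred_CLnr = exp(pred_log) - &c;
run;
```

Rows with a missing CLnr get a missing logY and are left out of the fit. Because the constant is subtracted after exponentiating, the intercept no longer acts as a pure multiplier on the original scale, as it does when c = 0.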
Thanks for the response.