
## sas code for getting the R^2 transformation linear model

Hi,

I have a question about how to get the R^2 for a transformed linear model. The R^2 should be computed for the base model on the original scale. The fitted model is log(y)=b0+b1*x+e. The R^2 I need is for the model transformed back to y=exp(b0+b1*x). I am confused about how to get the R^2 for the model y=exp(b0+b1*x+e). Can anyone help me figure out the code? Thanks!


## Re: sas code for getting the R^2 transformation linear model

If you are using a linear regression to fit the transformed model log(y)=b0+b1*x, with b0 and b1 chosen to maximize R-square, then what do you actually mean by R-square for the "original" model y=exp(b0+b1*x)? In the estimated model, R-square is a proportional reduction in error, where the model error is the sum of squared (actual - estimate) = the sum of squared (log(y) - estimate(log(y))). I don't see how one could transform that to get a similarly defined proportional reduction in error on the original scale.

I guess you could get the correlation between the observed y and the estimated exp(b0+b1*x), and I imagine there is a way to get an R^2 from that. I just don't know what it would mean.
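The correlation idea could be sketched roughly as follows. This is only an illustration, not a recommended definition of R^2: it uses the six (x, y) pairs posted later in this thread, and the data set and variable names (`back`, `corr_out`, `r2_corr`, `y_hat`, `r_sq`) are made up for the example.

```sas
/* Example data: the six (x, y) pairs used later in this thread */
data have;
   input x y @@;
   log_y = log(y);
   cards;
0 .5  1 4  2 6  3 7  16 12  20 22
;

/* Fit the log-scale model and save the predicted log(y) */
proc reg data=have noprint;
   model log_y = x;
   output out=pred p=log_y_hat;
run;
quit;

/* Back-transform the predictions to the original scale */
data back;
   set pred;
   y_hat = exp(log_y_hat);
run;

/* Pearson correlation between observed and back-transformed values */
proc corr data=back outp=corr_out noprint;
   var y y_hat;
run;

/* Keep the CORR row for y and square its correlation with y_hat */
data r2_corr(keep=r_sq);
   set corr_out(where=(_type_='CORR' and upcase(_name_)='Y'));
   r_sq = y_hat**2;
run;

proc print data=r2_corr noobs;
run;
```

A squared correlation always lies between 0 and 1, so it cannot flag a fit that is worse than simply predicting the mean of y, which a measure of the form 1 - SSE/CSS can do by going negative.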


## Re: sas code for getting the R^2 transformation linear model

Hi @lei2004 and welcome to the SAS Support Communities!

I agree that the formula for R1² in the article is a bit confusing, but (without having access to the original source by Kvålseth) I think you just need to insert the yi, their mean and the back-transformed predicted values (EDIT: that is: exp("(log yi) hat")).
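Written out, my reading of that formula is the following. The notation is mine rather than the article's: the hat over log y_i denotes the predicted value from the log-scale regression, and y-bar is the mean of the observed y_i.

```latex
R_1^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
                 {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2},
\qquad
\hat{y}_i = \exp\!\left( \widehat{\log y_i} \right),
\qquad
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
```

The numerator corresponds to `sse` and the denominator to `css` in the code below.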

Taking the first of the two numeric examples from section 3 of the article:

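```
/* Example data; log_y is the response of the transformed model */
data have;
input x y;
log_y=log(y);
cards;
0 .5
1 4
2 6
3 7
16 12
20 22
;

/* Corrected sum of squares of y on the original scale */
proc summary data=have;
var y;
output out=stats css=css;
run;

/* Fit log(y)=b0+b1*x and save the predicted log(y) */
proc reg data=have;
model log_y=x;
output out=pred p=log_y_hat;
quit;

/* Accumulate the SSE of the back-transformed predictions, then compute R^2 */
data want(keep=sse css rsq);
if _n_=1 then set stats;
set pred end=last;
sse+(y-exp(log_y_hat))**2;
if last;
rsq=1-sse/css;
run;
```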

Result:

```
  css        sse        rsq

287.208    34.9594    0.87828
```

(matching the authors' result 0.88)

For the second example I got rsq=-0.31642, again matching the (corrected) value in the article (up to a minor rounding issue).

Equivalently, you could obtain CSS from PROC REG instead of PROC SUMMARY: it is the corrected total sum of squares in the ODS table ANOVA for the model y=x, which could be included in the existing PROC REG step by writing model log_y y=x.
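A rough sketch of that variant, in case it helps. I am assuming the ODS output data set has columns named `Dependent`, `Source` and `SS` and that the total row is labelled `Corrected Total`; please check with PROC CONTENTS or PROC PRINT on `anova` before relying on it. The names `anova` and `y_lin_hat` are made up for the example.

```sas
/* Example data from this thread */
data have;
   input x y @@;
   log_y = log(y);
   cards;
0 .5  1 4  2 6  3 7  16 12  20 22
;

/* Capture the ANOVA tables for both dependent variables */
ods output ANOVA=anova;
proc reg data=have;
   model log_y y = x;                      /* y is added only for its corrected total SS */
   output out=pred p=log_y_hat y_lin_hat;  /* one predicted-value name per dependent */
quit;

/* Assumed ODS column names: Dependent, Source, SS */
data want(keep=sse css rsq);
   if _n_=1 then
      set anova(where=(upcase(Dependent)='Y' and Source='Corrected Total'));
   set pred end=last;
   css = SS;                               /* corrected total SS of y */
   sse + (y - exp(log_y_hat))**2;          /* SSE of back-transformed predictions */
   if last;
   rsq = 1 - sse/css;
run;
```

If the WHERE= condition matches no row (for example, because the label differs in your SAS version), the final DATA step stops without writing an observation, so an empty `want` is a sign to inspect `anova`.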
