Fluvio1
Calcite | Level 5

I have performed OLS regression on ln-transformed x and y variables and re-transformed the predictions (ln_pred) to the original units using Duan's smearing estimate (my own code) to minimize re-transformation bias. I am not sure how to re-transform the 95% confidence limits on the ln scale (Ln_Upper and Ln_Lower) back to the original units: do I apply the smearing estimate, or simply exponentiate the ln confidence limit values?
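The procedure described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the poster's own code: the data are simulated, and the function names `duan_smearing_fit` and `predict_original_units` are hypothetical. It fits OLS on the ln scale, takes the smearing factor as the mean of the exponentiated residuals, and returns both the naive and the smeared back-transformed predictions.

```python
import numpy as np

def duan_smearing_fit(x, y):
    """OLS of ln(y) on ln(x), plus Duan's smearing factor (hypothetical helper)."""
    ln_x, ln_y = np.log(x), np.log(y)
    X = np.column_stack([np.ones_like(ln_x), ln_x])   # intercept + slope design
    beta, *_ = np.linalg.lstsq(X, ln_y, rcond=None)
    resid = ln_y - X @ beta
    smear = np.mean(np.exp(resid))                    # mean of exponentiated residuals
    return beta, smear

def predict_original_units(beta, smear, x_new):
    """Return (naive, smeared) predictions on the original scale."""
    ln_pred = beta[0] + beta[1] * np.log(x_new)
    naive = np.exp(ln_pred)                           # simple exponentiation
    return naive, smear * naive                       # smeared = factor * naive

# Simulated data (an assumption for illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(1, 100, size=500)
y = 2.0 * x**0.8 * np.exp(rng.normal(0, 0.5, size=500))

beta, smear = duan_smearing_fit(x, y)
naive, smeared = predict_original_units(beta, smear, np.array([10.0, 50.0]))
print("smearing factor:", smear)
print("naive:", naive, "smeared:", smeared)
```

Because OLS residuals from a model with an intercept average to zero, the smearing factor is at least 1, so the smeared prediction is never below the naive one.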

1 ACCEPTED SOLUTION

SteveDenham
Jade | Level 19

I haven't done this, but if I had, I probably would have used PROC NLIN, NLMIXED, GENMOD or GLIMMIX. For manually transformed values in PROC REG or GLM, it is nice to know that there is a relatively simple unbiased back-transformation that doesn't appear to be as volatile as the expected value estimator for a log-normal distribution.

 

See https://stat.ethz.ch/education/semesters/as2015/asr/Script_v151119.pdf for a discussion. The authors argue that since the confidence bounds are estimated percentiles of the distribution, the naive back-transformation (simple exponentiation) is certainly adequate and appropriate.

 

At least until the smeared estimate is outside of the confidence interval...

 

SteveDenham

7 REPLIES
Fluvio1
Calcite | Level 5

Thank you, Steve. However, it does look like some of Duan's smeared estimates are outside the confidence limits. If I compute the quantiles for the exponentiated confidence limits, they are highly skewed:

[Image attachment: Fluvio1_0-1690389621589.png, quantiles of the exponentiated confidence limits]

I am not sure how to handle this.
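One way to see which predictions are affected is to do the comparison on the ln scale: the smeared estimate lies above the exponentiated upper limit exactly when ln(smearing factor) exceeds Ln_Upper minus ln_pred. The exponentiated limits bracket exp(ln_pred), while the smeared estimate targets the mean on the original scale, so this can happen whenever the interval is narrow relative to the residual spread. A minimal Python sketch, using made-up numbers rather than the poster's data and a hypothetical helper name `flag_smeared_outside`:

```python
import numpy as np

def flag_smeared_outside(ln_pred, ln_lower, ln_upper, smear):
    """Flag rows where smear * exp(ln_pred) falls outside
    (exp(ln_lower), exp(ln_upper)). Hypothetical helper."""
    # Work on the ln scale: ln(smear * exp(ln_pred)) = ln_pred + ln(smear)
    log_smeared = np.asarray(ln_pred) + np.log(smear)
    return (log_smeared < np.asarray(ln_lower)) | (log_smeared > np.asarray(ln_upper))

# Illustrative values only (assumptions, not taken from the thread)
ln_pred  = np.array([2.0, 3.0, 4.0])
ln_lower = np.array([1.9, 2.7, 3.95])
ln_upper = np.array([2.1, 3.3, 4.05])

outside = flag_smeared_outside(ln_pred, ln_lower, ln_upper, smear=1.13)
print(outside)
```

With these numbers ln(1.13) is about 0.12, so the two rows whose half-width is below 0.12 are flagged and the wider middle interval is not.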

 

 

Rick_SAS
SAS Super FREQ

If (L, U) is a (1-alpha) CI for a parameter, p, then (eta(L), eta(U)) is a (1-alpha) CI for the parameter, eta(p), for any strictly monotone increasing continuous transformation, eta. That's because 

P(L <= X <= U) = P(eta(L) <= eta(X) <= eta(U))

for a continuous random variable X. So, yes, you can apply the inverse transformation to back-transform the estimates, including CIs.
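The invariance can be checked numerically. The following Python sketch is an illustration under simplifying assumptions (normal data on the ln scale, known sigma, a z-interval for the mean mu); all parameter values are made up. It confirms that an interval covers mu exactly when the exponentiated interval covers exp(mu), so the coverage rates are identical. Note that the back-transformed interval is for exp(mu), not for the mean of the untransformed variable.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
mu, sigma, n, reps = 1.0, 0.7, 30, 5000       # assumed values for illustration
z = NormalDist().inv_cdf(0.975)               # two-sided 95% normal quantile

# Repeated samples on the ln scale and their z-intervals for mu
samples = rng.normal(mu, sigma, size=(reps, n))
means = samples.mean(axis=1)
half = z * sigma / np.sqrt(n)
L, U = means - half, means + half

# Coverage on the ln scale vs. coverage after applying eta = exp
covers_log = (L <= mu) & (mu <= U)
covers_orig = (np.exp(L) <= np.exp(mu)) & (np.exp(mu) <= np.exp(U))

print("coverage on ln scale:      ", covers_log.mean())
print("coverage on original scale:", covers_orig.mean())
```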

SteveDenham
Jade | Level 19

Thanks, @Rick_SAS. That property was what I was thinking about when I said that if a smeared estimate is used for the expected value (= eta in your post), then it should also be applied to the confidence bounds. The Swiss Tech paper implies that a direct exponentiation should be adequate. Thoughts?

 

SteveDenham 

Rick_SAS
SAS Super FREQ

I had never heard of Duan's smearing estimator until this post, and I have no experience with it. I quickly glanced at the paper you cited. It looks to me like they transformed X and Y variables by using LOG and then backtransformed the estimates by using inverse-LOG = EXP. Yes, that is valid and standard.

 

Maybe I am wrong, but even though the paper mentions Duan's work, I don't think they actually use it. Those paragraphs seem to be inserted (maybe at the suggestion of a referee) to let the reader know that there is an alternative, which the authors do not use.

 

I think it is important to point out that there is a difference between a generalized linear model with a log link and a linear model on the log-transformed data. See "Error distributions and exponential regression models" on The DO Loop (sas.com), which compares the PROC GENMOD statement

model y = x / dist=normal link=log;

to the PROC GLM statement

model logY = x;

The OP seems to be using the latter. Which model is correct depends on the error distribution, which is perhaps why the OP mentions the smearing estimate. I do not know the correct model for the OP's data. Graphing the distribution of the residuals might be prudent. Domain knowledge might also help.
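The difference between the two models can be illustrated with a small simulation. The Python sketch below is not a reproduction of PROC GENMOD or PROC GLM: the log-link fit is a plain Gauss-Newton least-squares stand-in, the data are simulated with multiplicative lognormal errors (an assumption), and the function names are hypothetical. Under that assumption the log-transformed linear model, back-transformed by EXP, tracks the conditional median, while the log-link model tracks the conditional mean, which is larger by roughly a factor of exp(sigma^2/2).

```python
import numpy as np

def fit_log_linear(x, y):
    """OLS of log(y) on x, analogous in spirit to `model logY = x`."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    return beta

def fit_log_link_normal(x, y, start, iters=50):
    """Least squares for E[y] = exp(b0 + b1*x) via Gauss-Newton,
    a stand-in for a normal-error, log-link model."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.array(start, dtype=float)
    for _ in range(iters):
        mu = np.exp(X @ beta)
        J = X * mu[:, None]                       # Jacobian of mu w.r.t. beta
        step, *_ = np.linalg.lstsq(J, y - mu, rcond=None)
        beta = beta + step
    return beta

# Simulated data with multiplicative lognormal error (assumed, sigma = 0.6)
rng = np.random.default_rng(2)
x = rng.uniform(0, 2, size=5000)
y = np.exp(1.0 + 0.5 * x + rng.normal(0, 0.6, size=5000))

b_lin = fit_log_linear(x, y)
b_link = fit_log_link_normal(x, y, start=b_lin)

x0 = 1.0
median_like = np.exp(b_lin[0] + b_lin[1] * x0)    # back-transformed log-linear fit
mean_like = np.exp(b_link[0] + b_link[1] * x0)    # log-link fit of the mean
print("log-linear, exponentiated:", median_like)
print("log-link:                 ", mean_like)
print("ratio:", mean_like / median_like, "vs exp(sigma^2/2) =", np.exp(0.18))
```

The ratio printed at the end plays the same role as the smearing factor: it is the gap between exponentiating a log-scale prediction and estimating the mean on the original scale.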

Fluvio1
Calcite | Level 5

Thanks again, Rick. I attached the original paper by Duan that defines the smearing estimator. Also see below:

[Image attachment: Fluvio1_0-1690390001666.png]

 

Fluvio1
Calcite | Level 5

Thank you Rick for the insightful explanation.
