03-13-2012 09:31 AM

I have a log transformation on the dependent variable and a square-root transformation on an independent variable:

log (y) = intercept + (estimate) * sqrt(x)

log(y) = b0 + b1 * sqrt(x)

How do I interpret b1? How should the calculation proceed so that we can say a one-unit change in x results in a given percentage change in the expected value of the dependent variable, holding all other predictors constant?

Any help would be really appreciated. Thanks a lot!

a week ago

What you are describing is an inherent difficulty with regression models. Because regression models are typically inflexible and require the model structure to reflect the population, you can easily be left with results that are difficult to interpret. The slope of marketing dollars used to predict sales dollars makes sense to most people, but the slope of sqrt(marketing dollars) used to predict log(sales dollars) offers little to hold on to in terms of easy interpretation. To make things more difficult, the optimum values in the transformed space do not translate back to the optimum values in the non-transformed space.
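A short derivation makes the difficulty concrete. This is a sketch assuming natural logs and the fitted model log(y) = b0 + b1 * sqrt(x) from the question, with the back-transformed prediction written as ŷ(x) = exp(b0 + b1 * sqrt(x)):

```latex
\frac{d \log y}{dx} = \frac{b_1}{2\sqrt{x}}
\qquad
\frac{\hat{y}(x_0 + 1)}{\hat{y}(x_0)} = \exp\bigl(b_1(\sqrt{x_0 + 1} - \sqrt{x_0})\bigr)
\qquad
\%\Delta\hat{y} = 100\left[\exp\bigl(b_1(\sqrt{x_0 + 1} - \sqrt{x_0})\bigr) - 1\right]
```

The intercept b0 cancels in the ratio, but every expression still depends on the starting value x0, so there is no single "percent change per unit of x"; b1 by itself is the change in log(y) per unit of sqrt(x). For small changes, 100 * b1 / (2 * sqrt(x0)) approximates the percent change per unit of x near x0. If the error distribution does not depend on x, the retransformation factor E[exp(ε)] also cancels in the ratio, so the same percentage applies to the expected value of y.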

You might consider a more flexible modeling strategy, such as a Decision Tree, if interpretation is critical. You might also consider clustering your observations and building a regression model for each cluster, in the hope of finding a simpler model on each subset that alleviates some of the problems introduced by fitting the whole data set.

You might also consider one of the following:

1 - Set a threshold (e.g., Revenue > $100) and then build a Decision Tree to predict which combinations of values/levels of the original variables are associated with high vs. low revenue.

2 - Vary the data systematically around a point of interest and observe how the predicted values change, to gain some understanding of how the model behaves near that point.
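Suggestion 2 can be sketched in a few lines. This is a minimal illustration, not a SAS procedure: it assumes the model log(y) = b0 + b1 * sqrt(x) with natural logs, uses made-up estimates b0 = 1.0 and b1 = 0.2, and back-transforms with exp() without any retransformation-bias correction. The helper names `predicted_y`, `pct_change_per_unit`, and `sweep` are hypothetical.

```python
import math

# Hypothetical helpers for the model log(y) = b0 + b1 * sqrt(x) (natural log assumed).


def predicted_y(b0: float, b1: float, x: float) -> float:
    """Back-transformed prediction exp(b0 + b1*sqrt(x)); no bias correction applied."""
    return math.exp(b0 + b1 * math.sqrt(x))


def pct_change_per_unit(b1: float, x0: float, delta: float = 1.0) -> float:
    """Percent change in predicted y when x moves from x0 to x0 + delta.

    The intercept b0 cancels in the ratio, so only b1 and the starting point matter.
    """
    ratio = math.exp(b1 * (math.sqrt(x0 + delta) - math.sqrt(x0)))
    return 100.0 * (ratio - 1.0)


def sweep(b0: float, b1: float, x0: float, step: float, n: int) -> list:
    """(x, predicted y) pairs on the grid x0 - n*step ... x0 + n*step, keeping x >= 0."""
    grid = [x0 + k * step for k in range(-n, n + 1)]
    return [(x, predicted_y(b0, b1, x)) for x in grid if x >= 0]


if __name__ == "__main__":
    b0, b1 = 1.0, 0.2  # hypothetical estimates, not from any real fit
    for x0 in (1.0, 4.0, 25.0, 100.0):
        # The same one-unit change in x gives a different % change at each x0.
        print(f"x0={x0:>6}: {pct_change_per_unit(b1, x0):.2f}% per unit of x")
    for x, y in sweep(b0, b1, x0=25.0, step=1.0, n=2):
        print(f"x={x:>5}: predicted y = {y:.3f}")
```

With b1 > 0, the percent change per unit of x shrinks as x0 grows, because sqrt(x) flattens out; this is why the effect has to be reported at specific points of interest rather than as one number.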

Unfortunately, there is no magic bullet for easy interpretation in your example.

I hope this helps!

Doug