How do I interpret the coefficient values and make...


05-13-2013 08:55 AM

Hi,

In a multiple regression model with cost data as dependent variable (Y), I have used proc transreg (model BoxCox) in SAS to get the proper Box-Cox transformation of Y (in order for the residuals to be normally distributed).

model BoxCox(Y) = identity(x1 x2 x3 x4 x5 x6)

The result was lambda = -0.25, so I transformed my dependent variable with the formula:

(((Y**(-0.25))-1) / (-0.25))

and run a proc reg, with the Box-Cox transformed dependent variable and my independent variables. I have read that the back-transformation (inverse) of Box-Cox is:

x = (lambda*z + 1)^(1/lambda),

where z is the transformed variable and lambda = -0.25 in my case.
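As a quick sanity check that the inverse formula really undoes the transform, here is a small sketch (Python is used purely for illustration, since the thread's own code is SAS; the sample y values are arbitrary):

```python
# Hedged sketch: verify the back-transformation undoes the Box-Cox
# transform for lambda = -0.25. The sample y values are arbitrary.
lam = -0.25

def forward(y):                   # (y**lam - 1) / lam
    return (y**lam - 1) / lam

def backward(z):                  # (lam*z + 1)**(1/lam)
    return (lam * z + 1) ** (1 / lam)

for y in (0.5, 3.7, 120.0):
    assert abs(backward(forward(y)) - y) < 1e-9
```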

How do I interpret the coefficients and standard errors from the proc reg?

Do I back-transform all the beta-coefficients?

For example, one of my significant variables has beta = -0.01068 with standard error = 0.00326.

How do I interpret that? Any feedback/comments much appreciated.

Best regards,

Hank

Accepted Solutions

05-14-2013 11:32 AM

For each unit change in the x variable, the transformed Y variable changes by -0.01068. Since this is a non-linear transform, you should plug in low, median, and high values of X, predict the transformed Y, and back-transform to get some idea of how the Y variable decreases in response to changes in the X variable.
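Steve's back-transformation idea can be sketched numerically (Python used only for illustration; the intercept `alpha` and the example x values are hypothetical, while beta and lambda come from the thread):

```python
# Hedged sketch: back-transform Box-Cox-scale predictions at a few
# predictor values. alpha and the x values are hypothetical; beta
# and lambda are the values reported in the thread.
lam = -0.25
beta = -0.01068
alpha = 3.0          # hypothetical intercept from the PROC REG fit

def boxcox(y, lam):
    """Box-Cox transform: (y**lam - 1) / lam for lam != 0."""
    return (y**lam - 1) / lam

def inv_boxcox(z, lam):
    """Inverse Box-Cox: (lam*z + 1)**(1/lam)."""
    return (lam * z + 1) ** (1 / lam)

for x in (10, 50, 90):           # low, median, high values of X
    z = alpha + beta * x         # predicted *transformed* Y
    y = inv_boxcox(z, lam)       # prediction on the original cost scale
    print(x, round(y, 2))
```

Because the transform is non-linear, the change in original-scale Y per unit of X is different at each of the three points, which is exactly why a single back-transformed coefficient cannot summarize the effect.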

Steve Denham

All Replies



05-15-2013 03:01 AM

Thanks for the help, much appreciated. I have a similar question; maybe you can help clarify.

If the transformations are applied only to a subset of the independent variables, say half of the x-variables are square-root transformed, how do I interpret that in relation to the response variable? Should I back-transform the beta-coefficients of those x-variables first, and use the back-transformed values in relation to the response variable?

On to the final question. I found another post by you (https://communities.sas.com/message/125380#125380) where you wrote:

"If you are working on developing a predictive equation with only a single predictor, take a good look at PROC TRANSREG. This would enable you to model the dependent variable as a logit, and the independent variable in a variety of ways--class, optimal transforms, non-optimal transforms, nonlinear transforms (such as Box-Cox or penalized B-splines)."

What if I apply different transformations to both the y- and the x-variables? As a rule of thumb (if there is one), how should I think when translating the results (beta coefficients and standard errors) back into original terms?

For example, if the response (y) is transformed via Box-Cox and the rest of the variables via the logit transformation, how would I go about translating the results?

Best regards,

Hank


05-15-2013 08:27 AM

I really try not to think of the relationship on the original scale for both independent and dependent variables. The transformed data are the ones that show a relationship, and only if it is a linear transformation is the original scale meaningful for the coefficients. If someone has a more non-linear worldview, maybe they can visualize what the coefficient might mean after back-transforming both sides of the equation. To me, the only way to see this would be to plug in multiple values for the independent variable and see what happens.

Steve Denham


05-15-2013 09:18 AM

I agree with Steve. On the other hand, you are always free to use the chain rule if you want to slog through the computations.

If you've transformed Y -> F(Y) and X -> g(X) and found that F(Y) = alpha + beta*g(X), then take derivatives with respect to X on both sides:

dF/dy * dy/dx = beta * dg/dx,

which means that

dy/dx = beta * (dg/dx) / (dF/dy)
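To make the chain rule concrete, here is a small numerical check (Python for illustration only; alpha, beta, and the evaluation point are hypothetical, with F taken as the Box-Cox transform at the thread's lambda = -0.25 and g as a square root):

```python
# Hedged sketch: verify dy/dx = beta * (dg/dx) / (dF/dy) against a
# finite-difference slope for one concrete pair of transforms.
# alpha, beta, and x0 are hypothetical; lambda is from the thread.
lam, alpha, beta = -0.25, 2.0, -0.05

def F(y):          # Box-Cox transform of the response
    return (y**lam - 1) / lam

def dF(y):         # dF/dy
    return y**(lam - 1)

def g(x):          # transform of the predictor
    return x**0.5

def dg(x):         # dg/dx
    return 0.5 * x**-0.5

def y_of_x(x):     # solve F(y) = alpha + beta*g(x) for y
    z = alpha + beta * g(x)
    return (lam * z + 1) ** (1 / lam)

x0 = 25.0
y0 = y_of_x(x0)
slope_chain = beta * dg(x0) / dF(y0)      # dy/dx via the chain rule
h = 1e-5
slope_numeric = (y_of_x(x0 + h) - y_of_x(x0 - h)) / (2 * h)
print(slope_chain, slope_numeric)         # the two should agree closely
```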


05-15-2013 10:32 AM

@Rick and Steve: Thanks a lot for the feedback.

As a general thought, would you consider using something like proc nlin to estimate a regression model, instead of trying to fit the data to proc reg via transformations?

Best regards,

/Hank


05-15-2013 01:21 PM

I wouldn't. Least squares regression has many nice properties, including being able to estimate many coefficients without worrying about convergence of some optimization algorithm. If it makes sense to do a linear analysis, I grab that opportunity.


05-15-2013 03:06 PM

Alright!

Thanks for your input.

Best regards,

Hank


05-16-2013 10:14 AM

I love non-linear regression, but I prefer to have a known process that might be generating the nonlinear response. Without knowing the process, and reflecting on what you have given here, I assume that you are in an exploratory mode. Using TRANSREG to identify significant relationships that can be linearized is bound to be more productive. It also opens up semi-parametric methods (splines) that are generally not used enough, in my opinion.

If at all possible, get a copy of Frank Harrell's *Regression Modeling Strategies* for some good approaches.

Steve Denham


05-16-2013 04:34 PM

Thanks once again for the input, and the book recommendation.

I read more about the proc transreg procedure, and an example here:

SAS/STAT(R) 9.2 User's Guide, Second Edition

In my data, I have cost data as the response (which in the literature is usually log-transformed) and a variety of continuous and discrete explanatory variables. Some are dummies and others take values from 0 to 100, often concentrated near the 70-100 range or at 0.

As I wrote in the initial question, I did a Box-Cox transformation of the response:

model BoxCox(Y) = identity(x1 x2 x3 x4 x5 x6)

This generated a model in which the residuals are normally distributed, but R2 is not as high as I think it could be, and my fear is that I could miss some relationships that are nonlinear in the explanatory variables.

Thanks to your and Rick's excellent help, I'm now thinking of doing something like the example in the link above (as shown below): namely, using mspline on the response and spline on my predictors that range from 0 to 100.

But then the nice interpretation breaks down, and explaining the coefficients one way or another (to myself as well) gets harder. Showing the change in low/mid/high values of the response from a change in (the original value of) x is still quite a good way of explaining the relationship to non-professionals. But with different transformations on the independent variables (as shown below), I now find it really difficult to even say which independent variable explains the most, and the ratio of its effect on the response relative to the other independent variables.

Do you have any idea how to make sense of the different relationships when explaining them to non-statisticians?

Best regards,

Hank

Example from the SAS documentation:

* Fit the Nonparametric Model;
proc transreg data=Gas solve test nomiss plots=all;
   ods exclude where=(_path_ ? 'MV');
   model mspline(NOx / nknots=9) = spline(EqRatio / nknots=9)
         monotone(CpRatio) opscore(Fuel);
run;

| Variable | DF | Coefficient | Type II SS | Mean Square | F Value | Pr > F | Label |
|---|---|---|---|---|---|---|---|
| Intercept | 1 | -15.274649 | 57.1338 | 57.1338 | 1227.60 | <.0001 | Intercept |
| Pspline.EqRatio_1 | 1 | 35.102914 | 62.7478 | 62.7478 | 1348.22 | <.0001 | Equivalence Ratio (PHI) 1 |
| Pspline.EqRatio_2 | 1 | -19.386468 | 64.6430 | 64.6430 | 1388.94 | <.0001 | Equivalence Ratio (PHI) 2 |
| Identity(CpRatio) | 1 | 0.032058 | 1.4445 | 1.4445 | 31.04 | <.0001 | Compression Ratio (CR) |
| Opscore(Fuel) | 5 | 0.158388 | 5.5619 | 1.1124 | 23.90 | <.0001 | Fuel |


05-17-2013 08:58 AM

Hmm. Time to back away from the splines, at least for now.

Let's go back to the original Box-Cox transformation, with lambda = -0.25. What does that imply as a transformation? First, it is negative, so there is an inverse transformation involved; second, the absolute value is 0.25, which corresponds to taking the square root twice. Thus, I would expect the original distribution of Y to have a LOT of values near zero, with a sharp drop-off as you move to the right, and the distribution probably "stops" at some value. Is that anything close to correct?
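The "square root twice" reading can be sketched in a couple of lines (Python, illustrative only; the y values are arbitrary):

```python
import math

# Hedged sketch: lambda = -0.25 makes the core of the transform
# Y**(-0.25), i.e. the reciprocal of a double square root, so large
# Y values are compressed hard toward zero.
ts = []
for y in (1.0, 10.0, 100.0, 1000.0):
    t = y ** -0.25
    assert abs(t - 1 / math.sqrt(math.sqrt(y))) < 1e-12  # same thing
    ts.append(t)
    print(y, round(t, 4))
```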

Now, you say that you believe the Rsquared for your model is "not as high as you think it could be." There could be a couple of reasons for that. First, there may be more noise in your data than you thought. Second, you may be missing a key variable, or a key interaction between the variables you do have. This is where subject knowledge MUST be used in specifying your model.

Steve Denham


05-17-2013 09:25 AM

[Attached: histograms of the original response data and the Box-Cox transformation results.]


05-17-2013 09:30 AM

Hi again,

Thanks for your willingness to help, it means a lot.

Above, I have posted the histogram of my original response data, and the result of the Box-Cox transformation. As you can see, the response looks near-normal but is positively skewed. The real mess in the data lies in the independent variables, as indicated by the Box-Cox picture above, with lots of almost flat curves/lines.

First, I tried to log-transform the response, but it failed all the normality tests on the regression residuals. With this Box-Cox transformation, the residuals are normal.

Due to the many almost flat curves in the Box-Cox picture, I was thinking that maybe spline regression on the predictors would do the trick. What are your thoughts on the pictures above?

Best regards, Hank


05-17-2013 09:50 AM

I begin to see why the Rsquared isn't what you had hoped. Those flat lines indicate that there doesn't seem to be much of a relationship between these variables and the (transformed) response. At this point, splines might be an approach, since it looks like a fishing expedition. There may be some linear combination of the predictors that has a relationship with the response. However, splines are generally fit within a predictor that is, well, "clumpy" (I'm sure that is a real statistical term in some universe). All I see are flat lines, like western-Kansas flat. Any "hill" at all will be the driver of a fit.

I have an idea, but I'm not really sure of a theoretical basis for it. Suppose you Box-Cox transform your response variable, and then try PROC PLS to see whether a limited number of "hidden components" based on the predictors emerges. Plus, you get a cross-validation fit.

Steve Denham


05-17-2013 12:06 PM

Hi,

I seem to have missed your latest post. I will look up proc pls ASAP tomorrow (it's Friday afternoon in this part of the world, and there is a world outside of econometric modelling, sometimes..). It seems like an interesting way to build a predictive model via partial least squares. Your illustrative example of the relationship between those flat lines and splines helped me a lot in understanding how both models work (as well as explaining the high fit of my spline model described above). Thanks!

Best regards,

Hank