Solved: Get value x given mspline(x) value from transreg

NaNaN · Posted 02-05-2018 02:16 PM

Hi there,

I ran the following code to generate the non-linear regression fit. Does anyone know how to get predicted Y based on predicted transformed Y?

ods graphics on;
ods output details=d;                 /* save the Details table */
proc transreg data=edaniat.fee_t solve details ss2 short nomiss plots=all;
   ods output splinecoef=c;           /* save the spline coefficients */
   model mspline(Y / nknots=9) = mspline(x1 / nknots=9) mspline(x2 / nknots=9)
                                 spline(x3 / nknots=9) mspline(x4 / nknots=9);
   output out=y predicted;            /* transformed variables plus predicted transformed Y */
run;
ods graphics off;

Any inputs are welcome!

Reeza · Posted 02-05-2018 02:23 PM

Include the point in your original data, with a missing dependent variable.

Then use the PREDICTED option to get the predicted value.

http://documentation.sas.com/?docsetId=statug&docsetVersion=14.3&docsetTarget=statug_transreg_syntax...

This blog post covers the methods for scoring data:

https://blogs.sas.com/content/iml/2014/02/19/scoring-a-regression-model-in-sas.html

and specifically this one is the method I'm suggesting.

https://blogs.sas.com/content/iml/2014/02/17/the-missing-value-trick-for-scoring-a-regression-model....
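As a rough sketch of the missing-value trick applied to this model: append the point to be scored with Y set to missing, refit, and read the prediction from the OUT= data set. The data set names work.new_point, work.fit_and_score, and work.scored, the flag score_row, and the x values are all made up for illustration. With NOMISS, the appended row should be left out of the fit but still receive a predicted value as long as its x values are nonmissing. Note that this prediction is on the transformed scale (predicted transformed Y), not on the original scale of Y.

```sas
/* Hypothetical point to score: Y is missing, the x values are placeholders */
data work.new_point;
   x1 = 1; x2 = 2; x3 = 3; x4 = 4;
   Y  = .;
run;

/* Stack the new point under the training data and flag it */
data work.fit_and_score;
   set edaniat.fee_t work.new_point(in=is_new);
   score_row = is_new;
run;

proc transreg data=work.fit_and_score solve nomiss;
   model mspline(Y / nknots=9) = mspline(x1 / nknots=9) mspline(x2 / nknots=9)
                                 spline(x3 / nknots=9) mspline(x4 / nknots=9);
   id score_row;                      /* carry the flag into the OUT= data set */
   output out=work.scored predicted;
run;

/* Show only the scored row */
proc print data=work.scored;
   where score_row = 1;
run;
```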

NaNaN · Posted 02-05-2018 03:04 PM

Thanks, Reeza!

It is still not clear to me how to get the predicted Y value from the predicted transformed Y value. In my code, I output the results to data set Y, which contains TY, the value of Y transformed by mspline(Y). How do I get Y back from TY?

There is no "code" option for proc transreg.

WarrenKuhfeld · Posted 02-06-2018 09:28 AM

I wrote transreg, but I am not sure what you want. You can apply the coefficients from the model to the transformed Xs and you get predicted values for the transformed Y. You can transform the Xs and not transform Y and get predicted values for Y. You already know all that. The only other thing I can think of is a two-step approach: take the transformed Xs from the model that transforms both the Xs and Y, then fit a second regression of the untransformed Y on those transformed Xs and use its predicted values. Is that what you want? If so, why? If that is what you want, you can do it in two steps. If that is not what you want, then I need more explanation.
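A minimal sketch of the two options described above, assuming the default TRANSREG naming in the OUT= data set (prefix T for transformed variables, prefix P for predicted values). The output data set names and Y_hat are made up for illustration.

```sas
/* Option 1: transform only the Xs. identity(Y) leaves Y on its original     */
/* scale, so the PREDICTED values (PY by default) are predictions of Y.      */
proc transreg data=edaniat.fee_t nomiss;
   model identity(Y) = mspline(x1 / nknots=9) mspline(x2 / nknots=9)
                       spline(x3 / nknots=9) mspline(x4 / nknots=9);
   output out=work.pred_identity predicted;
run;

/* Option 2, step 1: fit the full model and keep the transformed Xs */
proc transreg data=edaniat.fee_t solve nomiss;
   model mspline(Y / nknots=9) = mspline(x1 / nknots=9) mspline(x2 / nknots=9)
                                 spline(x3 / nknots=9) mspline(x4 / nknots=9);
   output out=work.step1;
run;

/* Option 2, step 2: regress the untransformed Y on the transformed Xs */
proc reg data=work.step1;
   model Y = Tx1 Tx2 Tx3 Tx4;         /* Tx1-Tx4 assume the default T prefix */
   output out=work.step2 predicted=Y_hat;
run;
quit;
```

In both cases the predictions are on the original scale of Y, but the model is no longer the one that also transforms Y.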

NaNaN · Posted 02-06-2018 12:02 PM

Thanks @WarrenKuhfeld, for your comments. Here is what I want:

In PROC TRANSREG, I transformed the target variable Y using mspline(Y / nknots=9). I wrote the output to a data set, where I can find the predicted transformed Y (PTY).

For example, if Y=10 and TY=mspline(Y)=30, TRANSREG returns the predicted transformed Y, PTY=35. I want to figure out the predicted value of Y. Also, I believe the R square in TRANSREG is calculated on TY (transformed Y) and PY (predicted transformed Y).

Please let me know if I am still not clear enough.

Thanks!

-Shelly

WarrenKuhfeld · Posted 02-06-2018 01:58 PM

Sorry. I wrote this code decades ago, and I don't hear much about how people use it. At the end of the processing, there is a model based on the transformation of Y and the transformations of the Xs. Transreg has knowledge of the degrees of freedom involved in the transformations and produces statistical tests based on them. It will plot transformations that use the original variables and do lots of other stuff. But, at the end of the iterations, in the statistical calculations, it knows nothing about Y. It only knows about TY.

So there is nothing in Transreg that does what you want. The only thing I can think of is you can look at the graph of the transformation of Y. If it looks like some functional form (say quadratic or square root), then you can try a new model that uses that transformation instead. If that works reasonably well, then you can develop a formula that maps values from Y to TY and back to Y. Then you could apply that formula to predicted values (as long as it is mathematically valid, e.g. no logs of negative numbers). I cannot speak to the statistical properties of such an approach.
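When no simple functional form fits, one numerical stand-in for such a formula is to invert the fitted transformation by linear interpolation through the observed (TY, Y) pairs. This works only because mspline is monotone. The sketch below assumes the OUT= data set from the original post (work.y) with the default variable names TY and PY. The names work.ty_map, work.back_transformed, Y_at_TY, and Y_back are made up for illustration. Predictions outside the observed range of TY are clamped to the nearest end, and the statistical properties of the result are not addressed.

```sas
/* Lookup table: one row per distinct TY, sorted by TY; tied Ys are averaged */
proc means data=work.y noprint nway;
   where not missing(Y) and not missing(TY);
   class TY;
   var Y;
   output out=work.ty_map(drop=_type_ _freq_) mean=Y_at_TY;
run;

data work.back_transformed;
   set work.y;                        /* each row carries PY */
   lo_t = .; lo_y = .; hi_t = .; hi_y = .;
   /* scan the sorted table until the first TY above PY is found */
   do k = 1 to n_map while (missing(hi_t));
      set work.ty_map(rename=(TY=map_t)) point=k nobs=n_map;
      if map_t <= PY then do; lo_t = map_t; lo_y = Y_at_TY; end;
      else do; hi_t = map_t; hi_y = Y_at_TY; end;
   end;
   if missing(PY) then Y_back = .;
   else if missing(lo_t) then Y_back = hi_y;   /* below observed range: clamp */
   else if missing(hi_t) then Y_back = lo_y;   /* above observed range: clamp */
   else Y_back = lo_y + (PY - lo_t) * (hi_y - lo_y) / (hi_t - lo_t);
   drop map_t Y_at_TY lo_t lo_y hi_t hi_y;
run;
```

The scan is quadratic in the number of rows, which is fine for a sketch but slow for large data sets.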

Maybe there is a user out there who has done what you want who can help more than me.

NaNaN · Posted 02-08-2018 08:57 AM

It was a great pleasure to have this discussion with the author of transreg! Thanks @WarrenKuhfeld!

Thanks for confirming that I cannot get Y from PTY. Also what you recommended is what I am doing now. However, it triggers more questions:
1. I fit a manual piecewise linear regression between PTY and Y to get predicted Y. Then I compared the results with those from a regression without the spline transformations, in terms of average residual. Very surprisingly, transreg produced a much higher R square on the transformed predicted values, but also a much higher average residual on the untransformed predicted values. What does this tell you?

2. I am trying to understand how transreg works at a high level. Suppose my model is spline(y) = spline(x1) spline(x2). In words, does this mean that transreg transforms X1, X2, and Y and then fits a linear regression on the three transformed variables? If so, does spline(x1) transform X1 based on Y? How about spline(Y)?

Sorry for all my questions! I just want to get a clearer picture of Transreg and use it correctly.

-Shelly

WarrenKuhfeld · Posted 02-08-2018 09:38 AM

Thanks for the kind words, Shelly!

A key component of an alternating least-squares algorithm like the one in Transreg is scaling. At every step, it needs to rescale each variable to maintain a constant mean and variance. Transreg does try to maximize R square. So if you use it to suggest alternative models involving logs, square roots, polynomials, and so on, the original Transreg R square will be higher (unless you deliberately construct something weird). Residuals depend on the scale of the variable. You need to ensure everything is on the same scale (mean and variance) before comparing residuals.
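To make the point about scale concrete, here is a short worked derivation. It is standard regression algebra rather than anything specific to Transreg's internals, and the symbols a, b, and e_i are introduced here for illustration. Suppose the response and its fitted values are both rescaled linearly:

```latex
\[
y_i^{*} = a + b\,y_i, \qquad \hat{y}_i^{*} = a + b\,\hat{y}_i, \qquad b \neq 0
\]
\[
e_i^{*} = y_i^{*} - \hat{y}_i^{*} = b\,(y_i - \hat{y}_i) = b\,e_i
\]
\[
R^{2*} = 1 - \frac{\sum_i (e_i^{*})^2}{\sum_i (y_i^{*} - \bar{y}^{*})^2}
       = 1 - \frac{b^2 \sum_i e_i^2}{b^2 \sum_i (y_i - \bar{y})^2} = R^2
\]
```

So the average absolute residual changes by a factor of |b| while R square does not change at all. A scale-free way to compare two fits is to divide each root mean squared residual by the standard deviation of the response it was computed on.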

In a model like spline(y) = spline(x1) spline(x2), the transformation of each variable does in fact depend on every other variable, and in the end, Transreg fits a regression model using the three transformed variables. For many models (excluding monotone, untie, mspline, pbspline, and some others), Transreg can directly solve for a solution without iterating. That would be the case when all variables come from spline, opscore, class, linear, identity, etc. If you specify the SOLVE option, Transreg will fit a canonical correlation model using a B-spline basis for Y on one side and B-spline bases for X1 and X2 on the other. Then the canonical coefficients for the first canonical variable can be used to directly find the optimal transformations. I don't know if you find that helpful or not, but it is another way of saying that Transreg is trying to find linear combinations of basis functions that optimize R-square, and yes, depend on all the variables.
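One way to write down the description above, as a sketch only: the symbols B_y, B_1, B_2 (B-spline basis matrices for Y, X1, and X2) and the coefficient vectors alpha, beta_1, beta_2 are notation introduced here, not Transreg's own.

```latex
\[
T(y) = B_y\,\alpha, \qquad T(x_1) = B_1\,\beta_1, \qquad T(x_2) = B_2\,\beta_2
\]
\[
(\hat{\alpha}, \hat{\beta}_1, \hat{\beta}_2)
  = \arg\max_{\alpha,\,\beta_1,\,\beta_2}\;
    \operatorname{corr}^2\!\bigl(B_y\,\alpha,\; B_1\,\beta_1 + B_2\,\beta_2\bigr)
\]
```

This is the first canonical correlation between the columns of B_y and the columns of [B_1 B_2]. Because the coefficients are chosen jointly, the transformation of each variable depends on all the others, which matches the answer to question 2.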

Best,

Warren
