🔒 This topic is solved and locked.
NaNaN
Calcite | Level 5

Hi there,

I ran the following code to fit a nonlinear regression with spline transformations. Does anyone know how to get the predicted Y from the predicted transformed Y?

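ods graphics on;
ods output details=d;
proc transreg data=edaniat.fee_t solve details ss2 short nomiss plots=all;
   ods output splinecoef=c;
   /* monotone spline transformations of Y, x1, x2 and x4;
      a (nonmonotone) spline transformation of x3 */
   model mspline(Y / nknots=9) = mspline(x1 / nknots=9)
                                 mspline(x2 / nknots=9)
                                 spline(x3 / nknots=9)
                                 mspline(x4 / nknots=9);
   /* PREDICTED adds the predicted values (of the transformed Y) to data set y */
   output out=y predicted;
run;
ods graphics off;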

 

Any inputs are welcome! 


7 replies
Reeza
Super User

Include the points you want to score in your original data, with the dependent variable set to missing.

Then use the PREDICTED option to get their predicted values.

http://documentation.sas.com/?docsetId=statug&docsetVersion=14.3&docsetTarget=statug_transreg_syntax...

 

This blog post covers the methods for scoring data:

https://blogs.sas.com/content/iml/2014/02/19/scoring-a-regression-model-in-sas.html

 

Specifically, this one describes the method I'm suggesting:

https://blogs.sas.com/content/iml/2014/02/17/the-missing-value-trick-for-scoring-a-regression-model....
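Reeza's suggestion is the "missing value trick" from the linked blog post: rows whose response is missing are left out of the fit, yet they still receive predicted values. A minimal sketch of that idea, written in Python rather than SAS, with a simple one-predictor least-squares line standing in for the TRANSREG model; the data values are made up for illustration.

```python
# Sketch of the "missing value trick" for scoring (hypothetical data).
# A one-predictor least-squares line stands in for the real model.


def fit_line(xs, ys):
    """Ordinary least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def score_with_missing_trick(rows):
    """rows: (x, y) pairs, with y set to None for the points to be scored.

    Only the complete rows are used to fit the line, but every row,
    including the ones with a missing response, gets a predicted value.
    """
    complete = [(x, y) for x, y in rows if y is not None]
    a, b = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [(x, y, a + b * x) for x, y in rows]


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0), (5.0, None)]
for x, y, pred in score_with_missing_trick(data):
    print(x, y, round(pred, 3))
```

In SAS the same effect comes from appending the new rows to the input data set. Note that with a transformed Y, the predicted values TRANSREG writes for those rows are still on the transformed scale.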

 



NaNaN
Calcite | Level 5

Thanks, Reeza! 

 

It's still not clear to me how to get a predicted Y value from the predicted transformed Y value. In my code, I write the results to data set Y, which contains TY, the value of Y transformed by mspline(Y). How do I get Y back from TY?

There is no "code" option for PROC TRANSREG.

WarrenKuhfeld
Rhodochrosite | Level 12

I wrote Transreg, but I am not sure what you want.  You can apply the coefficients from the model to the transformed Xs to get predicted values for the transformed Y.  You can transform the Xs without transforming Y and get predicted values for Y.  You already know all that.  The only other thing I can think of is to fit a second regression model: take the transformed Xs from the model that transforms both the Xs and Y, and regress the untransformed Y on them to get predicted values for Y.  Is that what you want?  If so, why?  If that is what you want, you can do it in two steps.  If it is not, then I need more explanation.
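The second step of Warren's two-step suggestion is an ordinary least-squares regression of the untransformed Y on the transformed Xs saved by TRANSREG. A minimal sketch of that step, in Python rather than SAS (in SAS it would be an ordinary regression procedure such as PROC REG). The columns tx1 and tx2 are hypothetical stand-ins for transformed X variables from the OUT= data set, and the made-up Y is an exact linear function of them so the recovered coefficients are easy to check.

```python
# Second step of the two-step idea: regress the untransformed Y on
# transformed Xs. tx1, tx2 and y_obs are hypothetical values.


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(columns, y):
    """Least-squares coefficients (intercept first) of y on the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    return solve(xtx, xty)  # normal equations X'X b = X'y


def predict(coef, columns):
    """Predicted values on the original Y scale."""
    return [coef[0] + sum(c * col[i] for c, col in zip(coef[1:], columns))
            for i in range(len(columns[0]))]


tx1 = [0.0, 1.0, 2.0, 3.0, 4.0]
tx2 = [1.0, 0.0, 2.0, 5.0, 3.0]
y_obs = [0.5, 3.0, 4.0, 4.5, 7.5]  # exactly 1 + 2*tx1 - 0.5*tx2
coef = ols([tx1, tx2], y_obs)
print([round(c, 6) for c in coef])
print([round(v, 6) for v in predict(coef, [tx1, tx2])])
```

Because Y is not transformed in this second fit, its predicted values are already on the Y scale.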

NaNaN
Calcite | Level 5

Thanks for your comments, @WarrenKuhfeld. Here is what I want:

 

In TRANSREG, I transformed the target variable Y by using mspline(Y / nknots=9). I wrote the output to a data set, where I can find the predicted transformed Y (PTY).

 

For example, if Y=10 and TY=mspline(Y)=30, TRANSREG returns the predicted transformed value PTY=35. I want to find the corresponding predicted value of Y. Also, I believe the R square in TRANSREG is calculated from TY (transformed Y) and the predicted transformed Y.
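Because mspline(Y) is monotone, TY never decreases as Y increases, so the observed (Y, TY) pairs in the output data set trace out a mapping that can be inverted numerically. This is essentially the piecewise linear approach Shelly describes later in the thread, not a TRANSREG feature. A minimal sketch in Python rather than SAS, with hypothetical Y and TY values chosen to match the example above (Y=10 maps to TY=30): it interpolates linearly between the two observed TY values that bracket a predicted transformed value.

```python
from bisect import bisect_left


def build_lookup(y_vals, ty_vals):
    """Sort the observed (TY, Y) pairs by TY and drop repeated TY values.

    With a monotone transformation such as mspline, sorting by TY also
    keeps the Y values in order. On a flat stretch (several Y values with
    the same TY) this sketch keeps the smallest Y.
    """
    ty_sorted, y_sorted = [], []
    for ty, y in sorted(set(zip(ty_vals, y_vals))):
        if ty_sorted and ty == ty_sorted[-1]:
            continue  # TY cannot tell these Y values apart
        ty_sorted.append(ty)
        y_sorted.append(y)
    return ty_sorted, y_sorted


def invert(pty, ty_sorted, y_sorted):
    """Map one predicted transformed value back to the Y scale.

    Values outside the observed TY range are clamped to the end points.
    """
    if pty <= ty_sorted[0]:
        return y_sorted[0]
    if pty >= ty_sorted[-1]:
        return y_sorted[-1]
    i = bisect_left(ty_sorted, pty)  # ty_sorted[i-1] < pty <= ty_sorted[i]
    t0, t1 = ty_sorted[i - 1], ty_sorted[i]
    y0, y1 = y_sorted[i - 1], y_sorted[i]
    return y0 + (y1 - y0) * (pty - t0) / (t1 - t0)


y_obs = [5.0, 10.0, 15.0, 20.0]     # hypothetical Y values
ty_obs = [20.0, 30.0, 36.0, 50.0]   # hypothetical TY = mspline(Y) values
ty_s, y_s = build_lookup(y_obs, ty_obs)
print(invert(35.0, ty_s, y_s))      # predicted Y for PTY = 35
```

Clamping and the tie rule are choices of this sketch. For a nonmonotone transformation such as spline(), the observed pairs do not define a unique inverse.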

 

Please let me know if I am still not clear enough.

 

Thanks!

 

-Shelly

WarrenKuhfeld
Rhodochrosite | Level 12

Sorry.  I wrote this code decades ago, and I don't hear much about how people use it.  At the end of the processing, there is a model based on the transformation of Y and the transformations of the Xs.  Transreg has knowledge of the degrees of freedom involved in the transformations and produces statistical tests based on them.  It will plot transformations that use the original variables and do lots of other stuff.  But, at the end of the iterations, in the statistical calculations, it knows nothing about Y.  It only knows about TY.


So there is nothing in Transreg that does what you want.  The only thing I can think of is to look at the graph of the transformation of Y.  If it looks like some functional form (say quadratic or square root), you can try a new model that uses that transformation instead.  If that works reasonably well, you can develop a formula that maps values from Y to TY and back to Y, and then apply that formula to predicted values (as long as it is mathematically valid, e.g. no logs of negative numbers).  I cannot speak to the statistical properties of such an approach.
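As an illustration of Warren's suggestion, suppose the plot of the Y transformation looks like a square root. Because Transreg rescales transformed variables, the mapping would be TY = a + b*sqrt(Y) rather than sqrt(Y) itself, with a and b estimated from the (Y, TY) pairs. A minimal sketch in Python rather than SAS; the square-root form and all data values are assumptions made for the example, and the inverse refuses predicted values for which no real Y exists.

```python
import math


def fit_sqrt_map(y_vals, ty_vals):
    """Least-squares a, b in TY = a + b*sqrt(Y), using the observed pairs."""
    s = [math.sqrt(y) for y in y_vals]
    n = len(s)
    ms = sum(s) / n
    mt = sum(ty_vals) / n
    sxy = sum((si - ms) * (ti - mt) for si, ti in zip(s, ty_vals))
    sxx = sum((si - ms) ** 2 for si in s)
    b = sxy / sxx
    return mt - b * ms, b


def forward(y, a, b):
    """Y -> TY under the fitted square-root form."""
    return a + b * math.sqrt(y)


def backward(ty, a, b):
    """TY (or a predicted TY) -> Y; only valid when (TY - a) / b >= 0."""
    root = (ty - a) / b
    if root < 0:
        raise ValueError("value lies outside the range of the square-root map")
    return root ** 2


# Hypothetical data generated from TY = 3 + 2*sqrt(Y)
a, b = fit_sqrt_map([1.0, 4.0, 9.0, 16.0, 25.0], [5.0, 7.0, 9.0, 11.0, 13.0])
print(a, b)
print(backward(10.0, a, b))  # Y implied by a predicted TY of 10
```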

 

Maybe there is a user out there who has done what you want who can help more than me.

NaNaN
Calcite | Level 5
It was a great pleasure to have this discussion with the author of transreg! Thanks @WarrenKuhfeld!

Thanks for confirming that I cannot get Y from PTY. Also what you recommended is what I am doing now. However, it triggers more questions:
1. I manually fit a piecewise linear regression line between PTY and Y to get predicted Y. Then I compared those results with the results from a regression without spline transformations, in terms of average residual. Very surprisingly, TRANSREG produced a much higher R square on the transformed predicted values, but also a much, much higher average residual on the untransformed predicted values. What does this tell you?

2. I am trying to understand how TRANSREG works at a high level. Suppose my model is spline(y) = spline(x1) spline(x2). In words, does that mean TRANSREG transforms X1, X2, and Y and then runs a linear regression on the three transformed variables? If so, does spline(x1) transform X1 based on Y? What about spline(Y)?

Sorry for all my questions! I just want a clearer picture of Transreg so I can use it correctly.

-Shelly
WarrenKuhfeld
Rhodochrosite | Level 12

Thanks for the kind words, Shelly!

 

A key component of an alternating least-squares algorithm like the one in Transreg is scaling.  At every step, it needs to rescale each variable to maintain a constant mean and variance.  Transreg does try to maximize R square.  So if you use it to suggest alternative models involving logs, square roots, polynomials, and so on, the original Transreg R square will be higher (unless you deliberately construct something weird).  Residuals depend on the scale of the variable.  You need to ensure everything is on the same scale (mean and variance) before comparing residuals.
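Warren's point that residuals depend on scale can be checked directly. In this minimal Python sketch (hypothetical numbers), multiplying both the observed and the predicted values by 10 multiplies the RMSE and the mean absolute residual by 10 but leaves R square unchanged. The rescale helper puts a series on a chosen mean and standard deviation, so that values from two models can be put on the same footing before residuals are compared; using population (divide-by-n) standard deviations is an assumption of the sketch.

```python
import math


def rescale(values, target_mean, target_sd):
    """Linearly rescale values to a given mean and population standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [target_mean + target_sd * (v - m) / sd for v in values]


def fit_stats(y, pred):
    """R square, RMSE and mean absolute residual, all on the scale of y."""
    n = len(y)
    my = sum(y) / n
    resid = [yi - pi for yi, pi in zip(y, pred)]
    sse = sum(r * r for r in resid)
    sst = sum((yi - my) ** 2 for yi in y)
    return {"r2": 1 - sse / sst,
            "rmse": math.sqrt(sse / n),
            "mae": sum(abs(r) for r in resid) / n}


print(fit_stats([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5]))
print(fit_stats([10, 20, 30, 40], [15, 15, 35, 35]))  # same fit, 10x the scale
print(rescale([1.0, 2.0, 3.0, 4.0], 0.0, 1.0))
```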

 

In a model like spline(y) = spline(x1) spline(x2), the transformation of each variable does in fact depend on every other variable, and in the end, Transreg fits a regression model using the three transformed variables.  For many models (excluding monotone, untie, mspline, pbspline, and some others), Transreg can directly solve for a solution without iterating.  That would be the case when all variables come from spline, opscore, class, linear, identity, etc.  If you specify the SOLVE option, Transreg will fit a canonical correlation model using a B-spline basis for Y on one side and B-spline bases for X1 and X2 on the other.  Then the canonical coefficients for the first canonical variable can be used to directly find the optimal transformations.  I don't know if you find that helpful or not, but it is another way of saying that Transreg is trying to find linear combinations of basis functions that optimize R-square, and yes, depend on all the variables.
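Warren's description of the SOLVE path rests on B-spline bases: a spline transformation of a variable is a linear combination of B-spline basis functions of that variable, and Transreg chooses the combination. The Python sketch below evaluates such a basis with the textbook Cox-de Boor recursion. The knots, degree, and coefficients are hypothetical; it does not reproduce Transreg's own knot placement, scaling, or canonical correlation step.

```python
def bspline_basis(x, knots, degree):
    """Values at x of all B-spline basis functions (Cox-de Boor recursion).

    knots must be non-decreasing and include repeated boundary knots;
    the result has len(knots) - degree - 1 entries.
    """
    n0 = len(knots) - 1
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(n0)]
    # Let the right end point belong to the last non-empty interval.
    if x == knots[-1]:
        for i in range(n0 - 1, -1, -1):
            if knots[i] < knots[i + 1]:
                b[i] = 1.0
                break
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(b) - 1):
            left = right = 0.0
            if knots[i + d] > knots[i]:
                left = (x - knots[i]) / (knots[i + d] - knots[i]) * b[i]
            if knots[i + d + 1] > knots[i + 1]:
                right = ((knots[i + d + 1] - x)
                         / (knots[i + d + 1] - knots[i + 1]) * b[i + 1])
            nxt.append(left + right)
        b = nxt
    return b


knots = [0, 0, 0, 0, 1, 2, 3, 3, 3, 3]  # cubic, two interior knots
coef = [0.0, 0.5, 2.0, 1.0, 3.0, 2.5]   # hypothetical coefficients
x = 1.5
basis = bspline_basis(x, knots, 3)
print(basis, sum(basis))                # the basis values sum to 1
print(sum(c * v for c, v in zip(coef, basis)))  # transformed value at x
```

As a sanity check, choosing each coefficient as (t[i+1] + t[i+2] + t[i+3]) / 3 for knot vector t makes the combination return x itself, which is one way to see that the basis can represent the untransformed variable as a special case.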


Best,

Warren

