Hi there,
I ran the following code to fit a nonlinear regression. Does anyone know how to get the predicted Y from the predicted transformed Y?
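ods graphics on;
ods output details=d;                      /* save the Details table to data set D */
proc transreg data=edaniat.fee_t solve details ss2 short nomiss plots=all;
   ods output splinecoef=c;                /* save the spline coefficients to data set C */
   /* monotone spline transformations of Y, x1, x2, and x4; */
   /* an ordinary (nonmonotone) spline transformation of x3  */
   model mspline(Y / nknots=9) = mspline(x1 / nknots=9)
         mspline(x2 / nknots=9) spline(x3 / nknots=9) mspline(x4 / nknots=9);
   /* data set Y receives the transformed variables and the predicted */
   /* values, which are on the transformed (TY) scale                  */
   output out=y PREDICTED;
run;
ods graphics off;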
Any inputs are welcome!
Sorry. I wrote this code decades ago, and I don't hear much about how people use it. At the end of the processing, there is a model based on the transformation of Y and the transformations of the Xs. Transreg has knowledge of the degrees of freedom involved in the transformations and produces statistical tests based on them. It will plot transformations that use the original variables and do lots of other stuff. But, at the end of the iterations, in the statistical calculations, it knows nothing about Y. It only knows about TY.
So there is nothing in Transreg that does what you want. The only thing I can think of is you can look at the graph of the transformation of Y. If it looks like some functional form (say quadratic or square root), then you can try a new model that uses that transformation instead. If that works reasonably well, then you can develop a formula that maps values from Y to TY and back to Y. Then you could apply that formula to predicted values (as long as it is mathematically valid, e.g. no logs of negative numbers). I cannot speak to the statistical properties of such an approach.
Maybe there is a user out there who has done what you want who can help more than me.
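A minimal sketch of that idea, written in Python rather than SAS purely to show the arithmetic. It assumes the plot of TY against Y looked roughly like a square root, so TY is approximated as a + b*sqrt(Y) by least squares on the (Y, TY) pairs from the output data set. The inverse is then Y = ((TY - a)/b)^2, which is only valid when (TY - a)/b is not negative. The function names and data values are hypothetical, and, as noted above, the statistical properties of this approach are unknown.

```python
import math

def fit_sqrt_form(y, ty):
    """Least-squares fit of ty ~ a + b*sqrt(y); returns (a, b)."""
    s = [math.sqrt(v) for v in y]
    n = len(s)
    ms, mt = sum(s) / n, sum(ty) / n
    sxy = sum((si - ms) * (ti - mt) for si, ti in zip(s, ty))
    sxx = sum((si - ms) ** 2 for si in s)
    b = sxy / sxx
    a = mt - b * ms
    return a, b

def back_transform(pty, a, b):
    """Map a predicted transformed value back to the Y scale.

    Returns None when the inverse is not valid, because sqrt(Y)
    cannot be negative."""
    root = (pty - a) / b
    if root < 0:
        return None
    return root ** 2

# Hypothetical (Y, TY) pairs where TY is exactly 2 + 3*sqrt(Y).
y = [1.0, 4.0, 9.0, 16.0, 25.0]
ty = [2.0 + 3.0 * math.sqrt(v) for v in y]
a, b = fit_sqrt_form(y, ty)
print(round(a, 6), round(b, 6))              # 2.0 3.0
print(round(back_transform(14.0, a, b), 6))  # 16.0
print(back_transform(-1.0, a, b))            # None
```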
Include the point in your original data, with a missing dependent variable.
Then use the PREDICTED option to get the predicted value.
This blog post covers the methods for scoring data:
https://blogs.sas.com/content/iml/2014/02/19/scoring-a-regression-model-in-sas.html
Specifically, the approach of adding the new points with a missing response is the method I'm suggesting.
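A minimal Python sketch of what the missing-value trick does mechanically. It uses ordinary simple linear regression as a stand-in for the SAS procedure and is not TRANSREG's algorithm. Rows whose response is missing (None here) are left out of the fit, but they still receive a predicted value. The function name and data are hypothetical.

```python
def fit_and_score(rows):
    """Fit y = a + b*x on complete rows, then predict every row.

    rows: list of (x, y) pairs; y is None for rows to be scored only.
    """
    complete = [(x, y) for x, y in rows if y is not None]
    n = len(complete)
    mx = sum(x for x, _ in complete) / n
    my = sum(y for _, y in complete) / n
    sxy = sum((x - mx) * (y - my) for x, y in complete)
    sxx = sum((x - mx) ** 2 for x, _ in complete)
    b = sxy / sxx
    a = my - b * mx
    # Rows with a missing y contributed nothing to a and b above,
    # but they still get a prediction here.
    return [a + b * x for x, _ in rows]

data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (10.0, None)]  # last row: score only
print(fit_and_score(data))  # [3.0, 5.0, 7.0, 21.0]
```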
Thanks, Reeza!
It's still not clear to me how to get the predicted Y value from the predicted transformed Y value. In my code, I write the results to data set Y, which contains TY, the value of Y transformed by mspline(Y). How do I get Y back from TY?
There is no CODE option for PROC TRANSREG.
I wrote transreg, but I am not sure what you want. You can apply the coefficients from the model to the transformed Xs and you get predicted values for the transformed Y. You can transform the Xs and not transform Y and get predicted values for Y. You already know all that. The only other thing I can think of is you can fit a second regression model given the transformed Xs from the model that transforms the Xs and Y and then find the predicted values for Y from a regression model that uses the transformed Xs. Is that what you want? If so, why? If that is what you want, you can do it in two steps. If that is not what you want, then I need more explanation.
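The two-step idea, sketched in Python with numpy only to show the arithmetic. The second step is an ordinary least-squares regression of the untransformed Y on the transformed X columns that the first TRANSREG model wrote to its output data set. The function name, array names, and values are hypothetical. The sketch shows the mechanics, not whether the approach is statistically sound.

```python
import numpy as np

def two_step_predict(tx, y):
    """Regress the untransformed y on already-transformed predictors.

    tx: (n, k) array of transformed X columns from the first model.
    y:  (n,) array of the original, untransformed response.
    Returns predicted values on the original Y scale.
    """
    design = np.column_stack([np.ones(len(y)), tx])  # add an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef

# Hypothetical transformed predictors and a response that is an exact
# linear function of them, so the second-step fit is perfect.
tx = np.array([[0.1, 1.0], [0.4, 0.5], [0.9, 0.2], [1.6, 0.8], [2.5, 0.3]])
y = 1.0 + 2.0 * tx[:, 0] - 3.0 * tx[:, 1]
print(np.allclose(two_step_predict(tx, y), y))  # True
```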
Thanks for your comments, @WarrenKuhfeld. Here is what I want:
In TRANSREG, I transformed the target variable Y by using mspline(Y / nknots=9). I wrote the output to a data set, and there I can find the predicted transformed Y (PTY).
For example, if Y=10 and TY=mspline(Y)=30, TRANSREG returns the predicted transformed value PTY=35. I want to find the corresponding predicted value of Y. Also, I believe the R-square in TRANSREG is calculated from TY (transformed Y) and PY (predicted transformed Y).
Please let me know if I am still not clear enough.
Thanks!
-Shelly
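One possible workaround, which TRANSREG does not do itself and whose statistical properties are untested: an MSPLINE transformation is monotone, so the observed (Y, TY) pairs from the output data set trace out the mapping, and a predicted PTY can be mapped back to Y by linear interpolation between the neighboring pairs. This Python sketch assumes TY is strictly increasing in Y. A flat stretch would make the inverse non-unique, so the sketch rejects that case. It returns None outside the observed TY range instead of extrapolating. The function names and the data, apart from the Y=10, TY=30 example above, are hypothetical.

```python
from bisect import bisect_left

def make_inverse(y_obs, ty_obs):
    """Build a function that maps a transformed value back to the Y scale.

    Assumes TY is strictly increasing in Y (a monotone spline with no
    flat stretches); otherwise the inverse is not unique.
    """
    pairs = sorted(set(zip(y_obs, ty_obs)))  # sort by Y, drop exact repeats
    ys = [p[0] for p in pairs]
    ts = [p[1] for p in pairs]
    if any(t1 >= t2 for t1, t2 in zip(ts, ts[1:])):
        raise ValueError("TY is not strictly increasing in Y")

    def inverse(pty):
        if pty < ts[0] or pty > ts[-1]:
            return None  # outside the observed TY range: no extrapolation
        i = bisect_left(ts, pty)
        if ts[i] == pty:
            return ys[i]
        # Linear interpolation between the bracketing points i-1 and i.
        frac = (pty - ts[i - 1]) / (ts[i] - ts[i - 1])
        return ys[i - 1] + frac * (ys[i] - ys[i - 1])

    return inverse

# Hypothetical pairs; Y=10 maps to TY=30 as in the example above.
inv = make_inverse([5, 10, 15, 20], [12.0, 30.0, 40.0, 44.0])
print(inv(35.0))  # 12.5
print(inv(30.0))  # 10
print(inv(50.0))  # None
```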
Thanks for the kind words, Shelly!
A key component of an alternating least-squares algorithm like the one in Transreg is scaling. At every step, it needs to rescale each variable to maintain a constant mean and variance. Transreg does try to maximize R-square, so if you use it to suggest alternative models involving logs, square roots, polynomials, and so on, the original Transreg R-square will be higher (unless you deliberately construct something weird). Residuals depend on the scale of the variable, so you need to ensure everything is on the same scale (mean and variance) before comparing residuals.
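A small Python illustration of the scaling point. It uses ordinary simple regression, not Transreg's alternating least squares. Multiplying the response by 10 and adding 5 leaves R-square unchanged but multiplies every residual by 10. Standardizing both versions to mean 0 and standard deviation 1 makes their residuals agree. The data are made up.

```python
import statistics as st

def standardize(v):
    """Rescale to mean 0 and (sample) standard deviation 1."""
    m, s = st.mean(v), st.stdev(v)
    return [(x - m) / s for x in v]

def fit(x, y):
    """Simple least-squares fit of y on x; returns (residuals, R-square)."""
    mx, my = st.mean(x), st.mean(y)
    b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    resid = [c - (my + b * (a - mx)) for a, c in zip(x, y)]
    sst = sum((c - my) ** 2 for c in y)
    return resid, 1 - sum(r * r for r in resid) / sst

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.9, 5.1]
y_rescaled = [10 * v + 5 for v in y]  # same variable on a different scale

r1, rsq1 = fit(x, y)
r2, rsq2 = fit(x, y_rescaled)         # same R-square, residuals 10x larger
rs1, _ = fit(x, standardize(y))
rs2, _ = fit(x, standardize(y_rescaled))  # residuals now agree
print(rsq1, rsq2)
```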
In a model like spline(y) = spline(x1) spline(x2), the transformation of each variable does in fact depend on every other variable, and in the end, Transreg fits a regression model using the three transformed variables. For many models (excluding monotone, untie, mspline, pbspline, and some others), Transreg can directly solve for a solution without iterating. That would be the case when all variables come from spline, opscore, class, linear, identity, etc. If you specify the SOLVE option, Transreg will fit a canonical correlation model using a B-spline basis for Y on one side and B-spline bases for X1 and X2 on the other. Then the canonical coefficients for the first canonical variable can be used to directly find the optimal transformations. I don't know if you find that helpful or not, but it is another way of saying that Transreg is trying to find linear combinations of basis functions that optimize R-square, and yes, depend on all the variables.
Best,
Warren
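A rough numpy sketch of the canonical-correlation idea behind SOLVE. It is not Transreg's code and it makes loud simplifications. A centered polynomial basis stands in for the B-spline basis. There is a single X for brevity; with several Xs, the X-side basis would be their bases placed side by side. The first canonical correlation is computed with the standard QR-plus-SVD method. The final check shows that regressing the resulting transformation of Y on the X basis gives an R-square equal to the squared first canonical correlation. The function names and data are hypothetical.

```python
import numpy as np

def poly_basis(v, degree):
    """Centered columns v, v**2, ..., v**degree (a stand-in for a spline basis)."""
    b = np.column_stack([v ** d for d in range(1, degree + 1)])
    return b - b.mean(axis=0)

def first_canonical(by, bx):
    """First canonical correlation and Y-side coefficients via QR + SVD."""
    qy, ry = np.linalg.qr(by)
    qx, _ = np.linalg.qr(bx)
    u, s, _ = np.linalg.svd(qy.T @ qx)
    coef_y = np.linalg.solve(ry, u[:, 0])  # map back to the original Y basis
    return s[0], coef_y

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0, 200)
y = np.exp(x) + rng.normal(0.0, 0.1, 200)  # hypothetical data

by, bx = poly_basis(y, 3), poly_basis(x, 3)
rho, coef_y = first_canonical(by, bx)
ty = by @ coef_y  # transformation of Y from the first canonical variable

# Both bases are centered, so TY has mean 0 and no intercept is needed.
coef_x, *_ = np.linalg.lstsq(bx, ty, rcond=None)
rsq = 1 - np.sum((ty - bx @ coef_x) ** 2) / np.sum(ty ** 2)
print(rho ** 2, rsq)  # the two numbers agree
```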