How do I fit a simple linear regression model using a transformation of the dependent variable in the data below? And which transformation is best for variance stabilization?
data one;
input X @;
do i= 1 to 4;
input Y @;
output;
end;
drop i;
datalines;
2.5 7.5 9.5 8.0 8.5
5.0 11.0 12.0 9.0 10.0
7.5 11.0 16.0 12.5 14.0
10.0 16.5 14.5 21.5 19.0
;
run;
ods graphics on;
proc reg data=one plots=all;
model Y=X / p r clm cli influence;
run;
ods graphics off;
Consider PROC TRANSREG to explore transformations. For a variance-stabilizing transformation, look at the Box-Cox transformation.
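Because each X value in these data has four replicate Y values, you can also get a rough, direct look at how the spread of Y grows with its mean. A classic rule of thumb is that if the standard deviation is roughly proportional to mean**a, then the power transformation Y**(1-a) tends to stabilize the variance: a = 1 points to log(Y), a = 1.5 to 1/sqrt(Y), and a = 2 to 1/Y. The sketch below assumes data set ONE from the first post. The names GRP, YBAR, SDY, LOG_MEAN and LOG_SD are illustrative. With only four groups of four values, the estimated slope is very rough, so treat it as a hint rather than a verdict.

```sas
/* Mean and standard deviation of the four replicate Y values at each X */
proc means data=one noprint nway;
   class X;
   var Y;
   output out=grp mean=ybar std=sdy;
run;

/* Slope of log(SD) on log(mean) estimates a in SD ~ mean**a; suggested power is 1 - a */
data grp;
   set grp;
   log_mean = log(ybar);
   log_sd   = log(sdy);
run;

proc reg data=grp;
   model log_sd = log_mean;
run;
quit;
```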
What kind of transform?
If you have one in mind, create a second variable in data set ONE. Suppose you want to square Y. Then compute the transformation before the OUTPUT statement in the data step, for example:
data one; input X @; do i= 1 to 4; input Y @; y2 = y*y; output; end; drop i;
/* then the same DATALINES block and RUN statement as in the original step */
Then run the regression with Y2 as the dependent variable. If you are looking for advice on appropriate transformations, that data set is a bit small to support really good suggestions.
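Put together, the suggestion above might look like the following sketch. The transformed response is computed inside the DO loop, before OUTPUT, so every observation gets it. PROC REG then uses Y2 as the dependent variable. Squaring is only the illustration used above, not a recommendation for these data.

```sas
/* Read X and four replicate Y values per line; compute Y2 = Y squared before each OUTPUT */
data one;
   input X @;
   do i = 1 to 4;
      input Y @;
      y2 = y*y;      /* transformed response, created before OUTPUT */
      output;
   end;
   drop i;
   datalines;
2.5 7.5 9.5 8.0 8.5
5.0 11.0 12.0 9.0 10.0
7.5 11.0 16.0 12.5 14.0
10.0 16.5 14.5 21.5 19.0
;
run;

/* Same regression as in the first post, but with the transformed response */
proc reg data=one plots=all;
   model y2 = X / p r clm cli influence;
run;
quit;
```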
@JUMMY wrote:
@ballardw, based on the earlier regression I ran in SAS, the scatterplot showed a fan-shaped pattern. So I was thinking of using one of these transformations: Y'=1/Y or Y'=log(Y). I am not sure which one is better for this situation. What do you suggest?
From the plot I ran, I would suspect the data might follow some sort of exponential function, so log(Y) would be one of the candidate transformations. You might consider LOG10 or LOG2 as well.
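Here is a sketch of fitting the three log candidates, assuming data set ONE from the first post. The variable names are illustrative. LOG, LOG10 and LOG2 are SAS functions. Because log10(Y) and log2(Y) are just ln(Y) multiplied by a constant, the three fits should differ only in the scale of the intercept and slope. They have identical R-square values and residual patterns, so the choice of base does not change how well the variance is stabilized.

```sas
data logs;
   set one;
   ln_y    = log(y);     /* natural log */
   log10_y = log10(y);   /* base 10 */
   log2_y  = log2(y);    /* base 2 */
run;

/* One MODEL statement can take several dependent variables */
proc reg data=logs plots=all;
   model ln_y log10_y log2_y = X;
run;
quit;
```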
Another approach is to transform X, using either the original or the transformed dependent variable.
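As an example, the sketch below tries a log transformation of X with both the original Y and log(Y). The log of X is chosen only as an illustration and was not specifically suggested in the thread. The sketch assumes data set ONE from the first post. All X values are positive, so LOG is defined. The model labels ORIGY and LOGY are illustrative.

```sas
data xt;
   set one;
   log_x = log(X);   /* illustrative transformation of the predictor */
   log_y = log(Y);
run;

/* Labeled MODEL statements keep the two fits apart in the output */
proc reg data=xt plots=all;
   OrigY: model Y     = log_x;
   LogY:  model log_y = log_x;
run;
quit;
```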
Consider:
ods graphics on;   /* the Box-Cox plot is an ODS Graphics plot */
proc transreg data=one;
model boxcox(y / lambda=-2 to 2 by 0.1) = identity(x);
run;
/* Based on Box-Cox plot, choose Lambda = -0.5, i.e. inverse square root
transformation */
data two;
set one;
yp = 1/sqrt(y);
run;
proc reg data=two plots=all;
model Yp=X / p r clm cli influence;
run;
/* Check out the fitplot graph */
@PGStats, would Y'=1/Y be another good transformation to use? Looking at R-square, we both got the same value, 0.8452.
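To put the two candidates side by side, one option is to compute both transformed responses and fit them in a single PROC REG step. You can then compare the residual-by-predicted plots rather than R-square alone. R-square measures the proportion of variance explained, not whether the residual spread is constant, and it is not directly comparable across different response scales. The sketch assumes data set ONE from the first post. The names INV, INV_Y, INVSQRT_Y and the model labels are illustrative.

```sas
data inv;
   set one;
   inv_y     = 1/y;        /* lambda = -1   */
   invsqrt_y = 1/sqrt(y);  /* lambda = -0.5 */
run;

ods graphics on;   /* the plots below require ODS Graphics */
proc reg data=inv plots(only)=(residualbypredicted fitplot);
   InvY:     model inv_y     = X;
   InvSqrtY: model invsqrt_y = X;
run;
quit;
```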
The Box-Cox plot gives you a confidence interval for the optimal transformation exponent (lambda).
Here it includes -1.5, -1, -0.5, and 0 (log).
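For reference, the usual Box-Cox family is shown below. Because (Y^λ - 1)/λ is a linear function of Y^λ, regressing on it is equivalent (up to sign and scale) to regressing on Y^λ itself. That is why each λ in the interval corresponds to a simple power transformation.

```latex
Y^{(\lambda)} =
\begin{cases}
\dfrac{Y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt]
\log Y, & \lambda = 0,
\end{cases}
\qquad
\lambda = -1.5 \to Y^{-1.5},\quad
\lambda = -1 \to \dfrac{1}{Y},\quad
\lambda = -0.5 \to \dfrac{1}{\sqrt{Y}},\quad
\lambda = 0 \to \log Y .
```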
@PGStats, I agree that the Box-Cox plot gives a better understanding. But what if we stick to PROC REG for drawing conclusions? Would the 95% CI for the mean of Y when X=5.0 be (9.8051, 12.1199)?
And would the 95% CI for the mean of the transformed Y, Y', be (0.2998, 0.3213)? Would that be correct?
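One way to get these limits is to request the mean confidence limits through the OUTPUT statement and then back-transform them. The sketch below assumes data set TWO (with YP = 1/sqrt(Y)) from the code above. The data set and variable names FIT, FIT_X5, YP_HAT, and so on are illustrative. Because Y = 1/YP**2 is decreasing, the upper limit for YP becomes the lower limit for Y. A back-transformed interval covers 1/(mean of YP)**2. If the errors are roughly symmetric on the transformed scale, this is closer to the median of Y than to its mean, so it need not match the CLM from the untransformed model.

```sas
/* Mean confidence limits (CLM) on the transformed scale, saved per observation */
proc reg data=two plots=none;
   model yp = X / clm;
   output out=fit p=yp_hat l95m=yp_lo u95m=yp_hi;
run;
quit;

/* Back-transform to the original scale; limits swap because 1/yp**2 is decreasing */
data fit_x5;
   set fit;
   where X = 5;
   y_hat = 1/yp_hat**2;
   y_lo  = 1/yp_hi**2;
   y_hi  = 1/yp_lo**2;
   keep X yp_hat yp_lo yp_hi y_hat y_lo y_hi;
run;

/* All four X=5 rows carry the same fitted values, so print one */
proc print data=fit_x5(obs=1) noobs;
run;
```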