JUMMY
Obsidian | Level 7

How do I fit a simple linear regression model with a transformed dependent variable for the data below? And which transformation is best for stabilizing the variance?

data one;
  input X @;
  do i= 1 to 4;
   input Y @;
   output;
  end;
  drop i;
datalines;
2.5  7.5 9.5 8.0 8.5
5.0  11.0 12.0 9.0 10.0
7.5  11.0 16.0 12.5 14.0
10.0  16.5 14.5 21.5 19.0
;
run;

ods graphics on; 
proc reg data=one plots=all;
  model Y=X / p r clm cli influence;
run;
ods graphics off;
PGStats
Opal | Level 21

Consider PROC TRANSREG to explore transformations. For a variance-stabilizing transformation, check out the Box-Cox transformation:

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_transreg_examples02.htm&docsetVer...
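For reference, this is the textbook form of the Box-Cox family (the documentation linked above describes any rescaling TRANSREG itself applies). It requires Y > 0, which holds for these data:

```latex
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[6pt]
  \log y, & \lambda = 0,
\end{cases}
\qquad y > 0.
```

Familiar special cases are λ = 1 (no transformation, up to a shift), λ = 0.5 (square root), λ = 0 (log), λ = -0.5 (inverse square root), and λ = -1 (reciprocal). In the usual formulation, λ is chosen to maximize the regression's profile log-likelihood.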

PG
ballardw
Super User

What kind of transform?

If you have one in mind, create a second variable in data set ONE. For example, if you want to transform Y by squaring it, compute the transformed value before the OUTPUT statement in the DATA step:

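data one;
  input X @;
  do i = 1 to 4;
    input Y @;
    y2 = y*y;   /* compute y2 before OUTPUT so every row written has it */
    output;
  end;
  drop i;
datalines;
2.5  7.5 9.5 8.0 8.5
5.0  11.0 12.0 9.0 10.0
7.5  11.0 16.0 12.5 14.0
10.0  16.5 14.5 21.5 19.0
;
run;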

Then run the regression with y2 as the dependent variable. If you are looking for advice on appropriate transforms, that data set is a tad small to support really good suggestions.

JUMMY
Obsidian | Level 7
@ballardw, based on the earlier regression I ran in SAS, the scatterplot showed a fan-shaped pattern. So I was thinking of using one of these transformations: Y' = 1/Y or Y' = log(Y). I am not sure which one is ideal for this situation. What do you suggest?
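A rough way to match a transformation to a fan shape is the first-order (delta-method) approximation below, assuming the standard deviation of Y grows like a power k of its mean μ. Choosing g'(μ) ∝ μ^(-k) makes the variance of g(Y) approximately constant:

```latex
\operatorname{Var}\big[g(Y)\big] \approx g'(\mu)^2\,\operatorname{Var}(Y),
\qquad \operatorname{SD}(Y) \propto \mu^{k}
\;\Longrightarrow\;
g(y) \propto
\begin{cases}
  y^{\,1-k}, & k \neq 1,\\
  \log y,    & k = 1.
\end{cases}
```

So log(Y) suits a fan where the SD grows roughly in proportion to the mean (k = 1), while 1/Y assumes a steeper growth, SD ∝ μ² (k = 2); 1/sqrt(Y) sits in between (k = 3/2). In Box-Cox terms, λ = 1 - k.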
ballardw
Super User

@JUMMY wrote:
@ballardw, based on the earlier regression I ran in SAS, the scatterplot showed a fan-shaped pattern. So I was thinking of using one of these transformations: Y' = 1/Y or Y' = log(Y). I am not sure which one is ideal for this situation. What do you suggest?

The plot I ran makes me suspect the data might follow some sort of exponential function, so log(y) would be one candidate. You might consider log10 or log2 as well.
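The choice of base only rescales the response, because

```latex
\log_{b} y = \frac{\ln y}{\ln b}.
```

Regressing log10(Y) or log2(Y) on X therefore gives the ln(Y) coefficients and residuals divided by ln b, with identical R-square, t statistics, and residual-plot pattern. The base matters only for interpreting the slope: with log10, a slope of β means each unit increase in X multiplies the back-transformed fitted value by 10^β.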

Another approach is to transform X, using either the original or the transformed dependent variable.

PGStats
Opal | Level 21

Consider:

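ods graphics on; /* the Box-Cox and fit plots need ODS Graphics; the code in the question turned it off */
proc transreg data=one;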
model boxcox(y / lambda=-2 to 2 by 0.1) = identity(x);
run;

/* Based on Box-Cox plot, choose Lambda = -0.5, i.e. inverse square root 
   transformation */
data two;
set one;
yp = 1/sqrt(y);
run;

proc reg data=two plots=all;
  model Yp=X / p r clm cli influence;
run;

/* Check out the fitplot graph */

[Image: fit plot for the YP = 1/sqrt(Y) model (FitPlot4.png)]
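Note that 1/sqrt(y) is not literally the Box-Cox value at λ = -0.5, but an affine function of it:

```latex
y^{(-1/2)} = \frac{y^{-1/2} - 1}{-1/2} = 2 - \frac{2}{\sqrt{y}}.
```

Because R-square, |t|, F, and the residual pattern are unchanged by an affine rescaling of the response, fitting YP = 1/sqrt(Y) gives the same fit quality as the Box-Cox scale; only the sign and size of the slope change. Since Y increases with X in these data, the slope for YP is negative.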

PG
JUMMY
Obsidian | Level 7

@PGStats, would Y' = 1/Y be another good transformation to use? Looking at R-square, we both got the same value, 0.8452.

[Image: fit plot (FitPlot24.png)]
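One hedged way to compare 1/Y with the other candidates is to fit them side by side and compare residual-by-predicted plots, since R-square values computed on different response scales do not measure variance stabilization directly. The sketch below reuses data set ONE from the question; the data set CAND and the variable names LOGY, ISQY, INVY, and INV15 are illustrative only, and the candidates are λ = 0, -0.5, -1, and -1.5.

```sas
/* Sketch: build several transformed responses from data set ONE.  */
/* CAND, LOGY, ISQY, INVY, and INV15 are illustrative names only.  */
data cand;
  set one;
  logy  = log(y);        /* lambda =  0   : natural log         */
  isqy  = 1/sqrt(y);     /* lambda = -0.5 : inverse square root */
  invy  = 1/y;           /* lambda = -1   : reciprocal          */
  inv15 = y**(-1.5);     /* lambda = -1.5                       */
run;

ods graphics on;

/* One MODEL statement per candidate; PLOTS(ONLY)= limits the output */
/* to residuals versus predicted values, the usual check for a fan.  */
proc reg data=cand plots(only)=residualbypredicted;
  model logy  = x;
  model isqy  = x;
  model invy  = x;
  model inv15 = x;
run;
quit;
```

Look for the model whose residual spread stays roughly constant across the predicted values; with only four X levels and 16 observations, expect the plots to be noisy.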

PGStats
Opal | Level 21

The Box-Cox Plot gives you the confidence interval for the optimal transformation exponent.

[Image: Box-Cox plot (BoxCoxPlot.png)]

The interval includes -1.5, -1, -0.5, and 0 (log).
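For reference, the interval on a Box-Cox plot is, in the usual construction, the set of λ values whose profile log-likelihood ℓ(λ) lies within a chi-square cutoff of its maximum (shown here at the 95% level):

```latex
\left\{ \lambda : 2\big[\ell(\hat\lambda) - \ell(\lambda)\big] \le \chi^2_{1,\,0.95} \approx 3.84 \right\}
```

Every λ inside the interval, including -1 (1/Y), is consistent with the data at that level, so the data alone do not single out one of them; with 16 observations the interval is wide.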

PG
JUMMY
Obsidian | Level 7

@PGStats, I agree that the Box-Cox plot gives a better understanding. But suppose we stick to PROC REG to draw conclusions: would the 95% CI for the mean of Y at X = 5.0 be (9.8051, 12.1199)?

[Image: old.png, PROC REG output for the untransformed model]

The 95% CI for the transformed Y, Y', is (0.2998, 0.3213). Is that correct?
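If Y' here is 1/sqrt(Y), as in the PROC REG code earlier in the thread (an assumption; the post does not say which transformation produced these numbers), the interval maps back to the Y scale by inverting the transformation. Because y^(-1/2) is decreasing, the endpoints swap:

```latex
Y' = Y^{-1/2} \;\Longrightarrow\; Y = (Y')^{-2},
\qquad
(0.2998,\ 0.3213) \;\mapsto\; \left(0.3213^{-2},\ 0.2998^{-2}\right) \approx (9.687,\ 11.126).
```

This back-transformed interval targets the inverse transform of E[Y' | X = 5], which under roughly symmetric errors on the Y' scale is the median of Y at X = 5 rather than its mean, so it is not directly comparable with (9.8051, 12.1199) from the untransformed model.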

[Image: new.png, PROC REG output for the transformed model]