epibio2021
Obsidian | Level 7

Hi, 

I have a continuous outcome and multiple categorical/binary predictors (coded as dummy variables 0 and 1), so I used a multiple linear regression. I had to log transform the outcome variable to fit linear model assumptions. 

 

How do I go about interpreting each predictor? 

e.g. 

Parameter    Estimate
Intercept    3.64
A 1          0.48
A 0          0.00
TEAM 1       0.49
TEAM 0       0.00
error 1      0.55
error 0      0.00

 
 

For A, it would be (e^0.48 - 1) * 100% = 62%. Would my interpretation be:

a) For every unit increase in A, there is a 62% increase in Y, while TEAM is 0 and error is 0 (holding all other variables constant).

b) There is a 62% increase in Y for A 1 compared to A 0, while TEAM is 0 and error is 0 (holding all other variables constant).

Because this predictor is binary, I don't know if "for every unit increase" is appropriate.

 

Would the interpretation be the same for the other predictors? 

For TEAM, (e^0.49 - 1) * 100% = 63%.

a) For every unit increase in TEAM, there is a 63% increase in Y, while A is 0 and error is 0 (holding all other variables constant).

b) There is a 63% increase in Y for TEAM 1 compared to TEAM 0, while A is 0 and error is 0 (holding all other variables constant).

 

How would the interpretation be if the categorical predictor had 3 levels (0, 1, 2)? Or if there was an interaction in the predictors TEAM*A? 

 

Thank you for your help. 

5 REPLIES 5
ballardw
Super User

It helps to start by showing the SAS code used with GLM. That will give us some chance of answering the question.

 

The behavior can change somewhat depending on whether you actually created a set of dummy variables or used the CLASS statement in GLM, and on how you did the log transform.

epibio2021
Obsidian | Level 7

This is the code where the variables were created:

data want;
set have;

LOG_Y = log(Y);

*creating var A*;
if doc <= '01JUN2019:00:00:00'dt then A = 0;
else A = 1;

*creating var B*; 

if time >= '07:00:00't and time <= '18:59:59't then B = 0;
else B = 1;

*creating var TEAM*; 

if team___1 = 1 AND team___3 = 1 then TEAM = 0;
else TEAM = 1;
*error: already existed from database, 0 = no, 1 = yes*; 

rename com_group___5 = error;

*creating var age_group*; 

if age < 730 then age_group = 0;
else if age >= 730 and age < 4380 then age_group = 1;
else if age >= 4380 and age <= 6570 then age_group = 2;  

run; 

 

Code for GLM, which gave the results previously posted:

proc glm data=want;
class A (ref="0") error (ref="0") TEAM (ref = "0");
model LOG_Y = A TEAM error / solution clparm;
run; 

 

Code with 3 levels of category (age_group) and interaction: 

proc glm data=want;
class A(ref="0") error(ref="0") B(ref = "0") age_group(ref = "0");
model LOG_Y = A error age_group A*B/ solution clparm;
run;

 

PaigeMiller
Diamond | Level 26

Transforming the response variable (to achieve normality of errors?) isn't necessary for fitting a model and estimating an effect of a categorical variable.

 

It is necessary for performing hypothesis tests and creating confidence intervals.

--
Paige Miller
epibio2021
Obsidian | Level 7
I have conducted t-tests for all my categorical predictors (which were originally continuous variables) on the outcome. After finding significant predictors for my outcome, I need a model that accounts for my predictors to find the association between the predictors and the outcome. What would you suggest other than a multiple linear regression?
StatDave
SAS Super FREQ

For the first model you showed (a main effects only model), the exponentiated parameter estimate for any predictor is an estimate of the ratio of Y means comparing the associated level to the predictor's reference level.

 

For example: (mean(Y) for TEAM 1) / (mean(Y) for TEAM 0) = exp(0.49). Expressed as a percent change: (exp(0.49) - 1) * 100%.
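To see where the ratio comes from, here is a short derivation for the main-effects model. It is a sketch under assumptions: the symbols \beta, \varepsilon and \sigma^2 are notation introduced here (they do not appear in the GLM output), and the errors are assumed normal with constant variance on the log scale, i.e. Y is log-normal.

$$
\begin{aligned}
\log Y &= \beta_0 + \beta_A A + \beta_T\,\mathrm{TEAM} + \beta_E\,\mathrm{error} + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2) \\
E[Y \mid A, \mathrm{TEAM}, \mathrm{error}] &= \exp\!\left(\beta_0 + \beta_A A + \beta_T\,\mathrm{TEAM} + \beta_E\,\mathrm{error} + \sigma^2/2\right) \\
\frac{E[Y \mid \mathrm{TEAM}=1]}{E[Y \mid \mathrm{TEAM}=0]} &= \exp(\beta_T) \approx \exp(0.49) \approx 1.632
\end{aligned}
$$

The intercept, the other predictors and the \sigma^2/2 term cancel in the ratio, so the roughly 63% higher mean for TEAM 1 vs TEAM 0 holds at any fixed values of A and error, not only when they are 0. That cancellation relies on the model having no interaction terms.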

 

Similarly for A and ERROR. It is the same for a multilevel CLASS predictor - still a comparison of the level that the parameter estimate is associated with vs the reference level. For a continuous predictor, it is the ratio for a unit change in the predictor. Notice that it is a ratio, not a difference, of means since you log-transformed your response, presumably because you are assuming that Y is log-normally distributed.
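As a quick check of the arithmetic for all three predictors, the estimates from the table in the question can be exponentiated in a DATA step. This is only a sketch: the data set name pct_change and the variable names ratio and pct_chg are made up for illustration, and the estimates are typed in by hand rather than read from the GLM output.

```sas
/* Estimates copied by hand from the Parameter Estimate table in the question. */
/* pct_change, ratio and pct_chg are illustrative names, not from the thread.  */
data pct_change;
    length parameter $8;
    input parameter $ estimate;
    ratio   = exp(estimate);       /* ratio of Y means: level 1 vs reference level 0 */
    pct_chg = (ratio - 1) * 100;   /* the same comparison as a percent change        */
    datalines;
A      0.48
TEAM   0.49
error  0.55
;
run;

proc print data=pct_change noobs;
    format ratio 6.3 pct_chg 6.1;
run;
```

This should give ratios of about 1.616, 1.632 and 1.733, i.e. roughly 62%, 63% and 73% higher Y for level 1 than for level 0 of A, TEAM and error respectively. For a three-level predictor such as age_group, each non-reference level has its own estimate and would be exponentiated the same way, each one compared to the reference level.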

 
