Hi,
I have a continuous outcome and multiple categorical/binary predictors (coded as dummy variables 0 and 1), so I used a multiple linear regression. I had to log transform the outcome variable to fit linear model assumptions.
How do I go about interpreting each predictor?
e.g.
Parameter    Estimate
Intercept    3.64
A 1          0.48
A 0          0.00
TEAM 1       0.49
TEAM 0       0.00
error 1      0.55
error 0      0.00
For A, it would be (e^0.48 - 1)*100% = 62%. Would my interpretation be:
a) For every unit increase in A, there is a 62% increase in Y, while TEAM is 0 and error is 0 (holding all other variables constant).
b) There is a 62% increase in Y for A 0 compared to A 1, while TEAM is 0 and error is 0 (holding all other variables constant).
Because this predictor is binary, I don't know if "for every unit increase" is appropriate.
Would the interpretation be the same for the other predictors?
For TEAM, (e^0.49 - 1)*100% = 63%.
a) For every unit increase in TEAM, there is a 63% increase in Y, while A is 0 and error is 0 (holding all other variables constant).
b) There is a 63% increase in Y for TEAM 0 compared to TEAM 1, while A is 0 and error is 0 (holding all other variables constant).
How would the interpretation be if the categorical predictor had 3 levels (0, 1, 2)? Or if there was an interaction in the predictors TEAM*A?
Thank you for your help.
It helps to start by showing the SAS code you used with GLM. That gives us some chance of answering the question.
The behavior can change somewhat depending on whether you actually created a slew of dummy variables or used the CLASS statement in GLM, and on how you did the log transform.
This is the code where the variables were created:
data want;
set have;
LOG_Y = log(Y);
*creating var A*;
if doc <= '01JUN2019:00:00:00'dt then A = 0;
else if doc > '01JUN2019:00:00:00'dt then A = 1;
*creating var B*;
if time >= '07:00:00't and time <= '18:59:59't then B = 0;
else B = 1;
*creating var TEAM*;
if team___1 = 1 AND team___3 = 1 then TEAM = 0;
else TEAM = 1;
*error: already existed from database, 0 = no, 1 = yes*;
rename com_group___5 = error;
*creating var age_group*;
if age < 730 then age_group = 0;
else if age >= 730 and age < 4380 then age_group = 1;
else if age >= 4380 and age <= 6570 then age_group = 2;
run;
Code for GLM, which gave the results previously posted:
proc glm data=want;
class A (ref="0") error (ref="0") TEAM (ref = "0");
model LOG_Y = A TEAM error / solution clparm;
run;
Code with 3 levels of category (age_group) and interaction:
proc glm data=want;
class A(ref="0") error(ref="0") B(ref = "0") age_group(ref = "0");
model LOG_Y = A error age_group A*B/ solution clparm;
run;
Transforming the response variable (to achieve normality of errors?) isn't necessary for fitting a model and estimating the effect of a categorical variable.
It is necessary for performing hypothesis tests and constructing confidence intervals.
Paige Miller
For the first model you showed (a main effects only model), the exponentiated parameter estimate for any predictor is an estimate of the ratio of Y means comparing the associated level to the predictor's reference level.
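As a rough sketch of why exponentiating gives a ratio, the main-effects model can be written out as below. The symbols (the betas, the bracketed 0/1 indicators, and GM for the geometric mean) are notation assumed here for illustration and do not come from the GLM output. Strictly, back-transforming a mean of log(Y) gives a geometric mean; if Y is log-normal with the same error variance in every group, the ratio of arithmetic means takes the same value.

```latex
\begin{aligned}
\log Y &= \beta_0 + \beta_A\,[A=1] + \beta_T\,[\mathrm{TEAM}=1] + \beta_E\,[\mathrm{error}=1] + \varepsilon \\[4pt]
\beta_T &= E[\log Y \mid \mathrm{TEAM}=1] - E[\log Y \mid \mathrm{TEAM}=0]
  \quad \text{(same } A \text{ and error in both terms)} \\[4pt]
e^{\beta_T} &= \frac{\mathrm{GM}(Y \mid \mathrm{TEAM}=1)}{\mathrm{GM}(Y \mid \mathrm{TEAM}=0)},
  \qquad \text{percent change} = \left(e^{\beta_T} - 1\right)\times 100\%
\end{aligned}
```

The A and error terms cancel in the difference, so the ratio is the same whatever values they are held at, not only when they are 0.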
For example: (mean(Y) for TEAM 1)/(mean(Y) for TEAM 0) = exp(0.49), which is about 1.63. Expressed as a percent change: (exp(0.49)-1)*100%, or about 63%.
Similarly for A and ERROR. It is the same for a multilevel CLASS predictor - still a comparison of the level that the parameter estimate is associated with vs the reference level. For a continuous predictor, it is the ratio for a unit change in the predictor. Notice that it is a ratio, not a difference, of means since you log-transformed your response, presumably because you are assuming that Y is log-normally distributed.
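To check the arithmetic for all three predictors at once, a small DATA step along these lines should do. It is only a sketch: the data set name pct_change and the variables ratio and pct_change are made up for illustration, and the estimates are typed in by hand from the output posted in the question rather than read from the GLM results.

```sas
/* Hypothetical data set: estimates copied by hand from the posted output */
data pct_change;
  input parameter $ estimate;
  ratio      = exp(estimate);      /* ratio of Y means: level 1 vs. reference level 0 */
  pct_change = (ratio - 1) * 100;  /* the same comparison expressed as a percent change */
  format ratio 6.3 pct_change 6.1;
  datalines;
A     0.48
TEAM  0.49
error 0.55
;
run;

proc print data=pct_change noobs;
run;
```

This gives ratios of about 1.616, 1.632 and 1.733, that is, Y is roughly 62%, 63% and 73% higher at level 1 than at level 0 for A, TEAM and error respectively.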