Statistical Procedures

epibio2021 · Posted 05-19-2021 12:03 AM

Hi,

I have a continuous outcome and multiple categorical/binary predictors (coded as dummy variables 0 and 1), so I used a multiple linear regression. I had to log transform the outcome variable to fit linear model assumptions.

How do I go about interpreting each predictor?

e.g.

Parameter    Estimate
Intercept    3.64
A 1          0.48
A 0          0.00
TEAM 1       0.49
TEAM 0       0.00
error 1      0.55
error 0      0.00

For A, it would be (e^0.48 - 1)*100% ≈ 62%. Would my interpretation be:

a) For every unit increase in A, there is a 62% increase in Y, while TEAM is 0 and error is 0 (holding all other variables constant).

b) There is a 62% increase in Y for A 0 compared to A 1, while TEAM is 0 and error is 0 (holding all other variables constant).

Because this predictor is binary, I don't know if "for every unit increase" is appropriate.

Would the interpretation be the same for the other predictors?

For TEAM, (e^0.49 - 1)*100% ≈ 63%.

a) For every unit increase in TEAM, there is a 63% increase in Y, while A is 0 and error is 0 (holding all other variables constant).

b) There is a 63% increase in Y for TEAM 0 compared to TEAM 1, while A is 0 and error is 0 (holding all other variables constant).

How would the interpretation be if the categorical predictor had 3 levels (0, 1, 2)? Or if there was an interaction in the predictors TEAM*A?

Thank you for your help.

ballardw · Posted 05-19-2021 11:06 AM

It helps to start by showing the SAS code used with GLM. That will give us some chance of answering the question.

The behavior can change somewhat depending on whether you actually created a slew of dummy variables or used the CLASS statement in GLM, and on how you did the log transform.
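
To illustrate why the coding matters, here is a minimal sketch. It assumes a dataset named want with a log-transformed outcome LOG_Y and numeric 0/1 predictors A, TEAM, and error; the dataset and outcome names are placeholders until the actual code is posted.

```sas
/* 1) Dummies entered as numeric variables (no CLASS statement):      */
/*    each estimate is the change in LOG_Y for level 1 versus level 0. */
proc glm data=want;
   model LOG_Y = A TEAM error / solution clparm;
run;
quit;

/* 2) CLASS statement without REF=: the last sorted level (here 1)    */
/*    becomes the reference, so each estimate compares level 0 to     */
/*    level 1 and has the opposite sign from run 1.                   */
proc glm data=want;
   class A TEAM error;
   model LOG_Y = A TEAM error / solution clparm;
run;
quit;

/* 3) CLASS statement with REF="0": level 0 is the reference, so the  */
/*    estimates match run 1 (level 1 versus level 0).                 */
proc glm data=want;
   class A(ref="0") TEAM(ref="0") error(ref="0");
   model LOG_Y = A TEAM error / solution clparm;
run;
quit;
```

In runs 1 and 3 an estimate describes level 1 relative to level 0; in run 2 it describes level 0 relative to level 1, so the direction of any "percent increase" statement depends on which form was used.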

epibio2021 · Posted 05-19-2021 11:28 AM

This is the code where the variables were created:

data want;
set have;

LOG_Y = log(Y);

*creating var A*;
if doc <= '01JUN2019:00:00:00'dt then A = 0;
else if doc > '01JUN2019:00:00:00'dt then A = 1;

*creating var B*;

if time >= '07:00:00't and time <= '18:59:59't then B = 0;
else B = 1;

*creating var TEAM*;

if team___1 = 1 AND team___3 = 1 then TEAM = 0;
else TEAM = 1;
*error: already existed from database, 0 = no, 1 = yes*;

rename com_group___5 = error;

*creating var age_group*;

if age < 730 then age_group = 0;
else if age >= 730 and age < 4380 then age_group = 1;
else if age >= 4380 and age <= 6570 then age_group = 2;

run;

Code for GLM, which gave the results previously posted:

proc glm data=want;
class A (ref="0") error (ref="0") TEAM (ref = "0");
model LOG_Y = A TEAM error / solution clparm;
run;

Code with 3 levels of category (age_group) and interaction:

proc glm data=want;
class A(ref="0") error(ref="0") B(ref = "0") age_group(ref = "0");
model LOG_Y = A error age_group A*B/ solution clparm;
run;

PaigeMiller · Posted 05-19-2021 11:49 AM

Transforming the response variable (to achieve normality of errors?) isn't necessary for fitting a model and estimating the effect of a categorical variable.

Normality of the errors is needed only for performing hypothesis tests and creating confidence intervals.

--
Paige Miller
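
If normality of the errors was the motivation, one way to check it is to look at the residuals from the fitted model. A minimal sketch, reusing the dataset and variable names from the code posted above; resid_check, resid, and pred are names made up for illustration:

```sas
ods graphics on;

/* Fit the main-effects model and save residuals and predicted values. */
proc glm data=want plots=diagnostics;
   class A(ref="0") error(ref="0") TEAM(ref="0");
   model LOG_Y = A TEAM error / solution clparm;
   output out=resid_check r=resid p=pred;
run;
quit;

/* Normality tests and a Q-Q plot of the saved residuals. */
proc univariate data=resid_check normal;
   var resid;
   qqplot resid / normal(mu=est sigma=est);
run;
```

Running the same steps with Y in place of LOG_Y would show whether the untransformed residuals were actually a problem.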

epibio2021 · Posted 05-19-2021 02:23 PM

I have conducted t-tests for all my categorical predictors (which were originally continuous variables) on the outcome. After finding significant predictors for my outcome, I need a model that accounts for my predictors to find the association between the predictors and the outcome. What would you suggest other than a multiple linear regression?

StatDave · Posted 05-19-2021 03:20 PM

For the first model you showed (a main effects only model), the exponentiated parameter estimate for any predictor is an estimate of the ratio of Y means comparing the associated level to the predictor's reference level.

For example: (mean(Y) for TEAM 1)/(mean(Y) for TEAM 0) = exp(0.49) ≈ 1.63. Expressed as a percent change: (exp(0.49) - 1)*100% ≈ 63%.
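
The same arithmetic applied to the three estimates posted in the question, as a minimal sketch; the dataset and variable names (backtransform, ratio, pct_change) are made up for illustration:

```sas
/* Back-transform log-scale estimates into ratios and percent changes. */
data backtransform;
   length parameter $8;
   input parameter $ estimate;
   ratio      = exp(estimate);      /* ratio of Y means vs. the reference level */
   pct_change = (ratio - 1) * 100;  /* the same effect as a percent change      */
   datalines;
A 0.48
TEAM 0.49
error 0.55
;
run;

proc print data=backtransform noobs;
   format ratio 6.3 pct_change 6.1;
run;
```

This should give ratios of about 1.616, 1.632, and 1.733, that is, increases of roughly 62%, 63%, and 73% relative to each reference level.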

Similarly for A and ERROR. It is the same for a multilevel CLASS predictor - still a comparison of the level that the parameter estimate is associated with vs the reference level. For a continuous predictor, it is the ratio for a unit change in the predictor. Notice that it is a ratio, not a difference, of means since you log-transformed your response, presumably because you are assuming that Y is log-normally distributed.
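
To get the ratios together with their confidence limits, the SOLUTION table can be captured with ODS and exponentiated. This is a sketch rather than a definitive recipe: it assumes the ODS table is named ParameterEstimates and that CLPARM adds columns named LowerCL and UpperCL, which is worth confirming with PROC CONTENTS on your own output; pe and pe_ratio are names made up for illustration.

```sas
/* Capture the parameter estimates produced by SOLUTION and CLPARM. */
ods output ParameterEstimates=pe;
proc glm data=want;
   class A(ref="0") error(ref="0") TEAM(ref="0");
   model LOG_Y = A TEAM error / solution clparm;
run;
quit;

/* Exponentiate the estimate and both confidence limits.             */
/* Reference-level rows have estimate 0 (ratio 1) and missing limits. */
/* The Intercept row back-transforms to the Y scale for the cell     */
/* where every predictor is at its reference level; it is not a ratio. */
data pe_ratio;
   set pe;
   ratio       = exp(Estimate);
   ratio_lower = exp(LowerCL);
   ratio_upper = exp(UpperCL);
   pct_change  = (ratio - 1) * 100;
run;

proc print data=pe_ratio noobs;
   var Parameter Estimate ratio ratio_lower ratio_upper pct_change;
   format ratio ratio_lower ratio_upper 6.3 pct_change 6.1;
run;
```

The same steps apply to a multilevel CLASS predictor such as age_group: each non-reference level gets its own row, and its ratio compares that level with the reference level.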

How do I interpret a log transformed dependent variable in proc GLM with categorical predictors?
