Statistical Procedures

Programming the statistical procedures from SAS
epibio2021
Obsidian | Level 7

Hi, 

I have a continuous outcome and multiple categorical/binary predictors (coded as dummy variables 0 and 1), so I used a multiple linear regression. I had to log transform the outcome variable to fit linear model assumptions. 

 

How do I go about interpreting each predictor? 

e.g. 

Parameter    Estimate
Intercept    3.64
A 1          0.48
A 0          0.000000000
TEAM 1       0.49
TEAM 0       0.00
error 1      0.55
error 0      0.00

 
 

For A, it would be (e^0.48 - 1)*100% ≈ 62%. Would my interpretation be:

a) For every unit increase in A, there is a 62% increase in Y, while TEAM is 0 and error is 0 (holding all other variables constant).

b) There is a 62% increase in Y for A 0 compared to A 1, while TEAM is 0 and error is 0 (holding all other variables constant).

Because this predictor is binary, I don't know if "for every unit increase" is appropriate.

 

Would the interpretation be the same for the other predictors? 

For TEAM, (e^0.49 - 1)*100% ≈ 63%.

a) For every unit increase in TEAM, there is a 63% increase in Y, while A is 0 and error is 0 (holding all other variables constant).

b) There is a 63% increase in Y for TEAM 0 compared to TEAM 1, while A is 0 and error is 0 (holding all other variables constant).

 

How would the interpretation be if the categorical predictor had 3 levels (0, 1, 2)? Or if there was an interaction in the predictors TEAM*A? 

 

Thank you for your help. 

5 REPLIES 5
ballardw
Super User

It helps to start by showing the SAS code used with GLM. That will give us some chance of answering the question.

 

The behavior can change somewhat depending on if you actually created a slew of dummy variables or used the CLASS statement on GLM and how you did the "Log transform".

epibio2021
Obsidian | Level 7

This is the code where the variables were created:

data want;
set have;

LOG_Y = log(Y);

*creating var A*;
if doc <= '01JUN2019:00:00:00'dt then A = 0;
else if doc > '01JUN2019:00:00:00'dt then A = 1;

*creating var B*; 

if time >= '07:00:00't and time <= '18:59:59't then B = 0;
else B = 1;

*creating var TEAM*; 

if team___1 = 1 AND team___3 = 1 then TEAM = 0;
else TEAM = 1;
*error: already existed from database, 0 = no, 1 = yes*; 

rename com_group___5 = error;

*creating var age_group*; 

if age < 730 then age_group = 0;
else if age >= 730 and age < 4380 then age_group = 1;
else if age >= 4380 and age <= 6570 then age_group = 2;  

run; 

 

Code for GLM, which gave the results previously posted :

proc glm data=want;
class A (ref="0") error (ref="0") TEAM (ref = "0");
model LOG_Y = A TEAM error / solution clparm;
run; 

 

Code with 3 levels of category (age_group) and interaction: 

proc glm data=want;
class A(ref="0") error(ref="0") B(ref = "0") age_group(ref = "0");
model LOG_Y = A error age_group A*B/ solution clparm;
run;

 

PaigeMiller
Diamond | Level 26

Transforming the response variable (to achieve normality of errors?) isn't necessary for fitting a model and estimating an effect of a categorical variable.

 

Normality of the errors is needed only for performing hypothesis tests and constructing confidence intervals.

--
Paige Miller
epibio2021
Obsidian | Level 7
I have conducted t-tests for all my categorical predictors (which were originally continuous variables) on the outcome. After finding significant predictors for my outcome, I need a model that accounts for my predictors to find the association between the predictors and the outcome. What would you suggest other than a multiple linear regression?
StatDave
SAS Super FREQ

For the first model you showed (a main effects only model), the exponentiated parameter estimate for any predictor is an estimate of the ratio of Y means comparing the associated level to the predictor's reference level.

 

For example: (mean(Y) for TEAM 1)/(mean(Y) for TEAM 0) = exp(0.49) ≈ 1.63. Expressed as a percent change: (exp(0.49)-1)*100% ≈ 63%.
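As a quick check of that arithmetic, here is a minimal DATA step sketch. The value 0.49 is the TEAM 1 estimate from the posted output; the variable names (estimate, ratio, pct) are made up for illustration.

```sas
/* Back-transform a log-scale estimate into a ratio and a percent change. */
data _null_;
   estimate = 0.49;               /* TEAM 1 estimate from the posted output   */
   ratio    = exp(estimate);      /* TEAM 1 vs TEAM 0 ratio on the Y scale    */
   pct      = (ratio - 1) * 100;  /* percent change relative to TEAM 0        */
   put ratio= pct=;               /* write both values to the SAS log         */
run;
```

The log should show a ratio of roughly 1.63 and a percent change of roughly 63. Swapping in 0.48 or 0.55 gives the corresponding values for A and error.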

 

Similarly for A and ERROR. It is the same for a multilevel CLASS predictor - still a comparison of the level that the parameter estimate is associated with vs the reference level. For a continuous predictor, it is the ratio for a unit change in the predictor. Notice that it is a ratio, not a difference, of means since you log-transformed your response, presumably because you are assuming that Y is log-normally distributed.
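One way to get these ratios for every non-reference level at once is to capture the parameter estimates table and exponentiate it. This is only a sketch: it reuses the PROC GLM call posted earlier in the thread, the data set names pe and pe_ratio are arbitrary, and the column names (Parameter, Estimate, StdErr, LowerCL, UpperCL) are what I would expect in the ParameterEstimates table when CLPARM is specified, so confirm them with ODS TRACE ON or PROC CONTENTS before relying on them.

```sas
/* Capture the SOLUTION table from the main-effects model. */
ods output ParameterEstimates=pe;     /* pe is an arbitrary data set name */
proc glm data=want;
   class A (ref="0") error (ref="0") TEAM (ref="0");
   model LOG_Y = A TEAM error / solution clparm;
run;
quit;

/* Exponentiate estimates and confidence limits to get ratios on the Y scale. */
data pe_ratio;
   set pe;
   /* Drop the intercept and the reference-level rows (estimate fixed at 0,
      standard error missing); column names here are assumptions to verify. */
   where Parameter ne 'Intercept' and StdErr is not missing;
   ratio      = exp(Estimate);        /* level vs reference level            */
   ratio_lcl  = exp(LowerCL);         /* lower confidence limit for ratio    */
   ratio_ucl  = exp(UpperCL);         /* upper confidence limit for ratio    */
   pct_change = (ratio - 1) * 100;    /* percent change vs reference level   */
run;

proc print data=pe_ratio noobs;
   var Parameter Estimate ratio ratio_lcl ratio_ucl pct_change;
run;
```

Each remaining row compares one level with its reference level, so a three-level CLASS predictor such as age_group would simply contribute two rows.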

 
