ValSki
Calcite | Level 5

Hi Everyone,

First-time poster here. I am using SAS Studio and have run into a puzzling finding when running a multiple linear regression (MLR) model with and without an interaction term. In Image 1 (the model without the interaction term), the parameter estimates table shows NSIDSC contributing to the model; in Image 2 (the model with the interaction term), it appears not to contribute.

Question is, what does this mean? Why does this happen? And, how would I interpret this?

Any help on the matter would be greatly appreciated. Thank you!

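[Image 1: parameter estimates, model without the interaction term]
[Image 2: parameter estimates, model with the interaction term]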

PGStats
Opal | Level 21

From the little you have given us, it appears that in the first model you are estimating a single slope (-1.54) for both CCCF1 groups. In the second model, you are estimating separate slopes (-1.49, -1.60) for each group.
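
In symbols, keeping only the terms that matter here, the two models can be sketched as follows. This is an illustration, not SAS output: the symbols \beta, \beta_g and \gamma_g are shorthand introduced here, g indexes the two CCCF1 groups, and the dots stand for the remaining covariates.

```latex
\begin{align*}
\text{Model 1:}\quad & \text{PMHDSCR} = \beta_0 + \beta\,\text{NSIDSC} + \gamma_{g} + \dots + \varepsilon,
  & \hat\beta &\approx -1.54 \\
\text{Model 2:}\quad & \text{PMHDSCR} = \beta_0 + \beta_{g}\,\text{NSIDSC} + \gamma_{g} + \dots + \varepsilon,
  & (\hat\beta_{1}, \hat\beta_{2}) &\approx (-1.49,\ -1.60)
\end{align*}
```

In Model 2 the NSIDSC*CCCF1 term already supplies one slope per group, so a separate NSIDSC coefficient has nothing left to estimate.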

PG
PaigeMiller
Diamond | Level 26

We'd probably need to see the full output and the code in order to give you a complete answer. As to "why is this happening": different models will produce different results; it's as simple as that.

--
Paige Miller
ValSki
Calcite | Level 5

Hello Paige,

Thank you for replying. Here is the code for the two analyses I was running. I've also attached the full output for both.

Without interaction term (image 1) and with (image 2): 

 

/* Model 1 (Image 1): main effects only, no selection */
proc glmselect data=WORK.IMPORT outdesign(addinputvars)=Work.reg_design;
  class DHH_SEX CCCF1 SPT_01 INCGHH / param=glm;
  model PMHDSCR=NSIDSC DHH_SEX CCCF1 SPT_01 INCGHH / showpvalues selection=none;
run;

/* Refit the saved design matrix in PROC REG to get the diagnostic plots */
proc reg data=Work.reg_design alpha=0.05
    plots(only)=(diagnostics residuals observedbypredicted);
  where DHH_SEX is not missing and CCCF1 is not missing
    and SPT_01 is not missing and INCGHH is not missing;
  ods select DiagnosticsPanel ResidualPlot ObservedByPredicted;
  model PMHDSCR=&_GLSMOD /;
run;
quit;

 

/* Model 2 (Image 2): adds NSIDSC*CCCF1, listed before the main effects */
proc glmselect data=WORK.IMPORT outdesign(addinputvars)=Work.reg_design;
  class CCCF1 DHH_SEX SPT_01 INCGHH / param=glm;
  model PMHDSCR=NSIDSC*CCCF1 NSIDSC CCCF1 DHH_SEX SPT_01 INCGHH
        / showpvalues selection=none;
run;

/* Refit the saved design matrix in PROC REG to get the diagnostic plots */
proc reg data=Work.reg_design alpha=0.05
    plots(only)=(diagnostics residuals observedbypredicted);
  where CCCF1 is not missing and DHH_SEX is not missing
    and SPT_01 is not missing and INCGHH is not missing;
  ods select DiagnosticsPanel ResidualPlot ObservedByPredicted;
  model PMHDSCR=&_GLSMOD /;
run;
quit;

/* Clean up the temporary design data set */
proc delete data=Work.reg_design;
run;

 

 

PaigeMiller
Diamond | Level 26

This is a great example of why I always avoid regression variable selection methods, such as PROC GLMSELECT, when you have x-variables that are correlated with one another. The correlation between the x-variables means that two different models can bear no resemblance to one another when you look at the parameter estimates.


So, that's the interpretation: the correlation between the x-variables causes these models to be very different. Also, in the second model (the one with the interaction), NSIDSC*CCCF1 is entered into the model before the main effect of NSIDSC, so the main effect of NSIDSC cannot be estimated. The order makes a difference here.

 

Confusing? Yes. That's where multiple linear regression takes you, to confusing-land, when you have correlated x-variables.

 

So what is the solution? I suggest a method that does not have these drawbacks and that performs better when the x-variables are correlated, so the confusion above is minimized. The order of the variables has no impact, and the coefficients don't swing wildly depending on whether a term is in the model. That method is Partial Least Squares regression, available in SAS as PROC PLS.

--
Paige Miller
ValSki
Calcite | Level 5

All the predictors, including the interaction, are run simultaneously. I tried rearranging the order of my predictors, as I suspected the order was influencing the results, but regardless of the setup, NSIDSC still gave me a zero slope. I'll have to look into using PROC PLS instead. Hopefully, that will help.

PGStats
Opal | Level 21

You are right that parameter estimates may change when you change the order of terms in the model. Try, for example:

 

ods select ParameterEstimates;
proc glm data=sashelp.heart plots=none;
class sex;
model weight = sex*height height sex / solution;
run; quit;

ods select ParameterEstimates;
proc glm data=sashelp.heart plots=none;
class sex;
model weight = height sex sex*height / solution;
run; quit;

The two models are mathematically equivalent. But because glm is overparameterized, some terms must be declared redundant. The two models only differ in the choice of redundant terms.
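
A small numeric sketch of that point, written in plain Python rather than SAS so it can be run anywhere. The toy data, the group labels and the names `ols_line`, `main_first`, `interaction_first` and `implied_slope` are all hypothetical; the code mimics how the redundant column ends up at zero under each ordering and does not reproduce PROC GLM's actual algorithm.

```python
def ols_line(xs, ys):
    """Return (intercept, slope) of the simple least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical toy data (not from the thread): group -> (x values, y values).
data = {
    "F": ([60.0, 62.0, 64.0, 66.0], [120.0, 131.0, 139.0, 150.0]),
    "M": ([66.0, 68.0, 70.0, 72.0], [160.0, 172.0, 181.0, 195.0]),
}

# With a group-specific intercept and slope and no other covariates, the
# interaction model amounts to fitting each group's line separately.
slope = {g: ols_line(xs, ys)[1] for g, (xs, ys) in data.items()}

# Ordering "x  group  x*group": the last interaction column is redundant,
# so x carries the reference-group slope and x*F carries the difference.
main_first = {"x": slope["M"], "x*F": slope["F"] - slope["M"], "x*M": 0.0}

# Ordering "x*group  x  group": both interaction columns enter first,
# so x itself is the redundant column and is set to 0.
interaction_first = {"x*F": slope["F"], "x*M": slope["M"], "x": 0.0}


def implied_slope(params, group):
    """Slope of y on x within one group under a given set of estimates."""
    return params["x"] + params["x*" + group]


for g in data:
    print(g, implied_slope(main_first, g), implied_slope(interaction_first, g))
```

The two dictionaries hold different numbers, yet the within-group slopes they imply agree, which is the sense in which the two orderings describe the same fitted model.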

PG
PaigeMiller
Diamond | Level 26

@ValSki wrote:

All the predictors, including the interaction, are run simultaneously. I tried rearranging the order of my predictors, as I suspected the order was influencing the results, but regardless of the setup, NSIDSC still gave me a zero slope. I'll have to look into using PROC PLS instead. Hopefully, that will help.


It did NOT give you an NSIDSC slope of zero. It said it could not estimate the slope for NSIDSC. There is a difference.
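
One way to see the difference, as a minimal sketch in plain Python with made-up numbers: once the interaction columns are in the design, the NSIDSC column is an exact sum of them, so many different coefficient sets give identical predictions and the data cannot pick one value for the main effect. The variables `x`, `group`, `x_a`, `x_b` and the coefficient triples are hypothetical stand-ins, not values from the posted output.

```python
# Hypothetical toy design: a continuous x and a two-level group.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
group = ["A", "A", "A", "B", "B", "B"]

# Interaction columns: x within group A, and x within group B.
x_a = [xi if g == "A" else 0.0 for xi, g in zip(x, group)]
x_b = [xi if g == "B" else 0.0 for xi, g in zip(x, group)]

# The main-effect column is exactly the sum of the two interaction columns.
assert all(xi == a + b for xi, a, b in zip(x, x_a, x_b))


def slope_part(b_x, b_xa, b_xb):
    """Contribution of the three slope terms to each observation's prediction."""
    return [b_x * xi + b_xa * a + b_xb * b for xi, a, b in zip(x, x_a, x_b)]


# Three different (x, x*A, x*B) coefficient triples with the same group slopes.
triples = [(0.0, 2.0, 3.0), (3.0, -1.0, 0.0), (1.0, 1.0, 2.0)]
preds = [slope_part(*t) for t in triples]

# Identical predictions, so the coefficient on x alone is not determined.
print(preds[0] == preds[1] == preds[2])
```

A reported 0 for the redundant term is therefore a placeholder chosen by the software, not an estimate that the slope is zero.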

--
Paige Miller
PGStats
Opal | Level 21

Why use proc GLMSELECT (without selection and with param=GLM) and proc REG instead of proc GLM? What's the advantage?

PG
