Hi Everyone,
First time poster here. I am using SAS Studio and have run into a puzzling finding when attempting to run a multiple linear regression (MLR) model with and without an interaction term included. In Image 1, NSIDSC is contributing to the model when consulting the parameter estimates table and in Image 2 (the analysis with the interaction term), it appears to not be contributing.
Question is, what does this mean? Why does this happen? And, how would I interpret this?
Any help on the matter would be greatly appreciated. Thank you!
From the little you have given us, it appears that in the first model you are estimating a single slope (-1.54) for both CCCF1 groups. In the second model, you are estimating separate slopes (-1.49, -1.60) for each group.
We'd probably need to see the full output and the code in order to give you a complete answer. As to "why is this happening", different models will produce different results; it's as simple as that.
Hello Paige,
Thank you for replying. Here is the code for the two analyses I was running. I've also attached the full output for both.
Without interaction term (image 1) and with (image 2):
proc glmselect data=WORK.IMPORT outdesign(addinputvars)=Work.reg_design;
class DHH_SEX CCCF1 SPT_01 INCGHH / param=glm;
model PMHDSCR=NSIDSC DHH_SEX CCCF1 SPT_01 INCGHH / showpvalues selection=none;
run;
proc reg data=Work.reg_design alpha=0.05 plots(only)=(diagnostics residuals
observedbypredicted);
where DHH_SEX is not missing and CCCF1 is not missing and SPT_01 is not
missing and INCGHH is not missing;
ods select DiagnosticsPanel ResidualPlot ObservedByPredicted;
model PMHDSCR=&_GLSMOD /;
run;
quit;
proc glmselect data=WORK.IMPORT outdesign(addinputvars)=Work.reg_design;
class CCCF1 DHH_SEX SPT_01 INCGHH / param=glm;
model PMHDSCR=NSIDSC*CCCF1 NSIDSC CCCF1 DHH_SEX SPT_01 INCGHH / showpvalues
selection=none;
run;
proc reg data=Work.reg_design alpha=0.05 plots(only)=(diagnostics residuals
observedbypredicted);
where CCCF1 is not missing and DHH_SEX is not missing and SPT_01 is not
missing and INCGHH is not missing;
ods select DiagnosticsPanel ResidualPlot ObservedByPredicted;
model PMHDSCR=&_GLSMOD /;
run;
quit;
proc delete data=Work.reg_design;
run;
This is a great example of why I always avoid regression variable selection methods, such as PROC GLMSELECT, when you have x-variables that are correlated with one another. Because the x-variables are correlated, two different models can bear no resemblance to one another when you look at the parameter estimates.
So, that's the interpretation ... the correlation between the x-variables causes these models to be very different. Also, in the second model (the one with the interaction), you have the interaction NSIDSC*CCCF1 entered into the model before the main effect of NSIDSC, and so the main effect of NSIDSC cannot be estimated. The order makes a difference here.
Confusing? Yes. That's where multiple linear regression takes you, to confusing-land, when you have correlated x-variables.
So what is the solution? I suggest a method that does not have these drawbacks, that performs better when the x-variables are correlated, and that minimizes the confusion above. The order of the variables has no impact, and the coefficients don't swing wildly depending on whether a term is present in the model. That method is Partial Least Squares regression, available in SAS as PROC PLS.
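As a rough sketch of what that could look like with the variables from the code posted above (not a tuned analysis): the choice of nfac=2 is purely an assumption for illustration, and in practice the number of factors is usually picked by cross-validation through the CV= option.

```sas
/* Sketch only: PLS fit using the same effects as the interaction model.  */
/* nfac=2 is an arbitrary illustrative choice, not a recommendation.      */
proc pls data=WORK.IMPORT method=pls nfac=2;
  class CCCF1 DHH_SEX SPT_01 INCGHH;
  /* SOLUTION prints the regression coefficients of the final PLS model */
  model PMHDSCR = NSIDSC CCCF1 NSIDSC*CCCF1 DHH_SEX SPT_01 INCGHH / solution;
run;
```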
All the predictors, including the interaction are run simultaneously. I tried to rearrange the order of my predictors, as I suspected the order is influencing the results but regardless of the set-up, NSIDSC still gave me a zero slope. I'll have to take a look into using proc pls instead. Hopefully, that will help.
You are right that parameter estimates may change when you change the order of terms in the model. Try, for example:
ods select ParameterEstimates;
proc glm data=sashelp.heart plots=none;
class sex;
model weight = sex*height height sex / solution;
run; quit;
ods select ParameterEstimates;
proc glm data=sashelp.heart plots=none;
class sex;
model weight = height sex sex*height / solution;
run; quit;
The two models are mathematically equivalent. But because the GLM parameterization is overparameterized, some terms must be declared redundant. The two models differ only in the choice of redundant terms.
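One way to sidestep the redundancy in the slope terms, sketched here on the same sashelp.heart example, is to drop the standalone height term. The sex*height effect then carries one slope per sex, and the fit should be the same as in the two models above because the design columns span the same space.

```sas
ods select ParameterEstimates;
proc glm data=sashelp.heart plots=none;
  class sex;
  /* No standalone height term: sex*height yields a separate slope per sex */
  model weight = sex sex*height / solution;
run; quit;
```

The intercept and sex terms are still overparameterized in this version, so the last sex level is still reported as a redundant (zero) term.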
@ValSki wrote:
All the predictors, including the interaction are run simultaneously. I tried to rearrange the order of my predictors, as I suspected the order is influencing the results but regardless of the set-up, NSIDSC still gave me a zero slope. I'll have to take a look into using proc pls instead. Hopefully, that will help.
It did NOT give you an NSIDSC slope of zero. It said it could not estimate the slope for NSIDSC. There is a difference.
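To illustrate the point on the sashelp.heart example from earlier in the thread: even when an individual parameter is flagged as redundant, the slope within each group is an estimable combination of parameters, and ESTIMATE statements can request it directly. This is only a sketch; the labels are made up, and the coefficient order assumes the sex levels sort as Female, Male.

```sas
proc glm data=sashelp.heart plots=none;
  class sex;
  model weight = height sex sex*height / solution;
  /* Group slope = common height coefficient + that group's interaction term. */
  /* Assumes class levels are ordered Female, Male (the default sort order).  */
  estimate 'Height slope, Female' height 1 sex*height 1 0;
  estimate 'Height slope, Male'   height 1 sex*height 0 1;
run; quit;
```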
Why use proc GLMSELECT (without selection and with param=GLM) and proc REG instead of proc GLM? What's the advantage?
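For comparison, a single PROC GLM step fitting the same interaction model might look like the sketch below. It assumes the same data set and variables as the posted code, and it relies on PROC GLM dropping observations with missing values in the model variables rather than on an explicit WHERE clause.

```sas
/* Sketch: the interaction model in one PROC GLM step, with main effects first */
proc glm data=WORK.IMPORT plots=diagnostics;
  class CCCF1 DHH_SEX SPT_01 INCGHH;
  model PMHDSCR = NSIDSC CCCF1 NSIDSC*CCCF1 DHH_SEX SPT_01 INCGHH / solution;
run; quit;
```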