krakla
Fluorite | Level 6

Hello,

I have a question about PROC MIXED. My model is the following:

 

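proc mixed data=result covtest method=reml;
  class student school course exam time;
  /* fixed effects: main effects, two-way interactions, and the three-way interaction */
  model res = time exam course time*exam time*course time*exam*course / s;
  /* unstructured covariance among the repeated measurements within student(exam) */
  repeated / subject=student(exam) type=un;
  /* random intercept for school */
  random school;
  /* exclude observations with a missing response */
  where res ne .;
run;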

Now I'd like to adapt the code so that non-significant interactions (p > 0.05) are dropped from the mixed model one at a time. This will lead to 4 models in total. Any advice?

 

Regards

PaigeMiller
Diamond | Level 26

If it is just 4 models, I would do this one-at-a-time by manually removing the unwanted interactions from the code, and then running the code. If the real world problem has (for example) 50 interactions, you could write a macro to do this.
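A minimal sketch of that manual approach, assuming the data set and variables from the original post: fit the full model, save the Type 3 tests with ODS OUTPUT, and refit without the three-way interaction if its p-value exceeds 0.05. The data set name t3_full is only an illustration.

```sas
/* Step 1: fit the full model and save the Type 3 tests of fixed effects */
ods output Tests3=t3_full;   /* t3_full is a hypothetical data set name */
proc mixed data=result covtest method=reml;
  class student school course exam time;
  model res = time exam course time*exam time*course time*exam*course / s;
  repeated / subject=student(exam) type=un;
  random school;
run;

/* Inspect the p-values (ProbF) of the fixed effects */
proc print data=t3_full noobs;
  var Effect NumDF DenDF FValue ProbF;
run;

/* Step 2: if time*exam*course has ProbF > 0.05, refit without it */
proc mixed data=result covtest method=reml;
  class student school course exam time;
  model res = time exam course time*exam time*course / s;
  repeated / subject=student(exam) type=un;
  random school;
run;
```

The same pattern repeats for each two-way interaction, starting from whichever term has the largest p-value in the refitted model.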

--
Paige Miller
krakla
Fluorite | Level 6
Many thanks, PaigeMiller, for your advice. Indeed, I know that I can remove the interactions manually, one at a time. What I need help with is doing that in a macro. I am still a beginner with macros, which is why I can't do it yet. I hope someone can help me with that.
PaigeMiller
Diamond | Level 26

I need help to do that using macro.

Why? Are there really a lot more than 3 interactions?

 

 

--
Paige Miller
krakla
Fluorite | Level 6
Thank you, PaigeMiller. Yes, there are about 42 interactions; I simplified my question to make it easier to follow. If you can help, I will be thankful.
Regards
PaigeMiller
Diamond | Level 26

I can't imagine what you are going to do after you run such a macro and have 42 different models to compare.


The real problem you seem to be dealing with is the effect of multicollinearity on the model estimates under ML or REML estimation. I'm not sure what tools SAS provides to help with this. My guess is that you would need to run the FIXED part of the model through either PROC GLMSELECT or PROC PLS (or both) to determine how to handle the multicollinearity and to select a model for the fixed part, then go back to PROC MIXED and perform the REML estimation using the fixed model you found. But I certainly would defer to others (perhaps @Rick_SAS @jiltao @SteveDenham @StatDave ) on how to handle multicollinearity for maximum likelihood estimation.
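As a rough sketch of that two-stage idea, assuming the variables from the original post: PROC GLMSELECT performs backward selection on the fixed effects only, treating observations as independent, so its p-values are not those of the mixed model. HIERARCHY=SINGLE keeps a lower-order term in the model while an interaction containing it is still present. The macro variable _GLSIND, which GLMSELECT sets to the list of selected effects, is used here on the assumption that no effects are split.

```sas
/* Stage 1: select fixed effects, ignoring the random and repeated structure */
proc glmselect data=result;
  class course exam time;
  model res = time|exam|course
        / selection=backward(select=sl sls=0.05)
          hierarchy=single;
run;

/* Show which effects were kept */
%put NOTE: Selected effects: &_GLSIND;

/* Stage 2: REML estimation in PROC MIXED with the selected fixed effects */
proc mixed data=result covtest method=reml;
  class student school course exam time;
  model res = &_GLSIND / s;
  repeated / subject=student(exam) type=un;
  random school;
run;
```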

--
Paige Miller
Rick_SAS
SAS Super FREQ

This sounds like model selection using a "backwards selection" method in which you start with the full model and then drop effects that are least significant. This is one possible way to select effects from a large set of candidates. If you do an internet search for 

+sas variable selection mixed models 


you will find many papers that use macros for variable selection in mixed models. I have not used any of them. However, I know the work of George Fernandez and Jorge Morel, so you might start with their papers:

PaigeMiller
Diamond | Level 26

I'm going to have to read these papers as well! Thanks @Rick_SAS !

--
Paige Miller
PaigeMiller
Diamond | Level 26

Unfortunately, the actual macro code for these papers mentioned by @Rick_SAS doesn't seem to be available 😞

--
Paige Miller
PaigeMiller
Diamond | Level 26

Thanks, @Rick_SAS !

 

I did see that, but I haven't looked at the macro yet. I am always skeptical about using any form of stepwise selection, but from the diagram it seems that this macro pre-checks the variables in many ways before modeling and then checks the resulting model, and I want to understand that.

--
Paige Miller
krakla
Fluorite | Level 6
Thank you. I am going to search for backward elimination using SAS and I think this will help. Also, I will read those papers you mentioned here and see what will happen.
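For readers who want to see what such a loop could look like, here is a minimal sketch of backward elimination on interaction terms, assuming the model from the original post. The macro name and its parameters (backsel, effects, alpha, maxiter) are hypothetical and this is not the macro from any of the papers mentioned. At each pass it removes one interaction with p > alpha, preferring the term with the most factors and, among those, the largest p-value. It does not check that a removed lower-order term is absent from every retained higher-order interaction, so the final model should be reviewed for hierarchy.

```sas
%macro backsel(data=result, effects=, alpha=0.05, maxiter=50);
  %local i done drop;
  %let done = 0;
  %let i = 0;

  %do %while (&done = 0 and &i < &maxiter);
    %let i = %eval(&i + 1);

    /* Fit the current model quietly and keep the Type 3 tests */
    ods exclude all;
    ods output Tests3=_t3;
    proc mixed data=&data method=reml;
      class student school course exam time;
      model res = &effects;
      repeated / subject=student(exam) type=un;
      random school;
      where res ne .;
    run;
    ods exclude none;

    /* Pick one non-significant interaction: highest order first, then largest p */
    %let drop = ;
    proc sql noprint;
      select Effect into :drop trimmed
        from _t3
        where index(Effect, '*') > 0 and ProbF > &alpha
        order by countc(Effect, '*') desc, ProbF desc;
    quit;

    %if %length(&drop) = 0 %then %let done = 1;
    %else %do;
      %put NOTE: Pass &i: dropping &drop;
      /* Rebuild the effect list from the names PROC MIXED reported */
      proc sql noprint;
        select Effect into :effects separated by ' '
          from _t3
          where Effect ne "&drop";
      quit;
    %end;
  %end;

  /* Final fit with full output */
  %put NOTE: Final fixed effects: &effects;
  proc mixed data=&data covtest method=reml;
    class student school course exam time;
    model res = &effects / s;
    repeated / subject=student(exam) type=un;
    random school;
    where res ne .;
  run;
%mend backsel;

%backsel(effects=time exam course time*exam time*course time*exam*course);
```

The effect list is rebuilt from the Tests3 table rather than by editing the input string because PROC MIXED may report an interaction with its variables in a different order than they were typed.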
SteveDenham
Jade | Level 19

Why do you want to remove them? I suppose it might be because the data come from an observational study rather than a designed experiment? Before doing this, please read this paper:

 

https://www.lexjansen.com/pnwsug/2008/DavidCassell-StoppingStepwise.pdf 

 

Or Frank Harrell's Regression Modeling Strategies.

 

If you must reduce the fixed effects, try a LASSO method or elastic net, ignoring the random effects. But the best strategy is to use prior knowledge of the system that generated your data to eliminate effects that are either a) known to be irrelevant (like fourth- and fifth-order interactions) or b) not of interest to your research question.
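A minimal sketch of the LASSO route in PROC GLMSELECT, assuming the variables from the original post. It ignores the school random effect and the repeated-measures structure, and the random cross-validation folds do not respect the clustering of observations within students, so the result is at best a screening step. The seed value is arbitrary.

```sas
/* LASSO on the fixed effects only; the selected step is chosen by 5-fold CV */
proc glmselect data=result seed=12345;
  class course exam time;
  model res = time|exam|course
        / selection=lasso(choose=cv stop=none)
          cvmethod=random(5);
run;
```

In recent SAS/STAT releases, SELECTION=ELASTICNET is the corresponding keyword for elastic net.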

 

If you are looking for a purely predictive model, a classification and regression tree analysis may be what you need.
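One possible sketch of a regression tree, assuming PROC HPSPLIT is available in your installation and using the variables from the original post. Because res is not listed in the CLASS statement it is treated as a continuous target. The tree treats rows as independent, so it does not account for repeated measurements on the same student. The seed value is arbitrary.

```sas
/* Regression tree for res with cost-complexity pruning */
proc hpsplit data=result seed=12345;
  class school course exam time;
  model res = school course exam time;
  prune costcomplexity;
run;
```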

 

SteveDenham

krakla
Fluorite | Level 6
Indeed, yes the data is from an observational study. I know about LASSO, but I didn't think it would help because I am looking for a predictive model. I am going to read what you mentioned. Thanks a lot.
PaigeMiller
Diamond | Level 26

LASSO helps you determine the proper predictive model. LASSO and stepwise/forward/backward selection do not work with PROC MIXED, unless you adopt the methods from the papers linked to by @Rick_SAS . If you really want to do something with stepwise/forward/backward selection (as your original question implies), then definitely read the paper linked by @SteveDenham. Lots of smart people have put in lots of work on this problem.

 

The method in the paper "PLS Generalized Regression" also ought to work here, but as far as I know there is no SAS code for it, although there is an R package that performs this type of analysis.

--
Paige Miller
