bhr-q
Pyrite | Level 9

Hello all,

 

Is there any way to check multicollinearity using proc GLM?

 

proc glm data=tmp  ; 
class study CNS stk_loc sex;
model fim =study sex age CNS stk_loc /solution ss3 ;
run; 

 

The dependent variable is continuous.

All the independent variables are categorical except age, which is continuous.

 

I am familiar with the VIF option in PROC REG, but using it here would mean creating the dummy variables myself.

 

Any input is appreciated.

1 ACCEPTED SOLUTION

Ksharp
Super User

1) You could use PROC GLMSELECT to eliminate the collinear variables.

2) You could use PROC GENMOD with the CORRB option to check the correlations between the parameter estimates.

proc genmod data=sashelp.heart  ; 
class status bp_Status sex;
model weight =status bp_Status sex height / corrb ;
run;

PaigeMiller
Diamond | Level 26

Use PROC GLMMOD to obtain the x matrix used by PROC GLM. Then run PROC REG with the VIF option, using the output of PROC GLMMOD as input to PROC REG.
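
A minimal sketch of this approach, assuming the data set and variables from the original question (tmp, fim, study, sex, age, CNS, stk_loc). The output data set names design and parms are placeholders. OUTDESIGN= names the design columns Col1, Col2, and so on, with Col1 holding the intercept, so that column is dropped before PROC REG. Because PROC GLM uses a less-than-full-rank parameterization, expect PROC REG to note that the model is not of full rank and to set the redundant dummy columns to zero.

```sas
/* Sketch only: "design" and "parms" are placeholder data set names. */
proc glmmod data=tmp outdesign=design outparm=parms noprint;
   class study CNS stk_loc sex;
   model fim = study sex age CNS stk_loc;
run;

/* OUTPARM= maps each column number back to its effect and class level. */
proc print data=parms;
run;

/* Drop Col1 (the intercept column); PROC REG adds its own intercept. */
proc reg data=design(drop=col1);
   model fim = col: / vif tol;
run;
quit;
```

Use the parms listing to translate each ColN row of the VIF table back into an effect and level.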

 

Or maybe this: https://stackoverflow.com/questions/77531415/sas-basic-analysis-problems

--
Paige Miller
sbxkoenk
SAS Super FREQ

Concurring with @PaigeMiller .

 

  • In PROC GLMMOD , you can use the OUTDESIGN= option.
  • In PROC GLMMOD , you can use ODS to create the design matrix data set.

The results are equivalent, but the columns of the data set produced by ODS have names that are directly related to the names of their corresponding effects.

 

Here's an example of the latter option:

SAS Help Center: Example 49.2 Factorial Screening

 

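/* Capture the design matrix through ODS; its columns are named after the effects. */
ods output DesignPoints = DesignMatrix;
proc glmmod data=Screening;
   model y = a|b|c|d|e@2;
run;

/* Fit the design columns in PROC REG: first the full model, then forward selection. */
proc reg data=DesignMatrix;
   model y = a--d_e;
   model y = a--d_e / selection = forward
                      details   = summary
                      slentry   = 0.05;
run;
quit;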

Koen

bhr-q
Pyrite | Level 9
Thanks for your answer, it was helpful.
bhr-q
Pyrite | Level 9
Thanks for your answer, it was interesting to get the tolerance using Proc GLM, but when I used Proc GLM with the tolerance option it didn't show me any tolerance.
sbxkoenk
SAS Super FREQ

@bhr-q wrote:
Thanks for your answer, it was interesting to get the tolerance using Proc GLM, but when I used Proc GLM with the tolerance option it didn't show me any tolerance.

Weird.
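
For reference, a sketch of how the option would be requested, assuming the model from the original question. TOLERANCE is an option on the MODEL statement, so it goes after the slash with the other model options rather than on the PROC GLM statement.

```sas
/* Sketch only: data set and variables are taken from the original question. */
proc glm data=tmp;
   class study CNS stk_loc sex;
   model fim = study sex age CNS stk_loc / solution ss3 tolerance;
run;
quit;
```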

Here's some info on the tolerance output in PROC GLM :

  • The Type 1 tolerance of a parameter is the tolerance for this parameter with respect to the preceding parameters in the model.
  • The Type 2 tolerance of a parameter is the tolerance for this parameter with respect to all other parameters in the model.

The TYPE 2 Tolerance is consistent with the TOL option on the MODEL statement in PROC REG, which is 1/VIF.
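
In symbols (the notation here is not from the thread): let $R_j^2$ be the R-square obtained by regressing the $j$-th design column on all the other columns in the model. The PROC REG tolerance and VIF are then related as follows.

```latex
\[
  \mathrm{TOL}_j = 1 - R_j^2,
  \qquad
  \mathrm{VIF}_j = \frac{1}{\mathrm{TOL}_j} = \frac{1}{1 - R_j^2}
\]
```

For example, $R_j^2 = 0.9$ gives a tolerance of 0.1 and a VIF of 10.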

It is your choice which tolerance to use.

Koen

PaigeMiller
Diamond | Level 26

@Ksharp wrote:

1) You could use PROC GLMSELECT to eliminate the collinear variables.

2) You could use PROC GENMOD with the CORRB option to check the correlations between the parameter estimates.

proc genmod data=sashelp.heart  ; 
class status bp_Status sex;
model weight =status bp_Status sex height / corrb ;
run;

 


 

The problem with CORRB is that it only looks at pairwise correlations. For some data sets that may be fine, but it will miss collinearity that involves more than two parameters. VIF (and tolerance) measures how strongly each parameter is correlated with a linear combination of all the other parameters in the model.
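
A small simulated illustration of this point. It is a sketch on made-up data, not anything from the thread: the data set sim, the variables x1-x4 and y, and the seed are all hypothetical. Here x4 is almost exactly x1 + x2 + x3, so each pairwise correlation with x4 should be only moderate (around 0.58 in expectation) while the VIFs should come out very large.

```sas
/* Hypothetical data: x4 is nearly a linear combination of x1, x2 and x3. */
data sim;
   call streaminit(2024);
   do i = 1 to 500;
      x1 = rand('normal');
      x2 = rand('normal');
      x3 = rand('normal');
      x4 = x1 + x2 + x3 + 0.1*rand('normal');
      y  = x1 + rand('normal');
      output;
   end;
   drop i;
run;

/* Pairwise view: no single correlation looks alarming. */
proc corr data=sim;
   var x1-x4;
run;

/* VIF and tolerance pick up the joint dependence. */
proc reg data=sim;
   model y = x1-x4 / vif tol;
run;
quit;
```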

--
Paige Miller
Ksharp
Super User

Paige,
I know what you are talking about (a linear combination of multiple variables).
But I don't think there is a problem with detecting and deleting variables one at a time (the linear combination of multiple variables must be highly correlated with one of these variables).
If you do not agree with that, you could try PROC GLMSELECT or PROC HPGENSELECT, which would take care of your concern.
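
A minimal sketch of the PROC GLMSELECT route, assuming the data set and variables from the original question. The stepwise method and the SBC criterion are illustrative choices, not something specified in this thread.

```sas
/* Sketch only: the selection method and criterion are assumptions. */
proc glmselect data=tmp;
   class study CNS stk_loc sex;
   model fim = study sex age CNS stk_loc
         / selection=stepwise(select=sbc);
run;
```

Note that this selects a subset of effects; it does not report a collinearity diagnostic such as VIF.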
