Yan
Calcite | Level 5
I am running PROC REG to check for multicollinearity among the predictors of my logistic regression models. Almost all of the independent variables are categorical. I constructed dummy variables and put k-1 dummies for each variable into the PROC REG models. PROC REG offers two options for collinearity diagnostics, COLLIN and COLLINOINT. Should I use the same model for both options, given that the latter excludes the intercept from the calculation? Should I include all k dummies rather than k-1 dummies when using the COLLINOINT option? Thanks!
Accepted Solution
Dale
Pyrite | Level 9
With more than one categorical variable, I would run the collinearity diagnostics using k{i}-1 dummy variables for the i-th categorical variable AND I would include the intercept. By using k{i}-1 dummy variables for the i-th categorical variable, you avoid overparameterizing the model with the reference level of any of your categorical variables. Including the intercept along with the k{i}-1 dummy variables does not result in an overparameterized model either.

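If you were to use k{i} dummy variables for each categorical variable and you have two or more categorical variables, then you will end up with an overparameterized model, even without the intercept: the k{i} dummies of every categorical variable sum to 1 in each observation, so the dummy sets are linearly dependent on one another. So, it is best to use k{i}-1 dummy variables and include the intercept.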
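
To make the overparameterization argument concrete, here is a minimal sketch in plain Python (not SAS, and not part of the original answer). It assumes two made-up categorical predictors, A with 3 levels and B with 2 levels; the data and the helper names `rank`, `dummies`, and `design` are hypothetical and exist only for this illustration. It compares the rank of the design matrix with its number of columns for both codings.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix, computed by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivots found so far
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        # Look for a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from all rows below the pivot row.
        for i in range(r + 1, len(m)):
            if m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def dummies(values, levels, drop_first):
    """0/1 indicator columns; drop_first omits the reference (first) level."""
    kept = levels[1:] if drop_first else levels
    return [[1 if v == lev else 0 for lev in kept] for v in values]


# Hypothetical data: every combination of A (3 levels) and B (2 levels).
A = ["a1", "a2", "a3", "a1", "a2", "a3"]
B = ["b1", "b1", "b1", "b2", "b2", "b2"]
A_LEVELS = ["a1", "a2", "a3"]
B_LEVELS = ["b1", "b2"]


def design(drop_first, intercept):
    """Design matrix: optional intercept column, then dummies for A and B."""
    da = dummies(A, A_LEVELS, drop_first)
    db = dummies(B, B_LEVELS, drop_first)
    return [([1] if intercept else []) + ra + rb for ra, rb in zip(da, db)]


# k{i}-1 dummies per variable plus the intercept.
reduced = design(drop_first=True, intercept=True)
# All k{i} dummies per variable, no intercept.
full = design(drop_first=False, intercept=False)

print("k-1 dummies + intercept:", len(reduced[0]), "columns, rank", rank(reduced))
print("all k dummies, no intercept:", len(full[0]), "columns, rank", rank(full))
```

In this toy example the k{i}-1 coding with an intercept has 4 columns and rank 4, whereas the full k{i} coding has 5 columns but only rank 4, so one column is redundant even though the intercept was left out.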
