Hello,
I would like an opinion on testing for multicollinearity with proc reg. I have been using proc reg with the VIF/TOL options to check for multicollinearity, even when the final model is a logistic regression. Which is the better approach: use the original variables, or use the dummy variables? With the original variables I get a single VIF value per variable, whereas the dummy variables give a VIF for each response category of each variable.
PS: I leave out the reference category whenever I use dummy variables to check for multicollinearity.
Example of code:
proc reg data=abuse2;
  model rec_abuse = rural age /*continuous*/ female signwprtnr numchild
        infedu fedu pens agri othocc indincome nuclear spouse daughter relative faminc
        upwork spcomu nsprvis nsprcal smknvrfmr alcnvrfmr dyexer ntrheal neveal
        smorbid mmorbid nhins nhcacc idep modep sedep
        / vif tol;
run;
quit;
To evaluate your model for collinearity, you need to use the same form of the model that you want to fit. If you have categorical variables that you intend to represent with dummy variables in your model, then you need to use those same dummy variables, coded the same way, in REG. The collinearity statistics are only useful for evaluating the form of the model that you specify in REG. But note that if your actual model is a logistic, or some other generalized linear model, then you need to use appropriate weights when evaluating collinearity, because the unweighted diagnostics from REG describe an ordinary linear model rather than the model you are fitting. This is discussed and illustrated in this note.
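As a rough sketch of the weighted check described above, assuming a binary logistic model with the logit link: fit the model in PROC LOGISTIC, save the predicted probabilities, and use p*(1-p) as the weight in REG. The data set names abuse2_pred and abuse2_w, the variables p_hat and w, and the macro variable xvars are made up for illustration, and the code assumes rec_abuse is coded 0/1 with 1 as the event. Adjust these to your data.

```sas
/* Predictor list copied from the question, stored once so both steps use
   exactly the same dummy-coded variables. xvars is an illustrative name. */
%let xvars = rural age female signwprtnr numchild
             infedu fedu pens agri othocc indincome nuclear spouse daughter
             relative faminc upwork spcomu nsprvis nsprcal smknvrfmr
             alcnvrfmr dyexer ntrheal neveal smorbid mmorbid nhins nhcacc
             idep modep sedep;

/* Step 1: fit the logistic model and save the predicted probabilities.
   Assumption: rec_abuse is 0/1 and 1 is the event of interest.           */
proc logistic data=abuse2;
  model rec_abuse(event='1') = &xvars;
  output out=abuse2_pred pred=p_hat;
run;

/* Step 2: build the weight for a binary logit model, w = p*(1-p).
   Rows with missing predictors get a missing weight and drop out of REG. */
data abuse2_w;
  set abuse2_pred;
  w = p_hat * (1 - p_hat);
run;

/* Step 3: weighted collinearity diagnostics on the same set of predictors.
   Only the VIF and tolerance columns are of interest here, not the fit.  */
proc reg data=abuse2_w;
  weight w;
  model rec_abuse = &xvars / vif tol;
run;
quit;
```

The weight p*(1-p) applies to the binary logit case only. Other generalized linear models need the weight that matches their own distribution and link.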