09-25-2013 07:28 AM
Hi, I was wondering if someone could answer a few questions. I want to test a logistic regression model for collinearity and influential observations/outliers. Is it possible to put the logistic regression model in PROC REG? Do the ordinal/nominal predictors need to be dummy coded for the collinearity diagnostics and influence statistics to be accurate? Any help appreciated.
09-25-2013 11:47 AM
The answer is YES, you can use PROC REG for collinearity.
Collinearity doesn't depend on the Y-variable; it depends only on the X-variables, so PROC REG with random Y-values will give you the collinearity diagnostics for the X-variables.
The same is not true for influential observations or outliers; for those, I don't think PROC REG will give you the proper diagnostics.
Ordinal/nominal predictors need to be dummy coded to work in PROC REG; you can use PROC GLMMOD to give you the dummy codings.
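The two steps above could be sketched as follows. This is only a minimal sketch under stated assumptions: the data set MYDATA, the binary response y, the continuous predictors age and income, and the three-level nominal predictor region are all hypothetical names, not taken from this thread. PROC GLMMOD writes the design matrix as Col1, Col2, and so on, with Col1 as the intercept; one dummy column per CLASS variable is left out of the PROC REG model so that the dummies are not perfectly collinear with the intercept.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc glmmod data=mydata outdesign=design outparm=parms noprint;
   class region;
   model y = age income region;   /* y is carried into DESIGN alongside the Col variables */
run;

/* PARMS maps each Col variable to its effect; check it before choosing columns.
   With a three-level REGION the layout is assumed to be:
   Col1 = intercept, Col2 = age, Col3 = income, Col4-Col6 = region dummies.
   Col6 is left out here as the reference level. */
proc reg data=design;
   model y = col2-col5 / vif tol collin;   /* collinearity diagnostics for the X-variables */
run;
quit;
```

Only the VIF, tolerance, and COLLIN tables are of interest here; the parameter estimates and tests from PROC REG are not meaningful for a binary response.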
09-29-2013 04:23 PM
In PROC REG, the OUTPUT OUT= statement lets you request DFFITS, the "standard influence of observation on predicted value". Another option is the Cook's D influence statistic. PROC LOGISTIC offers similar statistics in its OUTPUT OUT= statement, but the residual-based statistics are limited because your model has a binary target.
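As a rough illustration of the PROC REG request described above, assuming a hypothetical data set MYDATA with response y and predictors age and income (none of these names come from this thread):

```sas
/* Hypothetical data set and variable names, for illustration only */
proc reg data=mydata;
   model y = age income;
   output out=reg_diag
          dffits=dffits    /* standard influence of observation on predicted value */
          cookd=cooksd;    /* Cook's D influence statistic */
run;
quit;
```

The data set REG_DIAG then holds the original variables plus one column per requested statistic, so observations can be sorted or subset on them.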
10-11-2013 05:00 AM
Thank you, Jason and Paige, for your input; it has been VERY helpful. I am also confused about another topic, and I was hoping one of you might have the answer.
I am trying to examine influential observations in some logistic regression models. I am aware that the INFLUENCE option and the OUTPUT OUT= statement can be used to generate certain statistics such as DFBETAS, leverage (hat) values, and Pearson and deviance residuals. I am using certain cut-off values that I have read in a few books to subset the influential observations based on the DFBETAS and leverage (hat) values in the output from INFLUENCE. However, I am not sure what cut-off values I can use to identify influential observations from the deviance and Pearson residuals. Are these values standardised in the output from INFLUENCE and OUTPUT OUT=? If so, can I just use an absolute value of 2 or 3, as with standardised residuals in linear regression? I am aware that you can generate plots of the residuals as well, but I was hoping there might also be a cut-off value that I could use.
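For reference, the kind of request described above might look like the sketch below. The data set MYDATA, the 0/1 response y, and the predictors age and income are hypothetical names used only for illustration, and the output variable names are arbitrary.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc logistic data=mydata;
   model y(event='1') = age income / influence;   /* prints the regression diagnostics */
   output out=logit_diag
          dfbetas=_all_          /* one DFBETAS variable per parameter */
          h=leverage             /* leverage (hat) values */
          reschi=pearson_res     /* Pearson residuals */
          resdev=deviance_res;   /* deviance residuals */
run;
```

The statistics land in LOGIT_DIAG as ordinary variables, so any chosen cut-off can be applied with a WHERE clause or a DATA step.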
I am also unfamiliar with the notation that appears alongside them in the INFLUENCE output, i.e. there is a "1 unit = some value" beside each statistic in the output. Are these values used to detect the influential observations, and if so, how?
I apologise for the long message, but any help would be much appreciated!