New Contributor
Posts: 2

# collinearity and influential observations in PROC REG

Hi, I was wondering if someone could answer a few questions. I want to check a logistic regression model for collinearity and for influential observations/outliers. Is it possible to fit the logistic regression model in PROC REG for this purpose? Do the ordinal/nominal predictors need to be dummy coded for the collinearity diagnostics and influence statistics to be accurate? Any help appreciated.

Posts: 2,065

## Re: collinearity and influential observations in PROC REG

The answer is YES, you can use PROC REG for collinearity.

Collinearity doesn't depend on the Y-variable; it depends only on the X-variables. So PROC REG with random Y-values will still give you the collinearity diagnostics for the X-variables.

That is not true for influential observations or outliers. For those, I don't think PROC REG will give you the proper diagnostics.

Ordinal/nominal predictors need to be dummy coded to work in PROC REG; you can use PROC GLMMOD to give you the dummy codings.
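
A minimal sketch of the approach described above, not a definitive implementation. The data set `mydata`, the binary response `y`, the continuous predictors `x1` and `x2`, and the three-level nominal predictor `grp` are all hypothetical names. The column numbers passed to PROC REG are an assumption that depends on your own design matrix, so check the OUTPARM= data set before using them.

```sas
/* Build dummy (design) columns for the nominal predictor.          */
/* mydata, y, x1, x2 and grp are hypothetical names.                */
proc glmmod data=mydata outdesign=design outparm=parms noprint;
   class grp;
   model y = x1 x2 grp;
run;

/* Shows which design column (Col1, Col2, ...) maps to which effect */
proc print data=parms;
run;

/* Add a random response: collinearity depends only on the X side   */
data design_rand;
   set design;
   if _n_ = 1 then call streaminit(12345);
   y_random = rand('normal');
run;

/* Assumed layout: Col1 = intercept, Col2 = x1, Col3 = x2,          */
/* Col4-Col6 = the three levels of grp. The last level (Col6) is    */
/* left out as the reference level; keeping every level together    */
/* with the intercept would make the columns exactly collinear.     */
proc reg data=design_rand;
   model y_random = col2-col5 / vif tol collin;
run;
quit;
```

The VIF, TOL and COLLIN results do not change if `y_random` is replaced by any other response, which is why a random one is enough here.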

--
Paige Miller
SAS Employee
Posts: 122

## Re: collinearity and influential observations in PROC REG

In PROC REG, the OUTPUT OUT= statement lets you request DFFITS, the "standard influence of observation on predicted value". Another option is Cook's D influence statistic (COOKD). PROC LOGISTIC has similar statistics in its own OUTPUT OUT= statement, but the residual-based statistics are limited because your model has a binary target.
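
A short sketch of the two OUTPUT statements mentioned above, assuming a hypothetical data set `mydata` with a 0/1 response `y` and predictors `x1` and `x2`. The output data set names and the variable names on the right-hand side of each keyword are arbitrary choices made for illustration.

```sas
/* Linear regression: per-observation influence statistics */
proc reg data=mydata;
   model y = x1 x2;
   output out=reg_infl
          dffits=dffits        /* standardized influence on predicted value */
          cookd=cookd          /* Cook's D                                  */
          h=leverage           /* hat (leverage) values                     */
          rstudent=rstudent;   /* studentized deleted residuals             */
run;
quit;

/* Logistic regression: the analogous diagnostics */
proc logistic data=mydata;
   model y(event='1') = x1 x2;
   output out=logit_infl
          h=leverage           /* hat (leverage) values                     */
          c=c cbar=cbar        /* confidence interval displacement          */
          difdev=difdev        /* change in deviance when obs is deleted    */
          difchisq=difchisq    /* change in Pearson chi-square              */
          dfbetas=_all_;       /* one DFBETA variable per parameter         */
run;
```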

New Contributor
Posts: 2

## Re: collinearity and influential observations in PROC REG

Thank you, Jason and Paige, for your input; it has been VERY helpful. I am also confused about another topic, and I was hoping one of you might have the answer.

I am trying to examine influential observations in some logistic regression models. I am aware that the INFLUENCE option and the OUTPUT OUT= statement can be used to generate statistics such as DFBETAS, leverage (hat) values, and Pearson and deviance residuals. I am using certain cut-off values that I have read in a few books to subset the influential observations based on the DFBETAS and leverage (hat) values from the INFLUENCE output. However, I am not sure what cut-off values I can use to identify influential observations from the deviance and Pearson residuals. Are these values standardised in the output from INFLUENCE and OUTPUT OUT=? If so, can I just use an absolute value of 2 or 3, as with standardised residuals in linear regression? I am aware that you can generate plots of the residuals as well, but I was hoping there might be a cut-off value that I could use too.
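
For illustration only, the idea in the paragraph above could be sketched as follows. The RESCHI= and RESDEV= residuals are rescaled here by dividing by sqrt(1 - h), the usual leverage adjustment, and the result is compared with an absolute value of 2. The data set `mydata`, the variables `y`, `x1`, `x2`, and the cut-off of 2 are assumptions taken from the question itself, not an established rule for logistic models.

```sas
/* mydata, y, x1 and x2 are hypothetical names */
proc logistic data=mydata;
   model y(event='1') = x1 x2;
   output out=logit_resid
          reschi=pearson       /* Pearson residuals   */
          resdev=deviance      /* deviance residuals  */
          h=leverage;          /* hat (leverage)      */
run;

data flagged;
   set logit_resid;
   /* Leverage-adjusted residuals: residual / sqrt(1 - h) */
   if leverage < 1 then do;
      std_pearson  = pearson  / sqrt(1 - leverage);
      std_deviance = deviance / sqrt(1 - leverage);
   end;
   /* The cut-off of 2 is the value asked about, used only as an example */
   flag = (abs(std_pearson) > 2 or abs(std_deviance) > 2);
run;

proc print data=flagged;
   where flag = 1;
run;
```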

I am also unfamiliar with the notation that appears alongside them in the INFLUENCE output, i.e. there is a "1 unit = some value" beside each statistic. Are these values used to detect the influential observations, and if so, how are they used?

I apologise for the long message, but any help would be much appreciated!

thanks,

Frewen
