frewen
Calcite | Level 5

Hi, I was wondering if someone could answer a few questions. I want to test a logistic regression model for collinearity and influential observations/outliers. Is it possible to put the logistic regression model in PROC REG? Do the ordinal/nominal predictors need to be dummy coded for the collinearity diagnostics/influence statistics to be accurate? Any help appreciated.

3 REPLIES
PaigeMiller
Diamond | Level 26

The answer is YES, you can use PROC REG for collinearity.

Collinearity doesn't depend on the Y-variables; it only depends on the X-variables. So PROC REG with random Y-values will give you the collinearity of the X-values.
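
As a minimal sketch of that idea (the data set work.mydata and the predictors x1-x3 are hypothetical names, not from this thread), one could attach a random response and request the collinearity diagnostics from PROC REG:

```sas
/* Sketch only: work.mydata and x1-x3 are hypothetical names. */
data work.collin_check;
   set work.mydata;
   if _n_ = 1 then call streaminit(12345);  /* fixed seed so the run is repeatable */
   y_random = rand('normal');               /* placeholder response */
run;

proc reg data=work.collin_check;
   /* VIF, TOL and COLLIN are computed from the X-variables only, */
   /* so the random response does not change them.                */
   model y_random = x1 x2 x3 / vif tol collin;
run;
quit;
```

The parameter estimates and tests from this run are meaningless because the response is random; only the VIF, tolerance and COLLIN output are of interest.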

The same is not true for influential observations or outliers; for those, I don't think PROC REG will give you the proper diagnostics.

Ordinal/nominal predictors need to be dummy coded to work in PROC REG; you can use PROC GLMMOD to give you the dummy codings.
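
A rough sketch of the PROC GLMMOD step, assuming a data set work.mydata with a response y, class predictors region and grade, and a continuous predictor x1 (all hypothetical names):

```sas
/* Sketch only: data set and variable names are hypothetical. */
proc glmmod data=work.mydata
            outdesign=work.design   /* design matrix columns Col1, Col2, ... */
            outparm=work.parms      /* describes what each column represents */
            noprint;
   class region grade;
   model y = region grade x1;
run;

/* Look up which ColN corresponds to which effect and level. */
proc print data=work.parms;
run;
```

GLMMOD uses the same less-than-full-rank coding as PROC GLM, so each CLASS variable gets one column per level in addition to the intercept column. When passing the columns to PROC REG you would typically leave out one level per CLASS variable (the reference level), otherwise the design is not of full rank.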

--
Paige Miller
JasonXin
SAS Employee

Under PROC REG, the OUTPUT statement (OUTPUT OUT=) lets you request DFFITS, the "standard influence of observation on predicted value". Another one is COOKD, Cook's D influence statistic. PROC LOGISTIC has similar statistics in its OUTPUT OUT=, but the residual-based statistics are limited because your model has a binary target.
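
To make the two OUTPUT statements concrete, here is a sketch with hypothetical names (work.mydata, a continuous response y_cont, a binary response y coded 0/1, predictors x1-x3). The names on the right-hand side of each keyword are simply the variables created in the output data set:

```sas
/* Sketch only: data set and variable names are hypothetical. */

/* Linear regression: DFFITS, Cook's D, leverage, studentized residual. */
proc reg data=work.mydata;
   model y_cont = x1 x2 x3;
   output out=work.reg_diag
          dffits=dffits cookd=cooksd h=leverage rstudent=rstud;
run;
quit;

/* Logistic regression: the analogous case-deletion diagnostics. */
proc logistic data=work.mydata;
   model y(event='1') = x1 x2 x3;
   output out=work.logit_diag
          h=leverage reschi=pearson resdev=deviance
          c=c cbar=cbar difchisq=difchisq difdev=difdev
          dfbetas=_all_;
run;
```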

frewen
Calcite | Level 5

Thank you Jason and Paige for your input; it has been VERY helpful. I am also confused about another topic, and I was hoping one of you might have the answer.

I am trying to examine influential observations in some logistic regression models. I am aware that the INFLUENCE option and the OUTPUT OUT= options can be used to generate certain statistics such as DFBETAS, leverage (hat) values, and Pearson and deviance residuals. I am using certain cut-off values, which I have read in a few books, to subset the influential observations based on the DFBETAS and leverage (hat) values from the INFLUENCE output. However, I am not sure what cut-off values I can use to identify influential observations from the deviance and Pearson residuals. Are these values standardised in the outputs from INFLUENCE and OUTPUT OUT=? If so, can I just use an absolute value of 2 or 3, as with standardised residuals in linear regression? I am aware that you can generate plots of the residuals as well, but I was hoping there might be a cut-off value that I could use too.
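
For concreteness, the subsetting I describe looks roughly like the sketch below. The data set, the variables and especially the two cut-off values are placeholders for illustration only, not recommended thresholds:

```sas
/* Sketch only: names and cut-off values are placeholders. */
%let lev_cut = 0.2;   /* placeholder leverage cut-off */
%let dfb_cut = 1;     /* placeholder DFBETAS cut-off  */

proc logistic data=work.mydata;
   model y(event='1') = x1 x2;
   /* The DFBETAS= names are assigned in order: intercept, then x1, x2. */
   output out=work.logit_diag
          h=leverage reschi=pearson resdev=deviance
          dfbetas=dfb_int dfb_x1 dfb_x2;
run;

/* Keep only observations exceeding either cut-off. */
data work.flagged;
   set work.logit_diag;
   if leverage > &lev_cut
      or abs(dfb_x1) > &dfb_cut
      or abs(dfb_x2) > &dfb_cut;
run;
```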

I am also unfamiliar with the notation that appears alongside these statistics in the INFLUENCE output, i.e. there is a "1 unit = some value" note beside each statistic. Are these values used to detect the influential observations, and if so, how?

I apologise for the long message, but any help would be much appreciated!

thanks,

Frewen

