frewen
Calcite | Level 5

Hi, I was wondering if someone is able to answer a few questions that I have. I want to test a logistic regression model for collinearity and influential observations/outliers. Is it possible to put the logistic regression model in PROC REG? Do the ordinal/nominal predictors need to be dummy coded for the collinearity diagnostics/influence statistics to be accurate? Any help appreciated.

3 REPLIES
PaigeMiller
Diamond | Level 26

The answer is YES, you can use PROC REG for collinearity.

Collinearity doesn't depend on the Y-variable; it depends only on the X-variables, so PROC REG with random Y-values will give you the collinearity of the X-variables.

That is not true for influential observations or outliers; for those, I don't think PROC REG will give you the proper diagnostics.

Ordinal/nominal predictors need to be dummy coded to work in PROC REG; you can use PROC GLMMOD to give you the dummy codings.
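As a rough sketch of the approach described above (not tested against any particular data): the data set name `have`, the CLASS predictor `region`, the continuous predictors `age` and `bmi`, and the column positions are all hypothetical. Check the OUTPARM= listing to see which `Col` number belongs to which effect before picking columns.

```sas
/* Hypothetical input: WORK.HAVE with CLASS predictor REGION (3 levels)
   and continuous predictors AGE and BMI. */
data have_y;
   set have;
   call streaminit(12345);
   y_random = rand('normal');   /* placeholder response; the X-side diagnostics do not use it */
run;

/* Build the dummy-coded design matrix */
proc glmmod data=have_y outdesign=design outparm=parms noprint;
   class region;
   model y_random = region age bmi;
run;

/* OUTPARM= maps Col1, Col2, ... to effects and levels */
proc print data=parms;
run;

/* Assumed layout: Col1 = intercept, Col2-Col4 = REGION levels,
   Col5 = AGE, Col6 = BMI.
   Col1 is omitted because PROC REG fits its own intercept, and
   Col4 is omitted as the reference level so the model is full rank. */
proc reg data=design;
   model y_random = Col2 Col3 Col5 Col6 / vif tol collin;
run;
quit;
```

PROC GLMMOD creates one column per level of each CLASS variable, so one level per variable has to be left out by hand; otherwise PROC REG will report that the model is not full rank.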

--
Paige Miller
JasonXin
SAS Employee

Under PROC REG, the OUTPUT OUT= statement lets you request DFFITS, the "standard influence of observation on predicted value". Another option is Cook's D influence statistic. PROC LOGISTIC has similar statistics in its OUTPUT OUT= statement, but the residual-based statistics are limited because your model has a binary target.
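A minimal sketch of the two OUTPUT statements side by side, assuming a hypothetical data set `have` with a 0/1 response `y` and predictors `x1`-`x3`; the output variable names on the right-hand side of each `=` are arbitrary choices.

```sas
/* PROC REG: influence statistics written to WORK.REG_DIAG */
proc reg data=have;
   model y = x1 x2 x3;
   output out=reg_diag
          dffits=dffits      /* standard influence on predicted value */
          cookd=cooksd       /* Cook's D */
          h=leverage
          rstudent=rstudent;
run;
quit;

/* PROC LOGISTIC: the comparable statistics for a binary target */
proc logistic data=have;
   model y(event='1') = x1 x2 x3;
   output out=logit_diag
          h=leverage         /* hat-matrix diagonal */
          reschi=pearson     /* Pearson residual */
          resdev=deviance    /* deviance residual */
          dfbetas=_all_      /* one DFBETA variable per parameter */
          c=c_disp cbar=cbar_disp
          difchisq=difchisq difdev=difdev;
run;
```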

frewen
Calcite | Level 5

Thank you Jason and Paige for your input; it has been VERY helpful. I am also confused about another topic, and I was hoping one of you might have the answer.

I am trying to examine influential observations in some logistic regression models. I am aware that the INFLUENCE option and the OUTPUT OUT= statement can be used to generate statistics such as DFBETAS, leverage (hat) values, and Pearson and deviance residuals. I am using certain cut-off values, which I have read in a few books, to subset the influential observations based on the DFBETAS and leverage (hat) values from the INFLUENCE output. However, I am not sure what cut-off values I can use to identify influential observations from the deviance and Pearson residuals. Are these values standardised in the outputs from INFLUENCE and OUTPUT OUT=? If so, can I just use an absolute value of 2 or 3, as with standardised residuals in linear regression? I am aware that you can generate plots of the residuals as well, but I was hoping there might be a cut-off value that I could use too.
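For reference, the kind of subsetting described above might look roughly like the sketch below. The data set `have`, the variables `y` and `x1`-`x3`, and both cut-off values are placeholders, not recommendations. The sketch also assumes that the RESCHI= and RESDEV= residuals are not already adjusted for leverage, and applies the usual textbook adjustment of dividing by sqrt(1 - h).

```sas
%let resid_cut = 2;     /* placeholder cut-off for standardised residuals */
%let lev_cut   = 0.1;   /* placeholder cut-off for leverage */

proc logistic data=have;
   model y(event='1') = x1 x2 x3 / influence iplots;
   output out=diag h=leverage reschi=pearson resdev=deviance;
run;

data flagged;
   set diag;
   /* Standardise each residual by its leverage */
   if leverage < 1 then do;
      std_pearson  = pearson  / sqrt(1 - leverage);
      std_deviance = deviance / sqrt(1 - leverage);
   end;
   flag_resid = (abs(std_pearson) > &resid_cut or abs(std_deviance) > &resid_cut);
   flag_lev   = (leverage > &lev_cut);
   if flag_resid or flag_lev;   /* keep only the flagged observations */
run;
```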

I am also unfamiliar with the notation that comes alongside them in the INFLUENCE output, i.e. there is a "1 unit = some value" beside each statistic in the output. Are these values used to detect the influential observations, and if so, how are they used?

I apologise for the long message, but any help would be much appreciated!

Thanks,

Frewen

