collinearity and influential observations in PROC ...

09-25-2013 07:28 AM

Hi, I was wondering if someone could answer a few questions that I have. I want to check a logistic regression model for collinearity and influential observations/outliers. Is it possible to put the logistic regression model in PROC REG? Do the ordinal/nominal predictors need to be dummy coded for the collinearity diagnostics/influence statistics to be accurate? Any help appreciated.

Posted in reply to frewen

09-25-2013 11:47 AM

The answer is YES, you can use PROC REG for collinearity.

Collinearity doesn't depend on the Y-variables; it depends only on the X-variables. So PROC REG, even with random Y-values, will give you the collinearity diagnostics for the X-variables.

This is not true for influential observations or outliers; for those, I don't think PROC REG will give you the proper diagnostics.

Ordinal/nominal predictors need to be dummy coded to work in PROC REG; you can use PROC GLMMOD to give you the dummy codings.
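A minimal sketch of that workflow, not a definitive recipe. All names are hypothetical: a data set `mydata`, a binary response `y`, a continuous predictor `x1`, and a categorical predictor `grp` that is assumed to have three levels. PROC GLMMOD writes the dummy-coded design matrix to a data set, and PROC REG then reports collinearity diagnostics for its columns.

```sas
/* Hypothetical names: mydata, y, x1, grp (assumed to have 3 levels) */

/* Step 1: write the dummy-coded design matrix to a data set */
proc glmmod data=mydata outdesign=design outparm=parms noprint;
   class grp;
   model y = x1 grp;
run;

/* Step 2: list which design column corresponds to which effect/level */
proc print data=parms;
run;

/* Step 3: collinearity diagnostics on the design columns.
   Under the assumptions above, Col1 is the intercept, Col2 is x1,
   and Col3-Col5 are the grp dummies. One dummy (Col5) is left out
   as the reference level so the columns are not perfectly collinear
   with the intercept. */
proc reg data=design;
   model y = col2-col4 / vif tol collin;
run;
quit;
```

Check the PROC PRINT listing before choosing the column range, since the column numbering depends on the order of effects in the MODEL statement and the number of levels of each CLASS variable.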

--

Paige Miller

Posted in reply to PaigeMiller

09-29-2013 04:23 PM

In PROC REG, the OUTPUT OUT= statement lets you request DFFITS= ("standard influence of observation on predicted value") and COOKD= (Cook's D influence statistic). PROC LOGISTIC has similar keywords in its OUTPUT OUT= statement, but the residual-based statistics are limited because your model has a binary target.
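As a rough sketch of the two OUTPUT statements side by side (the data set `mydata`, the variables `y`, `x1`, `x2`, and the output names are all hypothetical, and `y` is assumed to be coded 0/1):

```sas
/* Hypothetical names: mydata, y, x1, x2, reg_diag, logit_diag */

/* PROC REG: linear-model influence statistics */
proc reg data=mydata;
   model y = x1 x2;
   output out=reg_diag
          dffits=dffits     /* influence on the predicted value */
          cookd=cooksd      /* Cook's D */
          h=leverage;       /* hat-matrix diagonal */
run;
quit;

/* PROC LOGISTIC: the analogous diagnostics for a binary response */
proc logistic data=mydata;
   model y(event='1') = x1 x2 / influence;
   output out=logit_diag
          p=phat            /* predicted probability */
          h=leverage        /* hat-matrix diagonal */
          reschi=pearson    /* Pearson residual */
          resdev=deviance   /* deviance residual */
          c=c_disp          /* confidence interval displacement C */
          difdev=difdev     /* change in deviance when obs is deleted */
          difchisq=difchisq /* change in Pearson chi-square */
          dfbetas=_all_;    /* standardized change in each estimate */
run;
```

The INFLUENCE option prints the same kind of diagnostics in the procedure output, while OUTPUT OUT= saves them to a data set for further processing.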

Posted in reply to frewen

10-11-2013 05:00 AM

Thank you, Jason and Paige, for your input; it has been VERY helpful. I am also confused about another topic, and I was hoping one of you might have the answer.

I am trying to examine influential observations in some logistic regression models. I am aware that the INFLUENCE option and the OUTPUT OUT= statement can be used to generate certain statistics, such as DFBETAS, leverage (hat) values, and Pearson and deviance residuals. I am using certain cut-off values, which I have read in a few books, to subset the influential observations based on the DFBETAS and leverage (hat) values in the output from INFLUENCE. However, I am not sure what cut-off values I can use to identify influential observations from the deviance and Pearson residuals. Are these values standardised in the output from INFLUENCE and OUTPUT OUT=? If so, can I just use an absolute value of 2 or 3 as the cut-off, as with standardised residuals in linear regression? I am aware that you can generate plots of the residuals as well, but I was hoping there might also be a cut-off value that I could use.
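To illustrate the kind of subsetting described above, here is a minimal sketch. The data set and variable names are hypothetical. The cut-offs (leverage above 2p/n, |DFBETAS| above 2/sqrt(n)) are common rules of thumb assumed purely for illustration; they are not necessarily the values from the books mentioned.

```sas
/* Hypothetical names throughout: mydata, y, x1, x2, diag, flagged */
proc logistic data=mydata;
   model y(event='1') = x1 x2;
   output out=diag
          h=leverage
          dfbetas=dfb_int dfb_x1 dfb_x2;  /* intercept first, then x1, x2 */
run;

data flagged;
   set diag nobs=n;          /* n = number of rows in diag; equals the
                                number of observations used only if no
                                rows were dropped for missing values */
   p = 3;                    /* parameters: intercept + 2 slopes */
   lev_cut = 2 * p / n;      /* assumed rule-of-thumb leverage cut-off */
   dfb_cut = 2 / sqrt(n);    /* assumed rule-of-thumb DFBETAS cut-off */
   /* Subsetting IF: keep only rows that exceed a cut-off */
   if leverage > lev_cut
      or abs(dfb_int) > dfb_cut
      or abs(dfb_x1)  > dfb_cut
      or abs(dfb_x2)  > dfb_cut;
run;
```

The sketch deliberately leaves out the Pearson and deviance residuals, since which cut-off to apply to them is exactly the open question.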

I am also unfamiliar with the notation that appears alongside these statistics in the INFLUENCE output, i.e. there is a "1 unit = some value" beside each statistic. Are these values used to detect the influential observations, and if so, how are they used?

I apologise for the long message, but any help would be much appreciated!

thanks,

Frewen