How to pick outlier/influential obs


Posted 06-17-2019 11:58 PM
(1107 views)

Hello

I am referring to the **ANOVA and regression tutorial by SAS**, and here is the code the tutor used to identify potential outliers and influential observations:

%let interval=Gr_Liv_area Basement_area Deck_porch_area Lot_area Age_sold Bedroom_abvGr Total_bathroom;

ods select none;

proc glmselect data=stat1.ameshousing3 plots=all;

Stepwise: model saleprice = &interval / selection=stepwise details=steps select=SL slentry=0.05 slstay=0.05;

run;

quit;

ods select all;

ods graphics on;

ods output RSTUDENTBYPREDICTED=Rstud

COOKSDPLOT=Cook

DFFITSPLOT=Dffits

DFBETASPANEL=Dfbs;

proc reg data=stat1.ameshousing3

plots(only label)=

(RSTUDENTBYPREDICTED COOKSD DFFITS DFBETAS);

Siglimit: model saleprice=&_GLSIND;

title 'siglimit model plots of diagnostics stats';

run;

quit;

**My question: how can I identify potential outliers and influential observations when I am working with a binary dependent variable and using PROC LOGISTIC? My dependent variable codes a bad customer as 0 and a good customer as 1. Can you please help? Thanks.**

4 REPLIES

In practice, you can often use the binary response variable as the response variable in a linear regression model and it works surprisingly well. But don't tell anyone that I said that! 🙂

For linear regression, the influence diagnostics include the DFBETAS statistics, the DFFITS statistics, and Cook's distance (D). Some people also look at the leverage statistic (H). Similar "deletion diagnostics" statistics are available and documented in PROC LOGISTIC.

- On the OUTPUT statement, the DFBETAS=_ALL_ option writes the DFBETAS statistics to the output data set.

- The H= option writes the leverage statistics.

- There are various kinds of residuals in logistic models, so I'll let you read about the other options.

You can use the PLOTS=INFLUENCE option on the PROC LOGISTIC statement to get plots. You can use the INFLUENCE option on the MODEL statement to display a table.
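
The options above can be combined in one call. The following is only a minimal sketch: the data set `work.customers`, its variables (`cust_id`, `income`, `debt`, `good`), and the simulated values are all hypothetical stand-ins for your own data, and the output variable names on the OUTPUT statement are arbitrary choices.

```sas
/* Hypothetical data: simulate a binary outcome (1 = good, 0 = bad).  */
/* All data set and variable names here are made up for this sketch.  */
data work.customers;
   call streaminit(2019);
   do cust_id = 1 to 200;
      income = rand('normal', 50, 10);
      debt   = rand('normal', 20, 5);
      eta    = -4 + 0.12*income - 0.10*debt;          /* linear predictor */
      good   = rand('bernoulli', 1 / (1 + exp(-eta))); /* P(good = 1)     */
      output;
   end;
   drop eta;
run;

ods graphics on;

/* PLOTS= requests the diagnostic plots; LABEL marks points on them.  */
proc logistic data=work.customers plots(only label)=(influence dfbetas);
   /* INFLUENCE prints the table of regression diagnostics */
   model good(event='1') = income debt / influence;
   /* Save the same kind of diagnostics to a data set for later use */
   output out=work.diag
          predicted=phat
          h=leverage          /* leverage (hat matrix diagonal)       */
          c=c cbar=cbar       /* confidence interval displacement     */
          difchisq=difchisq   /* change in Pearson chi-square         */
          difdev=difdev       /* change in deviance                   */
          dfbetas=_all_;      /* one DFBETA_ variable per parameter   */
run;
```

With a large data set the INFLUENCE table has one row per observation, so you may prefer to drop that option and work from the OUTPUT data set and the plots instead.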


Did you check the documentation of PROC LOGISTIC, especially its examples?

Check the C and CBAR statistics (the analogs of Cook's D) and the H (leverage) statistic.

proc logistic data=want outest=est(keep=intercept &varlist);

model good_bad(event='good')= &varlist

/outroc=x.roc lackfit scale=none aggregate rsquare firth corrb /* selection=stepwise sle=0.1 sls=0.1*/ ;

output out=output h=h c=c cbar=cbar predicted=PredProb;

run;

proc sort data=output out=check_c ;

by descending c;

run;

proc sort data=output out=check_h ;

by descending h;

run;
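
Once the diagnostics are sorted, it is usually enough to look at the first few rows. Here is a small self-contained sketch of that step. It assumes SASHELP.HEART as a stand-in for your data (with `Status` playing the role of the good/bad variable), and the choice of ten rows is arbitrary, not a recommended cutoff.

```sas
/* Assumption: SASHELP.HEART stands in for the poster's data set. */
proc logistic data=sashelp.heart noprint;
   model status(event='Dead') = ageatstart systolic cholesterol;
   output out=work.diag h=h c=c cbar=cbar predicted=predprob;
run;

/* Drop rows that were not used in the fit (their diagnostics are missing), */
/* then put the most influential observations first.                        */
proc sort data=work.diag(where=(c is not missing)) out=work.check_c;
   by descending c;
run;

title 'Ten observations with the largest C';
proc print data=work.check_c(obs=10);
   var status ageatstart systolic cholesterol predprob h c cbar;
run;
title;
```

The same pattern with `by descending h;` lists the highest-leverage observations.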


All good points from @Rick_SAS and @Ksharp .

I would add that DFBETAS, DFFITS and Cook's D from PROC REG really don't apply in the logistic case, where the response is binary, ordinal or nominal. These statistics from PROC REG assume you have continuous Y values, and I would not trust them if Y is not continuous. On the other hand, the H (leverage) statistic does not use the value of Y, so it doesn't matter whether Y is continuous or not. The other diagnostic statistics from PROC LOGISTIC that have been mentioned all use the proper estimation method (maximum likelihood), which takes into account that the response is binary, ordinal or nominal.

--

Paige Miller
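
For reference, the leverage point in the reply above can be written out. This is a sketch in standard textbook notation (the symbols are not from the thread), and the C and CBAR forms are given as I understand the documented definitions for a binary response in PROC LOGISTIC, so check them against the documentation before relying on them.

```latex
% Linear regression: the hat matrix involves only the design matrix X
H_{\text{linear}} = X (X^{\top} X)^{-1} X^{\top}

% Logistic regression: the hat matrix is weighted by the fitted probabilities
H_{\text{logistic}} = W^{1/2} X (X^{\top} W X)^{-1} X^{\top} W^{1/2},
\qquad W = \operatorname{diag}\{\hat{p}_i (1 - \hat{p}_i)\}

% With h_i the i-th diagonal element of H_logistic and \chi_i the Pearson residual:
C_i = \frac{\chi_i^{2}\, h_i}{(1 - h_i)^{2}},
\qquad
\bar{C}_i = \frac{\chi_i^{2}\, h_i}{1 - h_i}
```

In the linear case H involves only X. In the logistic case the weights depend on the fitted probabilities, so the leverage reported by PROC LOGISTIC is generally not the same number as the leverage from PROC REG on the same predictors.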


Thank you for your help. It worked.
