asifgeneral2
Calcite | Level 5

Hello

 

I am following the SAS ANOVA and regression tutorial, and here is the code the tutor used to identify potential outliers and influential observations:

 

%let interval=Gr_Liv_area Basement_area Deck_porch_area Lot_area Age_sold Bedroom_abvGr Total_bathroom;

 

ods select none;
proc glmselect data=stat1.ameshousing3 plots=all;
Stepwise: model saleprice = &interval / selection=stepwise details=steps select=SL slentry=0.05 slstay=0.05;
run;
quit;
ods select all;

 

ods graphics on;
ods output RSTUDENTBYPREDICTED=Rstud
COOKSDPLOT=Cook
DFFITSPLOT=Dffits
DFBETASPANEL=Dfbs;
proc reg data=stat1.ameshousing3
plots(only label)=
(RSTUDENTBYPREDICTED COOKSD DFFITS DFBETAS);
Siglimit: model saleprice=&_GLSIND;
title 'siglimit model plots of diagnostics stats';
run;
quit;

 

My question is: how can I identify potential outliers and influential observations when I am working with a binary dependent variable and using PROC LOGISTIC? My binary dependent variable codes a bad customer as 0 and a good customer as 1. Can you please help? Thanks.

4 REPLIES
Rick_SAS
SAS Super FREQ

In practice, you can often use the binary response variable as the response variable in a linear regression model and it works surprisingly well. But don't tell anyone that I said that! 🙂

 

For linear regression, the influence diagnostics include the DFBETAS statistics, the DFFITS statistics, and Cook's distance (D). Some people also look at the leverage statistic (H). Similar "deletion diagnostics" statistics are available and documented in PROC LOGISTIC.

- The DFBETAS=_ALL_ option on the OUTPUT statement writes the DFBETAS statistics to the output data set.

- The H= option on the OUTPUT statement outputs the leverage statistics.

- There are various kinds of residuals in logistic models, so I'll let you read about the other options.

 

You can use the PLOTS=INFLUENCE option on the PROC LOGISTIC statement to get plots. You can use the INFLUENCE option on the MODEL statement to display a table.
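
As a minimal sketch of how these options might fit together for the data described in the question: the data set name (work.customers), the response name (good_bad, with 1 = good and 0 = bad), and the predictors (income, age, tenure) are all hypothetical placeholders, so substitute your own.

```sas
/* Hypothetical names: work.customers, good_bad (0 = bad, 1 = good), */
/* income, age, tenure. Replace them with your own data and model.   */
ods graphics on;

proc logistic data=work.customers plots(only label)=(influence dfbetas);
   /* Model the probability of a good customer (good_bad = 1);       */
   /* INFLUENCE prints the table of regression diagnostics.          */
   model good_bad(event='1') = income age tenure / influence;

   /* Save per-observation diagnostics for later inspection */
   output out=work.diag
          predicted=pred_prob
          h=leverage
          reschi=pearson_resid
          resdev=deviance_resid
          c=c cbar=cbar
          difchisq=difchisq difdev=difdev
          dfbetas=_all_;
run;
```

The work.diag data set then holds one row per observation, so you can sort or filter it by whichever diagnostic you care about.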

Ksharp
Super User

Did you check the documentation for PROC LOGISTIC, especially its examples?

Check the C and CBAR statistics (analogs of Cook's D) and the H (leverage) statistic.

 

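/* Assumes the macro variable &varlist (predictor list) and the libref x are already defined */
proc logistic data=want outest=est(keep=intercept &varlist);
   model good_bad(event='good') = &varlist
         / outroc=x.roc lackfit scale=none aggregate rsquare firth corrb /* selection=stepwise sle=0.1 sls=0.1 */;
   /* Save leverage (h), the influence measures c and cbar, and the predicted probabilities */
   output out=output h=h c=c cbar=cbar predicted=PredProb;
run;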

 


proc sort data=output out=check_c ;
by descending c;
run;
proc sort data=output out=check_h ;
by descending h;
run;
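
After sorting, the observations with the largest values sit at the top of each data set, so one way to review them is to print only the first few rows. This is a rough sketch that assumes the check_c and check_h data sets from the steps above; the cutoff of 10 rows is an arbitrary choice for illustration, not a recommended threshold.

```sas
/* Assumes check_c and check_h were created by the PROC SORT steps above. */
/* OBS=10 is an arbitrary number of rows to inspect.                      */
title 'Ten observations with the largest C values';
proc print data=check_c(obs=10);
   var c cbar h PredProb;
run;

title 'Ten observations with the largest leverage (H)';
proc print data=check_h(obs=10);
   var h c cbar PredProb;
run;

title;  /* clear the title so it does not carry over to later output */
```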

PaigeMiller
Diamond | Level 26

All good points from @Rick_SAS and @Ksharp.

 

I would add that DFBETAS, DFFITS and Cook's D from PROC REG really don't apply in the logistic case, where the response is binary, ordinal or nominal, because these statistics from PROC REG assume you have continuous Y values, and I would not trust them if Y is not continuous. On the other hand, the H (leverage) statistic does not use the value of Y, so it doesn't matter if Y is continuous or not. The other diagnostic statistics from PROC LOGISTIC that have been mentioned all use the proper estimation (maximum likelihood) for the effect on the regression line, which takes into account that the response is binary, ordinal or nominal.

--
Paige Miller
asifgeneral2
Calcite | Level 5

Thank you for your help. It worked.

 

 
