Babloo
Rhodochrosite | Level 12

I would like to detect outliers and multicollinearity in my regression analyses (both linear and logistic). I would appreciate it if someone could guide me through the options/procs for that.

 

Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
For PROC REG:
Outliers: check Cook's distance (Cook's D)
Multicollinearity: check the VIF  ( model y=x / vif )
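
A minimal sketch of both PROC REG checks, assuming SASHELP.CLASS with Weight as the response as a stand-in for your own data. The data set and variable names reg_diag, cooks_d, and flagged are illustrative, and the 4/n cutoff is only a common rule of thumb, not a SAS default:

```sas
/* Assumed example data: SASHELP.CLASS, response Weight */
proc reg data=sashelp.class;
   /* VIF prints variance inflation factors; R prints residual analysis incl. Cook's D */
   model weight = age height / vif r;
   /* Save Cook's D per observation (reg_diag and cooks_d are illustrative names) */
   output out=reg_diag cookd=cooks_d;
run;
quit;

/* Keep observations above the 4/n rule-of-thumb cutoff (a convention, not a SAS default) */
data flagged;
   set reg_diag nobs=n;
   if cooks_d > 4 / n;
run;

proc print data=flagged;
   var name weight age height cooks_d;
run;
```

Observations listed by the PROC PRINT step are candidates for a closer look, not automatically errors.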

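For PROC LOGISTIC:
Outliers: use the INFLUENCE option and look at the Delta Chi-Square column (large values, such as case 3 below, point to poorly fitted observations):

/* INFLUENCE prints the Regression Diagnostics table for each observation */
proc logistic data=sashelp.class;
   model sex = age height / influence;
run;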


Regression Diagnostics (covariates: Age, Height)
Case Number	Age	Height	Pearson Residual	Deviance Residual	Hat Matrix Diagonal	Intercept DfBeta	Age DfBeta	Height DfBeta	Confidence Interval Displacement C	Confidence Interval Displacement CBar	Delta Deviance	Delta Chi-Square
1	14.0000	69.0000	-0.3158	-0.4361	0.1443	0.1096	0.0994	-0.1265	0.0197	0.0168	0.2070	0.1166
2	13.0000	56.5000	0.3559	0.4883	0.1644	0.1391	0.1271	-0.1562	0.0298	0.0249	0.2634	0.1515
3	13.0000	65.3000	2.4687	1.9796	0.1484	-0.6318	-0.9210	0.9706	1.2470	1.0620	4.9807	7.1566
4	14.0000	62.8000	0.8089	1.0034	0.0999	0.0598	0.1630	-0.1330	0.0807	0.0726	1.0794	0.7270





Multicollinearity: there is no built-in check in PROC LOGISTIC, but SAS will automatically drop a variable that is an exact linear combination of the other variables when it fits the model by maximum likelihood.
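
One workaround that is often used (a sketch, not something stated in this thread) is to run the same predictors through PROC REG just to get the collinearity diagnostics. VIF and tolerance depend only on the predictors, so any numeric response will do. The data set class_num and the dummy variable female are illustrative names:

```sas
/* PROC REG needs a numeric response, so build a 0/1 dummy from Sex */
data class_num;
   set sashelp.class;
   female = (sex = 'F');   /* 1 if female, 0 otherwise */
run;

/* Only the collinearity diagnostics are of interest here;
   ignore the parameter estimates and tests from this linear fit */
proc reg data=class_num;
   model female = age height / vif tol collin;
run;
quit;
```

The logistic model itself should still be fitted with PROC LOGISTIC; this step only screens the predictors.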



6 REPLIES
Reeza
Super User

What's your definition of an outlier? 

Babloo
Rhodochrosite | Level 12
I don't have a specific definition of an outlier. I just want to see the
observations that are far away from a normal distribution.

Reeza
Super User

Your questions are too broad. They're chapters in textbooks. 

 

If you're trying to learn statistical theory and SAS, have you taken the first statistics e-course from SAS? It's free. 

There are also a ton of videos on topics related to specific statistical procedures. 

 

http://support.sas.com/training/tutorial/

 

 

 

PGStats
Opal | Level 21

Suggestions:

 

  • Make that two separate topics (questions)
  • Formulate each question as a problem such as "I have the following dataset and would like to know if obs 12 is an outlier relative to this regression model, how can I do that?"
PG
Rick_SAS
SAS Super FREQ

For linear regression you can use the ROBUSTREG procedure. The procedure has algorithms that automatically flag outliers.  The documentation contains several Getting Started examples. I suggest you start with the examples and then move on to the "Details" section if you want to understand the details about how an observation is classified as an outlier. 
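
A minimal sketch of such a call, assuming SASHELP.CLASS with Weight as the response (the MM method is just one of the available estimation methods and is chosen here for illustration):

```sas
/* Assumed example data: SASHELP.CLASS, response Weight */
proc robustreg data=sashelp.class method=mm;
   /* DIAGNOSTICS lists observations flagged as outliers;
      LEVERAGE adds the leverage-point diagnostics */
   model weight = age height / diagnostics leverage;
   id name;   /* identify flagged observations by Name */
run;
```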

 

There is not an analogous "robust" procedure for logistic regression. However, there are still techniques for detecting potential outliers in almost every SAS regression procedure. The technique is to use regression diagnostic plots.

 

For example, in PROC REG you can use the INFLUENCE option on the MODEL statement and look at the ODS graphics to assess observations that are highly influential in the model. See the section of the doc titled "Influence Statistics".
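
A sketch of that approach, again assuming SASHELP.CLASS with Weight as the response. The particular set of plots requested is an illustrative choice:

```sas
ods graphics on;

/* ONLY suppresses the default panel; LABEL tags extreme points with the ID value */
proc reg data=sashelp.class
         plots(only label)=(cooksd rstudentbyleverage dffits dfbetas);
   id name;
   /* INFLUENCE prints the table of influence statistics for each observation */
   model weight = age height / influence;
run;
quit;
```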

 

You can do something similar for logistic regression. The LOGISTIC procedure contains many diagnostic plots.  As Reeza says, a full explanation is lengthy, but start with the doc example "Logistic Regression Diagnostics", which shows how to use the INFLUENCE option and the diagnostic plots.
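
As a rough sketch of the logistic case, assuming SASHELP.CLASS with Sex as the response (the plot requests shown are an illustrative subset of what the procedure offers):

```sas
ods graphics on;

/* ONLY suppresses the default plots; LABEL marks points to help identify outlying cases */
proc logistic data=sashelp.class plots(only label)=(influence dfbetas);
   /* INFLUENCE also prints the Regression Diagnostics table */
   model sex = age height / influence;
run;
```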

