- Detect extreme values through leverage and outlier via robust regressi...

Posted 02-05-2021 04:01 PM
(1336 views)

Hello all,

I have a multivariable data set with 60 observations. The response is a binary categorical variable (good, bad) and all independent variables are numeric. Because my response is categorical rather than numeric, can I apply PROC ROBUSTREG to detect influential points using this SAS code?

```
/* Add a random placeholder response; only the explanatory variables matter for leverage */
data mydata;
   set mydata;
   y = ranuni(3);
run;

proc robustreg data=mydata method=lts;
   model y = t1-t7 / diagnostics leverage;
run;
```

```
proc logistic data=mydataset descending;
   model Y = var1 var2 var3 var4 var5 var6 var7
         / plcl plrl waldcl waldrl
           lackfit rsq
           influence iplots
           itprint;
   ods output influence=myinfluence;   /* save the influence diagnostics table */
run;
```

Can I change the cutoff value to the 90th percentile for detecting outliers and leverage points when applying logistic regression and robust regression?

Any help will be appreciated.

1 ACCEPTED SOLUTION

First, make sure you have read the last section of this article about PROC ROBUSTREG: "Detecting outliers in SAS: Part 3: Multivariate location and scatter."

The article advises that you use the DIAGNOSTICS and the LEVERAGE(MCDINFO) options on the MODEL statement in PROC ROBUSTREG. As you say, since high-leverage (influential) points are in the space of explanatory variables, the Y variable does not matter, so you can use a random variable. According to the ROBUSTREG documentation, you can control the cutoff by using the CUTOFFALPHA suboption like this:

LEVERAGE(CUTOFFALPHA=0.1 MCDINFO)
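To show where that suboption goes, here is a minimal sketch of the whole step. It assumes the data set and variable names from the question (`mydata`, `t1`-`t7`); the output data set names, the new variable names, and the seed are placeholders chosen for illustration.

```sas
/* Sketch only: data set names, variable names, and the seed are assumptions */
data work.lev;
   call streaminit(12345);        /* arbitrary seed, for reproducibility */
   set mydata;
   y = rand("Normal");            /* placeholder response; its values do not affect leverage */
run;

proc robustreg data=work.lev method=lts;
   model y = t1-t7 / diagnostics leverage(cutoffalpha=0.1 mcdinfo);
   /* keep the leverage flag and both distances for later inspection */
   output out=work.diag leverage=IsLeverage rd=RobustDist md=MahalDist;
run;
```

The OUTPUT statement is optional here. It is included so that the 0/1 leverage flag and the distances end up in a data set that you can sort or filter afterwards.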

For the LOGISTIC model, I'm not sure what statistic you are trying to control. The influence diagnostics? I suggest you try changing the ALPHA= option on the PROC LOGISTIC statement. If that doesn't work, report back and we can think about it some more.
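If the goal is a data-driven cutoff such as the 90th percentile, one possible approach (an illustration, not something the procedure does for you) is to save the diagnostics with the OUTPUT statement and flag observations yourself. The sketch below assumes the names from the question (`mydataset`, `Y`, `var1`-`var7`); the output data sets and the new variable names are hypothetical.

```sas
/* Sketch only: percentile-based flagging is an ad hoc rule, not a PROC LOGISTIC feature */
proc logistic data=mydataset descending;
   model Y = var1-var7;
   /* hat-matrix diagonal, Pearson residual, and change in deviance per observation */
   output out=work.infl h=Leverage reschi=PearsonRes difdev=DifDev;
run;

/* 90th percentile of the hat-matrix diagonal */
proc means data=work.infl noprint;
   var Leverage;
   output out=work.cut p90=LevP90;
run;

/* flag observations whose leverage exceeds that percentile */
data work.flagged;
   if _n_ = 1 then set work.cut(keep=LevP90);
   set work.infl;
   HighLeverage = (Leverage > LevP90);
run;
```

Note that a percentile cutoff always flags roughly the same share of observations (about 10% here), whether or not any of them are unusual.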


Thanks a lot for this fantastic and helpful article "Detecting outliers in SAS: Part 3: Multivariate location and scatter."

As the article mentions, y is defined as a random normal variable. Because high-leverage (influential) points are in the space of the explanatory variables, the Y variable does not matter:

y=rannor(1);

I have 3 questions:

1. Can I use "y=ranuni(1);" instead of a normal distribution?

2. About QUANTILE=n: what is the quantile? Is it the same kind of quantile that PROC UNIVARIATE reports, for example the 75% quantile (Q3), so that observations whose independent-variable values exceed Q3 are flagged without using the MCD distance? I read the definitions of QUANTILE and ALPHA in the SAS 9.4 documentation, but they are not clear to me.

3. When I applied LEVERAGE(CUTOFFALPHA=0.1 MCDINFO), the log gave me this warning:

"WARNING: The behavior of the leverage CUTOFFALPHA option has changed from previous releases. To revert to the previous behavior, specify the same value for both the CUTOFFALPHA and the MCDALPHA options."

Any help will be appreciated.


1. Yes. However, RANNOR and RANUNI are deprecated, so start using RAND("Normal") or RAND("Uniform") instead.

2. This is the critical value of the test statistic. If the squared Mahalanobis distance for an observation exceeds the critical value, you call it an outlier. Because the squared MD follows a distribution that is approximately chi-squared, you can use an extreme quantile of the chi-square distribution to set the critical value.

3. I believe the warning is telling you that long ago the CUTOFFALPHA= option was used for two purposes: leverage detection and the "final MCD reweighting step." Now there are two options that each control one thing. Since you are only interested in leverage detection, you can ignore the warning (or specify the MCDALPHA= option if you want the warning to go away).
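To make point 2 concrete, the following sketch computes the chi-square quantile and its square root (the cutoff on the distance scale) for a few alpha levels. It assumes p = 7 explanatory variables, as in the question, and assumes the cutoff takes the form of the square root of the 1 - alpha chi-square quantile with p degrees of freedom; check the ROBUSTREG documentation for the exact rule in your release.

```sas
/* Sketch only: p = 7 and the form of the cutoff are assumptions */
data work.cutoffs;
   p = 7;                                        /* number of explanatory variables */
   do alpha = 0.1, 0.025, 0.002;
      chisq_q  = quantile("CHISQ", 1 - alpha, p); /* critical value for the squared distance */
      dist_cut = sqrt(chisq_q);                   /* critical value for the distance itself */
      output;
   end;
run;

proc print data=work.cutoffs noobs;
run;
```

Smaller values of alpha give larger critical values, so fewer observations are flagged.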


Hello,

I really appreciate your help in answering my questions. I have another question: how can I find only the extremely high leverage points, or only the extremely low ones, when I apply robust regression and the MCD algorithm? Because the leverage points include both extremely low and extremely high data points, is it correct to select the leverage points that have at least one value larger than, for example, the 85th percentile (or whichever percentile is appropriate), so that I get only the leverage points that are extremely high? Or is there another standard way to do this?


Yes, you have the correct understanding. By making the value of CUTOFFALPHA= very small, the quantile becomes large and only very extreme outliers are "detected."

Remember that the definition of an outlier depends on the distribution of the data. A small value such as CUTOFFALPHA=0.002 will classify a point as an outlier only if its robust Mahalanobis distance to the robust mean is much greater than would be expected for multivariate normal data with that estimated mean and covariance.
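Written out, the rule looks roughly like this. The notation is an assumption made for illustration: $T$ and $C$ stand for the robust (MCD) estimates of location and scatter, $p$ for the number of explanatory variables, and $\alpha$ for the CUTOFFALPHA= value.

$$
\mathrm{RD}(x_i) = \sqrt{(x_i - T)^{\top} C^{-1} (x_i - T)},
\qquad
x_i \text{ is flagged as a leverage point if } \mathrm{RD}(x_i) > \sqrt{\chi^2_{p,\,1-\alpha}}
$$

Because the distance is a single nonnegative number, it measures how far an observation is from the robust center but not in which direction. The flag alone therefore does not distinguish "high" from "low" points; for that you would need to look at the individual variable values of the flagged observations.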
