How to count outliers

Posted 11-19-2015 05:43 PM
(2146 views)

Hi All,

I have a program that generates a data set from a normal distribution with outliers.

I believe the following program generates a data set with 5% outliers.

```
%let N = 100;
data CN(keep=x);
   call streaminit(12345);
   do i = 1 to &N;
      if rand("Normal", 100, 16) then
         x = rand("Normal", 100, 4);
      else
         x = rand("Normal");
      output;
   end;
run;
```

(Ref: Simulating Data with SAS, by Rick Wicklin.)

I want to estimate the proportion of outliers to make sure my data set contains 5% outliers.

If anybody has any ideas, please help me.

To count the 5% of outliers, first calculate the 2.5% and 97.5% quantiles. Any x value that is greater than the 97.5% quantile or less than the 2.5% quantile is identified as an outlier. Then count the frequency of the flagged values.

```
%let N = 100;

data CN(keep=x);
   call streaminit(12345);
   do i = 1 to &N;
      if rand("Normal", 100, 16) then
         x = rand("Normal", 100, 4);
      else
         x = rand("Normal");
      output;
   end;
run;

proc univariate data=cn;
   var x;
   output out=outlier pctlpts=2.5 97.5 pctlpre=x pctlname=pct25 pct975;
run;

data want;
   set cn;
   if _n_=1 then set outlier;
   if x>xpct975 or x<xpct25 then flag=1;
   else flag=0;
run;

proc freq data=want;
   table flag;
run;
```
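
One caveat with the percentile approach: flagging everything outside the 2.5th and 97.5th sample percentiles marks roughly 5% of the rows by construction, whatever the true contamination rate is. Also note that `rand("Normal", 100, 16)` is nonzero with probability 1, so the `if` condition in the program is always true and every x comes from N(100, 4). A minimal sketch of an alternative, assuming the intent was a Bernoulli draw with a 5% contamination rate (the variable name `contaminated` and the 0.05 rate are assumptions, not part of the original program), is to keep the indicator and tabulate it:

```sas
%let N = 100;

data CN(keep=x contaminated);
   call streaminit(12345);
   do i = 1 to &N;
      /* Assumption: each row is an outlier with probability 0.05 */
      contaminated = rand("Bernoulli", 0.05);
      if contaminated then
         x = rand("Normal", 100, 4);   /* outlier component */
      else
         x = rand("Normal");           /* standard normal component */
      output;
   end;
run;

/* Observed share of rows drawn from the outlier component */
proc freq data=CN;
   tables contaminated;
run;
```

Because each row is contaminated independently, the observed proportion varies from sample to sample; it equals 5% only on average.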

Thank you so much Rick...

To get exactly the number of outliers that you specify, randomly interspersed in your data, you could do something like this:

```
%let N=100;
%let outlierPct=5;
data CN(keep=x rnd);
call streaminit(12345);
outlierNb = round(&N.*&outlierPct./100);
do i = 1 to outlierNb;
x = rand("Normal",100,4);
rnd = rand("UNIFORM");
output;
end;
do i = outlierNb+1 to &N.;
x = rand("Normal");
rnd = rand("UNIFORM");
output;
end;
run;
proc sort data=CN out=CN(drop=rnd); by rnd; run;
```

PG

Thank you very much PG. I have a question. Why did you sort the data at the end? Thank you again.

I added a random number to the dataset (rnd) for the sole purpose of ordering outliers randomly within the dataset. The sort operation creates that random order. If you don't care about the position of your outliers in the dataset, you can simply do

```
%let N=100;
%let outlierPct=5;
data CN(keep=x);
call streaminit(12345);
outlierNb = round(&N.*&outlierPct./100);
do i = 1 to outlierNb;
x = rand("Normal",100,4);
output;
end;
do i = outlierNb+1 to &N.;
x = rand("Normal");
output;
end;
run;
```

PG

Great! Thank you very much for your quick reply and great help PG.

Hi,

I found this program (generating outliers) very helpful. I would like to write it in SAS/IML. If you could write it as an IML program, it would be greatly appreciated. Thank you in advance.

See my earlier comment in which I refer to the article "Generate a random sample from a mixture distribution."

```
proc iml;
call randseed(12345);
N = 100; /* sample size */
k = ceil(0.05*N); /* 5% of sample */
x = j(N, 1);
call randgen(x, "Normal", 0, 10); /* sample from N(0, 10) */
z = j(k, 1);
call randgen(z, "Normal", 0, 100); /* contamination from N(0, 100) */
idx = sample(1:N, k, "NoReplace"); /* k random elements */
x[idx] = z; /* overwrite with contaminated values */
```
