How to count outliers

11-19-2015 05:43 PM - edited 11-19-2015 06:08 PM

Hi All,

I have a program to generate a data set from a normal distribution with outliers.

I believe the following program generates a data set with 5% outliers.

%let N = 100;
data CN(keep=x);
call streaminit(12345);
do i = 1 to &N;
if rand("Normal", 100, 16) then
x = rand("Normal", 100, 4);
else
x = rand("Normal");
output;
end;
run;

(Ref: Simulating Data with SAS, by Rick Wicklin.)

I want to estimate the proportion of outliers to make sure my data set really contains 5% outliers.

If anybody has any ideas, please help me.


All Replies


Posted in reply to kamal1

11-19-2015 06:42 PM

To count 5% of outliers, first calculate the 2.5% and 97.5% quantiles. Any x value greater than the 97.5% quantile or less than the 2.5% quantile is flagged as an outlier. Then count the frequency of the flag.

%let N = 100;
data CN(keep=x);
call streaminit(12345);
do i = 1 to &N;
if rand("Normal", 100, 16) then
x = rand("Normal", 100, 4);
else
x = rand("Normal");
output;
end;
run;

proc univariate data=cn;
var x;
output out=outlier pctlpts=2.5 97.5 pctlpre=x pctlname=pct25 pct975;
run;

data want;
set cn;
if _n_=1 then set outlier;
if x>xpct975 or x<xpct25 then flag=1;
else flag=0;
run;

proc freq data=want;
table flag;
run;


Posted in reply to kamal1

11-19-2015 07:09 PM

It looks like you want to contaminate a sample from a standard normal distribution with a sample from a N(100, 4) distribution, which is reasonable and would make it fairly easy to detect the outliers. However, your IF condition is satisfied with probability 1, so that all x-values will come from the N(100, 4) distribution. Of course, if you defined outliers as @slchen suggests, you would find (four) outliers even in the set {1, 2, 3, ..., 99, 100} and you wouldn't need contamination.
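
To make the point about the IF condition concrete: `rand("Normal", 100, 16)` returns a nonzero number essentially every time, and SAS treats any nonzero, nonmissing value as true. Below is a minimal sketch of one possible fix, assuming the intent was a 5% chance of contamination for each observation. The `isOutlier` variable is a name introduced here for illustration; keeping it in the data set lets PROC FREQ report the realized share of contaminated rows.

```sas
%let N = 100;

data CN(keep=x isOutlier);
   call streaminit(12345);
   do i = 1 to &N;
      /* Bernoulli(0.05) returns 1 with probability 0.05, otherwise 0 */
      isOutlier = rand("Bernoulli", 0.05);
      if isOutlier then
         x = rand("Normal", 100, 4);   /* contaminating component */
      else
         x = rand("Normal");           /* standard normal component */
      output;
   end;
run;

/* Tabulate how many rows came from the contaminating component */
proc freq data=CN;
   tables isOutlier;
run;
```

Because each row is contaminated independently, the realized count varies from sample to sample around 5% rather than hitting it exactly.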


Posted in reply to FreelanceReinhard

11-19-2015 09:41 PM

Thank you very much for both of your answers (slchen and rd). What I want is to simulate data from a normal distribution with outliers. It does not need to be N(100, 4); a normal distribution with any mean and s.d. will do. I am studying the effect of outliers, so I want to simulate outliers at several percentages, such as 5%, 10%, 15%, etc. I greatly appreciate any suggestions.
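
One way to cover several contamination levels in a single run is sketched below. It assumes a fixed number of outliers per level (rounded from the percentage) and reuses the N(100, 4) and N(0, 1) components from the question. The data set name `CN_all` and the variables `outlierPct`, `outlierNb`, and `isOutlier` are illustrative choices, not part of any posted program.

```sas
%let N = 100;

data CN_all(keep=outlierPct x isOutlier);
   call streaminit(12345);
   do outlierPct = 5, 10, 15;
      /* exact number of outliers for this contamination level */
      outlierNb = round(&N * outlierPct / 100);
      do i = 1 to &N;
         isOutlier = (i <= outlierNb);   /* first outlierNb rows are outliers */
         if isOutlier then
            x = rand("Normal", 100, 4);
         else
            x = rand("Normal");
         output;
      end;
   end;
run;

/* Check the number of flagged rows at each level */
proc freq data=CN_all;
   tables outlierPct*isOutlier / nocol nopercent;
run;
```

With N = 100 the cross-tabulation should show 5, 10, and 15 flagged rows for the three levels. Subsequent analyses can then be run separately for each level with `by outlierPct;`.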


Posted in reply to kamal1

11-20-2015 09:04 AM

The contaminated normal distribution is a specific two-component mixture distribution. The article "Generate a random sample from a mixture distribution" discusses simulating data from a mixture distribution in SAS. The example uses three components, but can be modified to produce a contaminated normal. See also Chapter 7 of Simulating Data with SAS.
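
As a rough illustration of the two-component case (not the code from the article), the sketch below draws a component label with the "Table" distribution and then samples from the matching normal. All macro variable names and the parameter values are assumptions chosen for the example; any means and standard deviations can be substituted.

```sas
%let N      = 100;
%let pOut   = 0.05;   /* probability of the contaminating component */
%let mu     = 0;      /* mean of the main component */
%let sigma  = 1;      /* std dev of the main component */
%let muC    = 0;      /* mean of the contaminating component */
%let sigmaC = 10;     /* std dev of the contaminating component */

data Mix(keep=x component);
   call streaminit(12345);
   do i = 1 to &N;
      /* component is 1 with probability 1-pOut; otherwise contaminate */
      component = rand("Table", 1 - &pOut, &pOut);
      if component = 1 then
         x = rand("Normal", &mu, &sigma);
      else
         x = rand("Normal", &muC, &sigmaC);
      output;
   end;
run;
```

Keeping `component` in the output makes it possible to tabulate the realized mixing proportions afterwards, for example with PROC FREQ.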


Posted in reply to Rick_SAS

11-20-2015 09:47 PM

Thank you so much Rick...


Posted in reply to kamal1

11-19-2015 09:50 PM - edited 11-19-2015 09:54 PM

To get exactly the number of outliers that you specify, randomly interspersed in your data, you could do something like this:

```
%let N=100;
%let outlierPct=5;
data CN(keep=x rnd);
call streaminit(12345);
outlierNb = round(&N.*&outlierPct./100);
do i = 1 to outlierNb;
x = rand("Normal",100,4);
rnd = rand("UNIFORM");
output;
end;
do i = outlierNb+1 to &N.;
x = rand("Normal");
rnd = rand("UNIFORM");
output;
end;
run;
proc sort data=CN out=CN(drop=rnd); by rnd; run;
```

PG


Posted in reply to PGStats

11-20-2015 09:46 PM

Thank you very much PG. I have a question. Why did you sort the data at the end? Thank you again.

Solution

11-20-2015 10:28 PM


Posted in reply to kamal1

11-20-2015 10:17 PM

I added a random number to the dataset (rnd) for the sole purpose of ordering the outliers randomly within the dataset. The sort operation creates that random order. If you don't care about the position of your outliers in the dataset, you can simply do:

```
%let N=100;
%let outlierPct=5;
data CN(keep=x);
call streaminit(12345);
outlierNb = round(&N.*&outlierPct./100);
do i = 1 to outlierNb;
x = rand("Normal",100,4);
output;
end;
do i = outlierNb+1 to &N.;
x = rand("Normal");
output;
end;
run;
```

PG


Posted in reply to PGStats

11-20-2015 10:18 PM

Great! Thank you very much for your quick reply and great help PG.


Posted in reply to PGStats

12-26-2015 02:21 AM

Hi,

I found this program (generating outliers) very helpful. I would like to write it in SAS/IML. If you could write it as an IML program, it would be greatly appreciated. Thank you in advance.


Posted in reply to save

12-26-2015 04:44 PM

See my earlier comment in which I refer to the article "Generate a random sample from a mixture distribution."

```
proc iml;
call randseed(12345);
N = 100; /* sample size */
k = ceil(0.05*N); /* 5% of sample */
x = j(N, 1);
call randgen(x, "Normal", 0, 10); /* sample from N(0, 10) */
z = j(k, 1);
call randgen(z, "Normal", 0, 100); /* contamination from N(0, 100) */
idx = sample(1:N, k, "NoReplace"); /* k random elements */
x[idx] = z; /* overwrite with contaminated values */
```