Outliers in simulation

11-10-2011 01:56 PM

Hi everyone, how can I determine the number of outliers in my simulation? Can anyone help?

11-10-2011 02:50 PM

You need to define an outlier indicator (0/1) and then add it up.

Defining the outlier is the problem: is it something outside the 99.9% CI?

It really depends on what you're looking at and on your modelling criteria. How many parameters are you looking at? Do you have a single outcome or multiple outcomes?

What counts as an outlier also depends on the business context: for machinery it might be 99%, but for medical data it could be 95%.

We need more details on what you're simulating in order to help.

11-10-2011 02:57 PM

What I need is to generate the independent variables in a regression relationship, and I need the generated independent variables to contain outliers.

By outliers I only mean values that are far away from the rest of the generated data (either high or low), not values defined by a certain criterion or anything related to a CI. It is not for a business context; it is just for applying the method. Thanks for your effort.

11-10-2011 03:45 PM

OK, same idea then.

Take each independent variable that was generated and flag whether it's an outlier or not.

AFAIK there really isn't an absolute statistical definition of an outlier, so you'll need to come up with one.

There are some suggested methods on Wikipedia.
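A minimal sketch of the flag-and-sum idea, assuming a single simulated variable and a cutoff of 3 sample standard deviations from the mean. The data set names, seed, sample size, and cutoff are illustrative assumptions; none of them come from this thread, and any other outlier rule could be substituted in the flag expression.

```sas
/* Simulate one variable (assumed: 1000 standard normal draws). */
data sim;
   call streaminit(12345);            /* assumed seed */
   do i = 1 to 1000;
      x = rand("Normal", 0, 1);
      output;
   end;
run;

/* Flag values more than 3 sample standard deviations from the mean.
   PROC SQL remerges mean(x) and std(x) with the detail rows,
   which the log reports as a note. */
proc sql;
   create table flagged as
   select i, x,
          (abs(x - mean(x)) > 3*std(x)) as outlier
   from sim;

   /* The sum of the 0/1 flag is the number of outliers. */
   select sum(outlier) as n_outliers from flagged;
quit;
```

With several independent variables, the same flag can be computed per variable and the flags summed separately.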

11-12-2011 02:32 PM

One way to do this is to use the idea of a "contaminated normal distribution," which is a specific kind of mixture distribution.

After you define the x variable, simulate the y variable as follows:

type = rand("Bernoulli", 0.1); /* outlier with 10% probability */

if type=1 then

error = rand("Normal", 0, 10); /* error is N(0, 10) */

else

error = rand("Normal", 0, 1); /* error is N(0, 1) */

y = intercept + beta*x + error;

outlier = (abs(error)>3);

Change the probability of contamination (0.1), the magnitude of the contamination (10) and the definition of an outlier (3) as your needs require.

For more info on the general case of sampling from a mixture distribution, see http://blogs.sas.com/content/iml/2011/09/21/generate-a-random-sample-from-a-mixture-distribution/

Rick
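For reference, the fragment above can be wrapped in a complete DATA step along these lines. This is only a sketch: the seed, sample size, coefficient values, and the uniform distribution for x are assumptions made so that the step runs on its own.

```sas
data sim;
   call streaminit(2011);               /* assumed seed, for reproducibility */
   intercept = 1;                       /* assumed coefficient values */
   beta = 2;
   do i = 1 to 100;                     /* assumed sample size */
      x = rand("Uniform");              /* assumed distribution for x */
      type = rand("Bernoulli", 0.1);    /* contaminated with 10% probability */
      if type = 1 then error = rand("Normal", 0, 10);
      else             error = rand("Normal", 0, 1);
      y = intercept + beta*x + error;
      outlier = (abs(error) > 3);       /* 0/1 flag */
      output;
   end;
run;

/* The sum of the 0/1 flag is the number of flagged observations. */
proc means data=sim sum;
   var outlier;
run;
```

Note that `outlier` and `type` are not the same thing: a contaminated draw can still land within the cutoff, and an uncontaminated draw can occasionally exceed it, so the flagged count generally differs from the number of contaminated observations.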

11-25-2011 04:21 PM

Thanks a lot for your effort, but I still have a problem with this part.

If I need the outliers in the independent variables (the x's), would I follow the same procedure? And how do I determine the correlation between the x's if I generate each x separately?

The other problem is that I am using NLPCG. I set the first row of the blc matrix to zeros because I need my decision variables to be positive, but the resulting variables still have negative values. How can I solve this problem?

I also have another model that is linear in both the objective function and the constraints. What is the suitable call?

Thanks in advance

12-01-2011 02:09 PM

If you want correlated data, generate your X's from some multivariate distribution with the given correlation structure. Then add outliers (from the same distribution but with a larger variance?)

I don't understand how you are using NLPCG. You haven't said what you are optimizing. Nevertheless, I don't see how you can get negative values if you specify the blc matrix correctly. Make sure that your initial guess is valid.
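One way to read the suggestion about correlated data is sketched below in SAS/IML, assuming a bivariate normal distribution. The seed, sample size, means, covariance matrix, contamination probability, and scale factor `k` are all illustrative assumptions. The outlier rows are drawn with covariance `k##2 * Sigma`; because the whole covariance matrix is scaled by one constant and the means are equal, the mixture has the same population correlation as the clean distribution. Sample correlations will still vary from run to run.

```sas
proc iml;
   call randseed(2011);                      /* assumed seed */
   N     = 1000;                             /* assumed sample size */
   mu    = {0 0};                            /* assumed means */
   Sigma = {1 0.6, 0.6 1};                   /* assumed covariance (correlation 0.6) */
   k     = 10;                               /* assumed scale factor for outlier rows */

   X    = RandNormal(N, mu, Sigma);          /* clean rows */
   Xout = RandNormal(N, mu, k##2 * Sigma);   /* same correlation, larger variance */

   type = j(N, 1);                           /* filled below: 1 = contaminated row */
   call randgen(type, "Bernoulli", 0.1);     /* assumed 10% contamination */

   idx = loc(type = 1);                      /* rows to replace with outlier draws */
   if ncol(idx) > 0 then X[idx, ] = Xout[idx, ];

   c = corr(X);                              /* sample correlation of the mixture */
   print c;
quit;
```

Replacing whole rows, rather than contaminating each column independently, is the design choice that keeps the correlation structure intact.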

12-02-2011 06:33 PM

- The constraints in NLPCG are given in matrix form, with the first row holding the lower bounds, so I specified the matrix as

con={0. 0. 0. 0. 0. 0. 0. 0. 0. 0. . . ,

. . . . . . . . . . . . ,

40. 51. 60. 24. 53. 80. 16. 34. 52. 84. 0. 42.894,

1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. };

but the resulting variables still have negative values such as -7.05E-18, so how can I solve this problem?

- The other thing is that when I generated the x's this way and added the outliers to them (generated from the same distribution with a larger variance), the new variables do not have the correlation specified at the beginning, so how can I solve this problem?

- I also need to know how to write a condition so that:

if the correlation between y and x1 is greater than or equal to 0.5, x1 belongs to matrix H

if the correlation between y and x1 is less than 0.5, x1 belongs to matrix K

Thanks
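The H/K condition in the last bullet could be sketched in SAS/IML as follows. The simulated data, seed, and coefficients are made up purely so that the program runs; only the 0.5 threshold and the matrix names H and K come from the question. The sketch uses the signed correlation, as the question is worded; wrapping `r` in `abs()` would split on the magnitude instead.

```sas
proc iml;
   call randseed(2011);                       /* assumed seed */
   N = 200;                                   /* assumed sample size */
   X = j(N, 3);  call randgen(X, "Normal");   /* three made-up predictors */
   e = j(N, 1);  call randgen(e, "Normal");   /* made-up error term */
   y = 2*X[,1] + 0.2*X[,2] + e;               /* made-up response */

   C = corr(y || X);                          /* first row/column belongs to y */
   r = C[1, 2:ncol(C)];                       /* correlation of y with each x */
   print r;

   hIdx = loc(r >= 0.5);                      /* columns that go to H */
   kIdx = loc(r <  0.5);                      /* columns that go to K */

   /* LOC returns an empty matrix when nothing matches, so guard each case. */
   if ncol(hIdx) > 0 then do;
      H = X[, hIdx];
      print hIdx;
   end;
   if ncol(kIdx) > 0 then do;
      K = X[, kIdx];
      print kIdx;
   end;
quit;
```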