heba2000
Calcite | Level 5

Hi everyone, how can I determine the number of outliers in my simulation? Can anyone help?

Reeza
Super User

You need to define an outlier (0/1) then add that up.

Defining the outlier is the hard part: is it something outside the 99.9% CI?

It really depends on what you're looking at and on your modelling criteria. How many parameters are you looking at? Do you have a single outcome or multiple outcomes?

What counts as an outlier also depends on the business context. For machinery the cutoff might be 99%, but for medical data it could be 95%.

We need more details on what you're simulating in order to help.

heba2000
Calcite | Level 5

What I need is to generate the independent variables in a regression model, and I need the generated independent variables to contain outliers.

By outliers I simply mean values that lie far away from the rest of the generated data (either above or below), not values defined by a formal criterion or by a CI. There is no business context; it is just for practice. Thanks for your effort.

Reeza
Super User

Ok...same idea then.

Take each independent variable that was generated and flag whether each value is an outlier or not.

AFAIK there really isn't an absolute statistical definition of an outlier, so you'll need to come up with one.

There are some suggested methods on Wikipedia:

http://en.wikipedia.org/wiki/Outlier
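
As a rough illustration of the "flag, then add up" idea, here is a minimal sketch that uses one of the rules listed on that page (Tukey's fences: anything more than 1.5*IQR below Q1 or above Q3). The data set names (Sim, Q, Flagged), the variable x, and the 1.5 multiplier are all assumptions for the example; substitute whatever definition suits your simulation.

```sas
/* Hypothetical simulated data: 200 draws of a single variable x */
data Sim;
   call streaminit(12345);
   do i = 1 to 200;
      x = rand("Normal", 0, 1);
      output;
   end;
run;

/* Compute the lower and upper quartiles of x */
proc means data=Sim noprint;
   var x;
   output out=Q q1=q1 q3=q3;
run;

/* Flag values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR */
data Flagged;
   if _n_ = 1 then set Q(keep=q1 q3);   /* quartiles are retained for every row */
   set Sim;
   iqr = q3 - q1;
   outlier = (x < q1 - 1.5*iqr) or (x > q3 + 1.5*iqr);   /* 0/1 indicator */
run;

/* The sum of the 0/1 flag is the number of outliers */
proc means data=Flagged n sum;
   var outlier;
run;
```

Because the flag is a 0/1 variable, its sum is the count and its mean is the proportion of outliers.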

Rick_SAS
SAS Super FREQ

One way to do this is to use the idea of a "contaminated normal distribution," which is a specific kind of mixture distribution.

After you define the x variable, simulate the y variable as follows:

type = rand("Bernoulli", 0.1);        /* outlier with 10% probability */
if type=1 then
     error = rand("Normal", 0, 10);   /* contaminated error: N(0, sd=10) */
else
     error = rand("Normal", 0, 1);    /* regular error: N(0, sd=1) */
y = intercept + beta*x + error;
outlier = (abs(error)>3);             /* 1 if flagged as an outlier, else 0 */

Change the probability of contamination (0.1), the magnitude of the contamination (10) and the definition of an outlier (3) as your needs require.

For more info on the general case of sampling from a mixture distribution, see http://blogs.sas.com/content/iml/2011/09/21/generate-a-random-sample-from-a-mixture-distribution/
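
To show how those statements might fit into a complete simulation, here is a minimal sketch that wraps them in a DATA step and then counts the flagged observations in each sample. The macro variables, the data set names (Contam, Counts), the uniform distribution for x, and the values of intercept and beta are all assumptions made for the example.

```sas
%let N = 100;            /* observations per sample (assumed) */
%let NumSamples = 50;    /* number of simulated samples (assumed) */

data Contam;
   call streaminit(1);
   intercept = 2;  beta = 3;             /* hypothetical regression parameters */
   do SampleID = 1 to &NumSamples;
      do i = 1 to &N;
         x = rand("Uniform");            /* hypothetical distribution for x */
         type = rand("Bernoulli", 0.1);  /* contaminated with 10% probability */
         if type = 1 then error = rand("Normal", 0, 10);
         else error = rand("Normal", 0, 1);
         y = intercept + beta*x + error;
         outlier = (abs(error) > 3);     /* 0/1 indicator */
         output;
      end;
   end;
run;

/* Number of flagged observations in each sample */
proc means data=Contam noprint;
   by SampleID;
   var outlier;
   output out=Counts sum=NumOutliers;
run;

proc print data=Counts(obs=5);
   var SampleID NumOutliers;
run;
```

The data are already in SampleID order because of how the loop writes them, so no PROC SORT is needed before the BY statement.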

Rick

heba2000
Calcite | Level 5

Thanks a lot for your effort, but I still have a problem with this part.

If I need the outliers to be in the independent variables (the x's), would I follow the same procedure? And how do I control the correlation between the x's if I generate each x separately?

The other problem is that I am using NLPCG. I set the first row of the blc matrix to zeros because I need my decision variables to be positive, but the results still contain negative values. How can I solve this problem?

I also have another model that is linear in both the objective function and the constraints. What is the suitable call for it?

Thanks in advance

Rick_SAS
SAS Super FREQ

If you want correlated data, generate your X's from some multivariate distribution with the given correlation structure. Then add outliers (from the same distribution but with a larger variance?)
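
A minimal SAS/IML sketch of that suggestion, assuming multivariate normal X's: draw the regular rows from MVN(mu, Sigma) and the contaminated rows from MVN(mu, 25*Sigma). The sample size, mu, Sigma, the factor 25, and the 10% contamination rate are made-up values for illustration.

```sas
proc iml;
call randseed(4321);
N = 200;                              /* sample size (assumed) */
mu = {0 0 0};                         /* hypothetical means */
Sigma = {1.0 0.5 0.3,
         0.5 1.0 0.4,
         0.3 0.4 1.0};                /* hypothetical covariance matrix */

X    = RandNormal(N, mu, Sigma);      /* regular observations */
Xout = RandNormal(N, mu, 25*Sigma);   /* same correlations, 5 times the std dev */

type = j(N, 1);
call randgen(type, "Bernoulli", 0.1); /* 1 = contaminated row */
idx = loc(type = 1);
if ncol(idx) > 0 then X[idx, ] = Xout[idx, ];

corrX = corr(X);                      /* sample correlation after contamination */
print (ncol(idx))[label="Contaminated rows"], corrX;
quit;
```

Multiplying Sigma by a constant leaves the correlation matrix unchanged, and both components share the same mean, so the population correlation of the mixture equals the original one. The sample correlations will still vary from run to run, and more so when the contamination is heavy.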

I don't understand how you are using NLPCG. You haven't said what you are optimizing. Nevertheless, I don't see how you can get negative values if you specify the blc matrix correctly.  Make sure that your initial guess is valid.
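
For reference, here is a minimal sketch of a bound-constrained NLPCG call, since the actual objective was not posted. The objective function f, the constraint matrix, and the initial guess are invented for the example. The first row of the constraint matrix holds lower bounds, the second row holds upper bounds (missing means none), and each later row is a linear constraint: coefficients, then a type code (0 means equality), then the right-hand side.

```sas
proc iml;
/* Hypothetical objective: squared distance from the point (-1, 2) */
start f(x);
   return ( (x[1] + 1)##2 + (x[2] - 2)##2 );
finish;

con = {0  0  .  .,        /* lower bounds: x1 >= 0, x2 >= 0 */
       .  .  .  .,        /* no upper bounds */
       1  1  0  1};       /* linear constraint: x1 + x2 = 1 */
x0  = {0.5 0.5};          /* initial guess that satisfies the constraints */
opt = {0 0};              /* opt[1]=0: minimize; opt[2]=0: no printed output */

call nlpcg(rc, xr, "f", x0, opt, con);

/* Treat values within roundoff of zero as zero (the tolerance is an assumption) */
idx = loc(abs(xr) < 1e-12);
if ncol(idx) > 0 then xr[idx] = 0;
print rc, xr;
quit;
```

For this toy problem the constrained minimum is at x = (0, 1), so x1 sits exactly on its lower bound. A solution on a bound can come back differing from the bound by floating-point roundoff (something on the order of 1E-17), which is zero for practical purposes; the last step just cleans that up.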

heba2000
Calcite | Level 5
  • The constraints in NLPCG are specified in matrix form, with the first row giving the lower bounds, so I defined the matrix as

con={0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  .  . ,
     .  .  .  .  .  .  .  .  .  .  .  . ,
     40. 51. 60. 24. 53. 80. 16. 34. 52. 84. 0. 42.894,
     1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0. 1.  };

but the resulting variables still have negative values such as -7.05E-18. How can I solve this problem?

  • The other thing is that when I generated the x's this way and added the outliers to them (generated from the same distribution with a larger value), the new variables do not have the correlation that I specified at the beginning. How can I solve this problem?

  • I also need to know how to write a condition so that:

if the correlation between y and x1 is greater than or equal to 0.5, x1 belongs to matrix H

if the correlation between y and x1 is less than 0.5, x1 belongs to matrix K

Thanks
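
One possible way to express the H/K condition in SAS/IML, as a sketch only: compute the correlation of y with every column of X, then use LOC to pick the columns on each side of the 0.5 cutoff. The simulated X and y below are placeholders so that the example runs on its own.

```sas
proc iml;
call randseed(99);
/* Placeholder data: three predictors and a response */
X = j(100, 3);
call randgen(X, "Normal");
eps = j(100, 1);
call randgen(eps, "Normal");
y = 2*X[,1] + 0.1*X[,2] + eps;

C = corr(y || X);             /* first row/column corresponds to y */
r = C[1, 2:ncol(C)];          /* correlation of y with each column of X */

hIdx = loc(r >= 0.5);         /* columns that go into H */
kIdx = loc(r <  0.5);         /* columns that go into K */
if ncol(hIdx) > 0 then H = X[, hIdx];
if ncol(kIdx) > 0 then K = X[, kIdx];

print r, (ncol(hIdx))[label="Columns in H"], (ncol(kIdx))[label="Columns in K"];
quit;
```

The NCOL checks matter because LOC returns an empty matrix when no column satisfies the condition, and an empty matrix cannot be used as a subscript.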
