Hi everyone, I need to know how I can determine the number of outliers in my simulation. Can anyone help?
You need to define an outlier (0/1) then add that up.
Defining the outlier is the problem. Is it something outside the 99.9% CI?
It really depends on what you're looking at and your modelling criteria. How many parameters are you looking at? Do you have a single outcome or multiple outcomes?
What counts as an outlier also depends on business context: for machinery it might be 99%, but for medical data it could be 95%...
We need more details on what you're simulating in order to help out.
What I need is to generate the independent variables in a regression relationship, and I need the generated independent variables to contain outliers.
By outliers I only mean values that are far away from the rest of the generated data (either high or low), not values defined by a specific criterion or anything related to a CI. It is not for a business context either; it is just for application. Thanks for your effort.
Ok...same idea then.
Take each independent variable that was generated and flag whether it's an outlier or not.
AFAIK there really isn't an absolute statistical definition of what an outlier is, so you'll need to come up with one.
There are some suggested methods on Wikipedia.
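As a rough sketch of the "flag, then add up" idea, the steps below use one common rule (Tukey's fences: more than 1.5 times the interquartile range beyond the quartiles). The rule, the data set names (Sim, Stats, Flagged), the seed, and the sample size are all assumptions chosen for illustration; substitute whatever definition of an outlier you settle on.

```sas
/* Hypothetical input: a data set Sim with one generated variable x */
data Sim(drop=i);
   call streaminit(1);              /* assumed seed */
   do i = 1 to 100;                 /* assumed sample size */
      x = rand("Normal", 0, 1);
      output;
   end;
run;

/* Compute the quartiles of x */
proc means data=Sim noprint;
   var x;
   output out=Stats q1=Q1 q3=Q3;
run;

/* Flag values outside the fences; the comparison evaluates to 0 or 1 */
data Flagged;
   if _n_ = 1 then set Stats(keep=Q1 Q3);
   set Sim;
   IQR = Q3 - Q1;
   outlier = (x < Q1 - 1.5*IQR) or (x > Q3 + 1.5*IQR);
run;

/* Because outlier is 0/1, its sum is the number of outliers */
proc means data=Flagged sum;
   var outlier;
run;
```

With several independent variables, the same flagging step could be repeated for each one.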
One way to do this is to use the idea of a "contaminated normal distribution," which is a specific kind of mixture distribution.
After you define the x variable, simulate the y variable as follows:
type = rand("Bernoulli", 0.1); /* outlier with 10% probability */
if type=1 then
error = rand("Normal", 0, 10); /* error is N(0, 10) */
else
error = rand("Normal", 0, 1); /* error is N(0, 1) */
y = intercept + beta*x + error;
outlier = (abs(error)>3);
Change the probability of contamination (0.1), the magnitude of the contamination (10) and the definition of an outlier (3) as your needs require.
For more info on the general case of sampling from a mixture distribution, see http://blogs.sas.com/content/iml/2011/09/21/generate-a-random-sample-from-a-mixture-distribution/
Rick
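For anyone who wants to run the statements above as a complete program, here is one way they might be wrapped in a DATA step. The seed, the sample size, the values of intercept and beta, and the uniform distribution for x are assumptions made only so that the step is self-contained; they are not part of the original suggestion.

```sas
data Sim(drop=i);
   call streaminit(12345);          /* assumed seed */
   intercept = 1;                   /* assumed regression coefficients */
   beta = 2;
   do i = 1 to 1000;                /* assumed sample size */
      x = rand("Uniform");          /* assumed distribution for x */
      type = rand("Bernoulli", 0.1);                   /* contaminated with 10% probability */
      if type = 1 then error = rand("Normal", 0, 10);  /* contaminated error, sd 10 */
      else             error = rand("Normal", 0, 1);   /* regular error, sd 1 */
      y = intercept + beta*x + error;
      outlier = (abs(error) > 3);   /* 0/1 flag */
      output;
   end;
run;

/* Tabulate the flag to see how many observations were marked as outliers */
proc freq data=Sim;
   tables outlier;
run;
```

Note that not every contaminated observation is flagged: a draw from the wide component can still land within 3 of zero, so the count of outliers is usually smaller than the count of observations with type=1.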
Thanks a lot for your effort, but I still have a problem with this part.
If I need the outliers in the independent variables (the x's), would I follow the same procedure? And how do I determine the correlation between the x's if I generate each x separately?
The other problem I have is that I am using NLPCG and I set the first row of the blc matrix to zeros because I need my decision variables to be positive, but the resulting variables still have negative values. How can I solve this problem?
I also have another model that is linear in both the objective function and the constraints. What is the suitable call?
Thanks in advance
If you want correlated data, generate your X's from some multivariate distribution with the given correlation structure. Then add outliers (from the same distribution but with a larger variance?)
I don't understand how you are using NLPCG. You haven't said what you are optimizing. Nevertheless, I don't see how you can get negative values if you specify the blc matrix correctly. Make sure that your initial guess is valid.
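The correlated-data suggestion above might look something like the following SAS/IML sketch, which draws rows from a multivariate normal distribution and replaces a random 10% of them with draws from the same distribution with an inflated covariance. The mean vector, the matrix Sigma, the inflation factor k, the contamination probability p, and the seed are all made-up values for illustration.

```sas
proc iml;
call randseed(12345);                  /* assumed seed */
N = 1000;                              /* assumed sample size */
mu = {0 0 0};                          /* assumed means */
Sigma = {1.0 0.5 0.3,
         0.5 1.0 0.4,
         0.3 0.4 1.0};                 /* assumed covariance (here also the correlation) */
k = 10;                                /* assumed inflation factor for the standard deviations */
p = 0.1;                               /* assumed contamination probability */

X    = RandNormal(N, mu, Sigma);              /* regular rows */
Xout = RandNormal(N, mu, (k##2) # Sigma);     /* same correlation, larger variance */

type = j(N, 1);
call randgen(type, "Bernoulli", p);    /* 1 = contaminated row */
idx = loc(type = 1);
if ncol(idx) > 0 then X[idx, ] = Xout[idx, ];

numContaminated = ncol(idx);
corrX = corr(X);                       /* sample correlation of the contaminated data */
print numContaminated, corrX;
quit;
```

Multiplying the covariance matrix by a constant leaves the correlations unchanged, so the contaminated rows follow the same correlation structure as the regular rows. As with the univariate case, you would still need your own rule to decide which contaminated rows count as outliers.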
con={0. 0. 0. 0. 0. 0. 0. 0. 0. 0. . . ,
. . . . . . . . . . . . ,
40. 51. 60. 24. 53. 80. 16. 34. 52. 84. 0. 42.894,
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. };
But the resulting variables still have negative values, such as -7.05E-18, so how can I solve this problem?
If the correlation between y and x1 is greater than or equal to 0.5, x1 belongs to matrix H.
If the correlation between y and x1 is less than 0.5, x1 belongs to matrix K.
Thanks
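One possible way to express the H/K rule above in SAS/IML is sketched below. The simulated X and y are hypothetical stand-ins (the real data were not posted), and the sketch assumes the rule is applied to every column of X, not only x1.

```sas
proc iml;
call randseed(1);                      /* assumed seed */

/* Hypothetical data: three independent predictors and a response */
X = RandNormal(100, {0 0 0}, I(3));
e = j(100, 1);
call randgen(e, "Normal");
y = 2*X[,1] + 0.2*X[,2] + e;

/* Correlation of y with each column of X */
C = corr(y || X);
r = C[1, 2:ncol(C)];

/* Columns with correlation >= 0.5 go to H, the rest go to K */
hIdx = loc(r >= 0.5);
kIdx = loc(r < 0.5);
if ncol(hIdx) > 0 then H = X[, hIdx];
if ncol(kIdx) > 0 then K = X[, kIdx];

print r;
quit;
```

The guards on ncol() are there because LOC returns an empty matrix when no column satisfies the condition, in which case H or K is simply not created.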