deleted_user
Not applicable
Denote Message was edited by: ljw5122
Rick_SAS
SAS Super FREQ
I had a homework problem like this once...

You asked for an understanding of the contaminated normal distribution. The contaminated normal is often used to test the robustness of statistics, and I think the most natural way to understand it is to think about sampling from the pdf.

Let's suppose you want to compare the behavior of the standard deviation versus the interquartile range in the presence of outliers. One way to do this is to simulate, say, 1000 univariate values, with roughly 950 being from N(0,1) and 50 being from N(0, 100). This is an example of a contaminated normal with 5% contamination. (It's also an example of a mixture distribution.) The PDF will look somewhat like a normal distribution, except that the tails will be fatter.
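
To make the "fatter tails" concrete, here is a sketch of the mixture density for this example. I'm assuming the second parameter of N(0, 100) is a standard deviation (that is how RANDGEN interprets it), and the symbols p, sigma, and phi are my own notation:

```latex
\[
\begin{aligned}
f(x) &= (1-p)\,\phi(x;\,0,\,1) + p\,\phi(x;\,0,\,\sigma),
      \qquad p = 0.05,\quad \sigma = 100, \\
\operatorname{Var}(X) &= (1-p)\cdot 1 + p\,\sigma^{2}
      = 0.95 + 0.05 \cdot 10000 = 500.95 .
\end{aligned}
\]
```

Here phi(x; 0, s) is the normal density with mean 0 and standard deviation s. Under that assumption the population standard deviation is about 22.4, even though 95% of the values come from a component with standard deviation 1.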

How would I generate a sample in IML? Well, I'd generate n, the number of outliers, from a binomial distribution with 1000 trials and probability p = 0.05 of being an outlier.
Then I'd sample the data from two distributions:
x1 = j(1000-n, 1);
call randgen(x1, "normal", 0, 1); /** most of the data **/
x2 = j(n, 1);
call randgen(x2, "normal", 0, 100); /** outliers; RANDGEN takes the standard deviation, so sd=100 **/

The vector x1//x2 contains data sampled from the contaminated normal pdf.
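
Putting the steps above together, a complete run might look like the following sketch. The seed and the names nTotal, nOut, sd, and iqr are my own choices for illustration, and I'm assuming a SAS/IML release that provides the STD function and the QNTL call:

```sas
proc iml;
call randseed(12345);                    /* arbitrary seed, for reproducibility */
nTotal = 1000;                           /* sample size from the example */
p = 0.05;                                /* contamination fraction */

nOut = j(1, 1);
call randgen(nOut, "Binomial", p, nTotal);   /* number of outliers */
/* note: nOut = 0 would make j(nOut, 1) fail; with p = 0.05 and
   1000 trials that is vanishingly unlikely, so it is not handled here */

x1 = j(nTotal - nOut, 1);
call randgen(x1, "Normal", 0, 1);        /* most of the data */
x2 = j(nOut, 1);
call randgen(x2, "Normal", 0, 100);      /* outliers, sd = 100 */
x = x1 // x2;                            /* contaminated normal sample */

sd = std(x);                             /* sensitive to the outliers */
call qntl(q, x, {0.25 0.75});            /* first and third quartiles */
iqr = q[2] - q[1];                       /* robust measure of spread */
print nOut sd iqr;
quit;
```

If this behaves as I expect, the standard deviation should be inflated far above 1 by the outliers, while the interquartile range should stay close to its value for uncontaminated N(0,1) data.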

Generating the pdf (or cdf) directly is a simple one-liner that uses the PDF (or CDF) function in Base SAS.
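
As a rough sketch of that one-liner, assuming 5% contamination and the same two normal components as above (the grid x and the names density and cumul are hypothetical):

```sas
proc iml;
p = 0.05;                                /* contamination fraction */
x = do(-10, 10, 0.1);                    /* hypothetical evaluation grid */

/* mixture pdf: weighted sum of the two normal densities */
density = (1-p)*pdf("Normal", x, 0, 1) + p*pdf("Normal", x, 0, 100);

/* mixture cdf: the same weights applied to the two normal cdfs */
cumul = (1-p)*cdf("Normal", x, 0, 1) + p*cdf("Normal", x, 0, 100);
quit;
```

The same weighted-sum expressions should work in a DATA step, since PDF and CDF are Base SAS functions.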
