deleted_user
Not applicable
Denote Message was edited by: ljw5122
Rick_SAS
SAS Super FREQ
I had a homework problem like this once...

You asked for an understanding of the contaminated normal distribution. The contaminated normal is often used in testing the robustness of statistics, and I think the most natural way to understand it is to think about sampling from it.

Let's suppose you want to compare the behavior of the standard deviation versus the interquartile range in the presence of outliers. One way to do this is to simulate, say, 1000 univariate values, with roughly 950 being from N(0,1) and 50 being from N(0, 100). This is an example of a contaminated normal with 5% contamination. (It's also an example of a mixture distribution.) The PDF will look somewhat like a normal distribution, except that the tails will be fatter.

How would I generate a sample in IML? Well, I'd generate n from a binomial distribution with 1000 trials and probability p of being an outlier.
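Then I'd sample the data from two distributions:
p = 0.05; /** contamination proportion **/
n = j(1, 1);
call randgen(n, "Binomial", p, 1000); /** number of outliers **/
x1 = j(1000-n, 1);
call randgen(x1, "Normal", 0, 1); /** most of the data **/
x2 = j(n, 1);
call randgen(x2, "Normal", 0, 100); /** outliers; 100 is the standard deviation, not the variance **/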

The vector x1//x2 contains data sampled from the contaminated normal pdf.
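
To tie this back to the standard deviation versus interquartile range comparison, here is a minimal sketch of the whole simulation as one PROC IML program. The seed, the 5% contamination, and the names nTotal, sd, and iqRange are my own choices for illustration, and it assumes a SAS/IML release that provides the STD function and the QNTL call.

```sas
proc iml;
call randseed(12345);                    /** arbitrary seed, for reproducibility **/
p = 0.05;                                /** assumed contamination proportion **/
nTotal = 1000;                           /** total sample size **/

n = j(1, 1);
call randgen(n, "Binomial", p, nTotal);  /** number of outliers **/

x1 = j(nTotal-n, 1);
call randgen(x1, "Normal", 0, 1);        /** uncontaminated part **/
x2 = j(n, 1);
call randgen(x2, "Normal", 0, 100);      /** outliers: standard deviation 100 **/
x = x1 // x2;                            /** sample from the contaminated normal **/

sd = std(x);                             /** sample standard deviation **/
call qntl(q, x, {0.25, 0.75});           /** first and third quartiles **/
iqRange = q[2] - q[1];                   /** interquartile range **/
print n sd iqRange;
quit;
```

For reference, the population standard deviation of this mixture is sqrt(0.95 + 0.05*100^2), or about 22.4, whereas the interquartile range should stay close to the N(0,1) value of about 1.35. That gap is the robustness difference the simulation is meant to show.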

Generating the pdf (or cdf) directly is a simple one-liner that uses the PDF (or CDF) function in Base SAS.
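
The one-liner isn't spelled out above, so the following is only a sketch of what it might look like, assuming the density is the weighted sum of the two normal densities with weights 1-p and p. The grid x and the names density and cumul are hypothetical.

```sas
proc iml;
p = 0.05;                      /** assumed contamination proportion **/
x = do(-5, 5, 0.1);            /** hypothetical grid of evaluation points **/

/** mixture pdf: (1-p)*phi(x; 0, 1) + p*phi(x; 0, 100) **/
density = (1-p)*pdf("Normal", x, 0, 1) + p*pdf("Normal", x, 0, 100);

/** mixture cdf: same weights, with CDF in place of PDF **/
cumul = (1-p)*cdf("Normal", x, 0, 1) + p*cdf("Normal", x, 0, 100);
quit;
```

As with RANDGEN, the last argument of PDF and CDF for the normal distribution is a standard deviation, so 100 here matches the outlier distribution used in the sampling code.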
