You asked for an understanding of the contaminated normal distribution. The contaminated normal is often used to test the robustness of statistics, and I think the most natural way to understand it is to think about how you would sample from it.
Let's suppose you want to compare the behavior of the standard deviation versus the interquartile range in the presence of outliers. One way to do this is to simulate, say, 1000 univariate values, with roughly 950 being from N(0,1) and roughly 50 being from N(0, 100), where the second parameter is the standard deviation, as in the RANDGEN calls in the code. This is an example of a contaminated normal with 5% contamination. (It's also an example of a mixture distribution.) The PDF will look somewhat like a normal distribution, except that the tails will be fatter.
How would I generate a sample in IML? Well, I'd generate n, the number of outliers, from a binomial distribution with 1000 trials and probability p of being an outlier (p = 0.05 in this example).
Then I'd sample the data from two distributions:
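p = 0.05; /** probability of being an outlier **/
n = j(1, 1);
call randgen(n, "binomial", p, 1000); /** number of outliers **/
x1 = j(1000-n, 1);
call randgen(x1, "normal", 0, 1); /** most of the data **/
x2 = j(n, 1);
call randgen(x2, "normal", 0, 100); /** outliers **/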
The vector x1//x2 contains data sampled from the contaminated normal pdf.
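To return to the comparison that motivated this, here is a rough sketch that puts the pieces together and computes both statistics on one contaminated sample. The seed and the names nTotal, sd, and iqRange are my own choices for illustration, and the sketch assumes a release of SAS/IML that supports the QNTL call.

```sas
proc iml;
call randseed(12345);                   /** arbitrary seed, for reproducibility **/
nTotal = 1000;                          /** total sample size **/
p = 0.05;                               /** probability of being an outlier **/
n = j(1, 1);
call randgen(n, "Binomial", p, nTotal); /** number of outliers **/
x1 = j(nTotal-n, 1);
call randgen(x1, "Normal", 0, 1);       /** most of the data **/
x2 = j(n, 1);
call randgen(x2, "Normal", 0, 100);     /** outliers **/
x = x1 // x2;                           /** contaminated normal sample **/

sd = sqrt(var(x));                      /** sample standard deviation **/
call qntl(q, x, {0.25, 0.75});          /** first and third quartiles **/
iqRange = q[2] - q[1];                  /** interquartile range **/
print sd iqRange;
quit;
```

For reference, the population standard deviation of this mixture is sqrt(0.95 + 0.05*100**2), or about 22.4, whereas the interquartile range of N(0,1) is about 1.35. So you should expect the sample standard deviation to be inflated far more than the sample IQR, although the exact numbers depend on the seed.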
Computing the PDF (or CDF) directly is a simple one-liner that uses the PDF (or CDF) function in Base SAS.
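As a sketch of what that one-liner might look like for the 5% example above: the density is just the weighted sum of the two normal densities. The grid of x values and the names pdfMix and cdfMix are assumptions of mine for illustration.

```sas
proc iml;
p = 0.05;                               /** probability of being an outlier **/
x = do(-5, 5, 0.1);                     /** grid of evaluation points (my choice) **/
/** weighted sum of the N(0,1) component and the component with std dev 100 **/
pdfMix = (1-p)*pdf("Normal", x, 0, 1) + p*pdf("Normal", x, 0, 100);
cdfMix = (1-p)*cdf("Normal", x, 0, 1) + p*cdf("Normal", x, 0, 100);
quit;
```

Because both components are symmetric about 0, the mixture CDF at x = 0 should equal 0.5, which is a quick sanity check on the weights.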