10 Tips for Simulating Data With SAS® Q&A, Slides, and On-Demand Recording

Started 09-28-2023

Watch this Ask the Expert session to learn 10 tips to write efficient simulations in SAS.

You will learn:

• How to simulate data from univariate distributions.
• How to use the simulation to estimate the sampling distribution of a sample statistic.
• Best practices to create efficient simulations and programming pitfalls to avoid.
• How to simulate data from a regression model.

The questions from the Q&A segment held at the end of the webinar are listed below and the slides from the webinar are attached.

Q&A

Why do I need simulation to estimate the standard error? Can’t I use a formula?

Yes, some statistics have formulas for the standard error. However, many of the formulas are based on an asymptotic analysis (which assumes large samples) or distributional assumptions (such as normality). For example, we know from the central limit theorem that the mean has a sampling distribution that is asymptotically normal. But if you have a small sample, you might want to use simulation to estimate the true sampling distribution for the sample mean.
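As a rough sketch of that idea, the following program approximates the sampling distribution of the mean for samples of size 10 drawn from a skewed population. The sample size, the number of samples, the seed, and the choice of a standard exponential population are assumptions made for illustration; they are not taken from the webinar.

```sas
%let N = 10;              /* small sample size (assumed) */
%let NumSamples = 5000;   /* number of Monte Carlo samples (assumed) */

/* Step 1: simulate all samples in one data set, indexed by SampleID */
data Sim;
   call streaminit(12345);
   do SampleID = 1 to &NumSamples;
      do i = 1 to &N;
         x = rand("Expo");   /* standard exponential: a skewed population */
         output;
      end;
   end;
run;

/* Step 2: compute the statistic (here, the mean) for each sample */
proc means data=Sim noprint;
   by SampleID;
   var x;
   output out=OutStats mean=SampleMean;
run;

/* Step 3: summarize the approximate sampling distribution */
proc univariate data=OutStats;
   var SampleMean;
   histogram SampleMean;
   output out=SampDist mean=MCMean std=MCStdErr;
run;
```

For this population, the standard error formula gives 1/sqrt(10), or about 0.316, so MCStdErr should be close to that value. The histogram shows how far the sampling distribution is from normal at this sample size.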

Can I simulate a data set that includes outliers?

Yes. One option is to simulate data from a so-called “contaminated distribution,” which is a mixture of two distributions. Suppose you want about 10% of the data to be outliers. You can draw 90% of your observations from a normal distribution (such as N(0,1)) and the remaining 10% from a distribution that has a much larger standard deviation (such as N(0,100)). The mixture is “mostly normal” but contains some outliers.
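A minimal sketch of such a mixture is shown below. It assumes that N(0,100) denotes a variance of 100, so the contaminating component uses a standard deviation of 10. The sample size, seed, and variable names are also illustrative choices.

```sas
data Contam;
   call streaminit(12345);
   do i = 1 to 1000;
      /* with probability 0.1, draw from the wide component */
      IsOutlier = rand("Bernoulli", 0.1);
      if IsOutlier then x = rand("Normal", 0, 10);  /* sd=10 (assumed reading of N(0,100)) */
      else              x = rand("Normal", 0, 1);   /* the "mostly normal" component */
      output;
   end;
run;

/* Compare the spread of the two components */
proc means data=Contam n mean std min max;
   class IsOutlier;
   var x;
run;
```

Because each observation is assigned to a component at random, the proportion of contaminated values is about 10% but varies from one simulated data set to the next.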

Does SAS have a book on Monte Carlo Simulation?

I recommend Wicklin (2013), Simulating Data with SAS. It contains many applications of Monte Carlo simulation to statistical analyses.

Can we do a Monte Carlo simulation based on a data set of 10 observations?

Yes. A typical Monte Carlo simulation requires that you specify the sample size and the distribution from which you are generating random samples. But you said “based on,” which makes me wonder whether you are talking about bootstrapping. Bootstrapping is closely related to simulation, except that you generate new samples from the observed data instead of generating samples from a probability distribution.

In your talk, you mostly use 90% CIs for the statistics. Can you get a 95% CI instead?

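Sure. To get a 95% CI for a statistic, request the 2.5th and 97.5th percentiles by using the PCTLPTS= option on the OUTPUT statement in PROC UNIVARIATE:
proc univariate data=OutStats;
   var SampleMean;
   output out=SampDist mean=MCMean std=MCStdErr
          pctlpre=Pctl_ pctlpts=2.5 97.5;   /* creates Pctl_2_5 and Pctl_97_5 */
run;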

Does PROC SURVEYSELECT enable us to do the simulation part of the Monte Carlo simulation?

PROC SURVEYSELECT does not generate data from a probability distribution, so it does not perform the simulation step. Instead, you can use it to obtain random samples from a set of observed data, which makes the procedure helpful for bootstrapping.
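For example, a bootstrap resampling step might look like the following sketch. It uses the Sashelp.Class sample data set and its Height variable as stand-ins for your observed data; the number of resamples and the seed are arbitrary.

```sas
/* Draw bootstrap resamples (sampling with replacement) from observed data */
proc surveyselect data=Sashelp.Class noprint seed=12345
     out=BootSamples
     method=urs       /* unrestricted random sampling = with replacement */
     samprate=1       /* each resample has the same size as the data */
     reps=1000        /* number of bootstrap resamples (assumed) */
     outhits;         /* write one row each time an observation is selected */
run;

/* Compute the statistic on each resample; Replicate identifies the resample */
proc means data=BootSamples noprint;
   by Replicate;
   var Height;
   output out=BootStats mean=BootMean;
run;
```

The BootStats data set then contains one bootstrap estimate of the mean per resample, which you can summarize in the same way as Monte Carlo estimates.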

If I generate my simulated samples and then use PROC MEANS to compute statistics for each sample, should I use the BY statement or the CLASS statement? Which is best?

Typically, the BY statement would be used. To use the BY statement, the simulated samples must be sorted by the values of the BY variable, which is usually the case for simulated data. The BY statement is more efficient than the CLASS statement.
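The two approaches can be compared side by side, as in this sketch. The data set names, the sample sizes, and the seed are illustrative assumptions.

```sas
/* Simulated samples; the nested DO loops write the data sorted by SampleID */
data Sim;
   call streaminit(54321);
   do SampleID = 1 to 1000;
      do i = 1 to 20;
         x = rand("Normal", 0, 1);
         output;
      end;
   end;
run;

/* BY-group processing: requires data sorted by SampleID */
proc means data=Sim noprint;
   by SampleID;
   var x;
   output out=StatsBY mean=SampleMean;
run;

/* CLASS alternative: no sort required; NWAY keeps only the per-sample rows */
proc means data=Sim noprint nway;
   class SampleID;
   var x;
   output out=StatsCLASS mean=SampleMean;
run;
```

Both steps produce one row per sample with the same SampleMean values. Without the NWAY option, the CLASS version also writes an extra row for the overall mean.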

How is Monte Carlo simulation different from bootstrap?

To perform a Monte Carlo simulation, you generate random samples from a probability distribution, where we assume that you know the parameter values of the distribution. Bootstrapping is resampling from an existing set of observational data, usually with replacement, to create multiple resamples of the existing data. PROC SURVEYSELECT is useful for this task. In short, simulation draws samples from a model distribution; bootstrap draws resamples from the empirical distribution of the data.

Do you have a webinar or resources for Bayesian method/MCMC simulation in SAS?

MCMC is a big topic. Perhaps start with this introductory webinar. SAS has a training course on Bayesian Analysis that focuses on MCMC. A good introduction to PROC MCMC is Gunes and Chen (2014).

How would simulation work for survey responses given that each individual is unique?

A simulation for survey responses works exactly the same as for other analyses. You assume or fit a model for the responses, taking into account demographic strata such as race, gender, and political affiliation. You first generate an individual based on sampling weights. For each simulated individual, you can use a probability-based model to simulate a response.
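A highly simplified sketch of that two-step process follows. The three strata, their population proportions, and the probability of a “yes” response within each stratum are all hypothetical numbers chosen for illustration; in practice they would come from your sampling weights and a fitted response model.

```sas
data SurveySim;
   call streaminit(2023);
   /* hypothetical stratum proportions (must sum to 1) */
   array pStratum[3] _temporary_ (0.5 0.3 0.2);
   /* hypothetical probability of a "yes" response in each stratum */
   array pYes[3] _temporary_ (0.6 0.4 0.7);
   do RespondentID = 1 to 500;
      Stratum  = rand("Table", of pStratum[*]);     /* step 1: generate an individual */
      Response = rand("Bernoulli", pYes[Stratum]);  /* step 2: simulate the response */
      output;
   end;
run;

/* Check the simulated response rates by stratum */
proc freq data=SurveySim;
   tables Stratum*Response / nocol nopercent;
run;
```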

Recommended Resources

Wicklin (2015) “Ten Tips for Simulating Data with SAS”

Wicklin (2013) Simulating Data with SAS

Wicklin (2012) “Simulation in SAS: The slow way or the BY way”

Moving from SAS®9 to SAS® Viya®

Move to Viya Board

Please see additional resources in the attached slide deck.

Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow-up Q&A, slides, and recordings from other SAS Ask the Expert webinars.
