Simulating data using data step to randomly generate continuous data w...

Posted 09-29-2020 11:07 AM
(1152 views)

Hello,

I am using the RAND function to simulate a data set for an analysis. I am using moments from a pre-existing data set to guide the simulation (I cannot use the original data, so PROC SURVEYSELECT is not an option for me). However, I am having difficulty simulating the follow-up time so that it matches the outcome (a binary event) in each treatment group, because the distributions are non-standard. I read that it is possible to use a macro to simulate data by specifying the distribution parameters needed, but I am having problems putting the SAS code together. I would appreciate any pointers in the right direction.

Thank you.

8 REPLIES

I don't have an answer to your question, but I moved it to the statistics forum.

In general for anything simulation related, @Rick_SAS is the guru and his book is a great reference.

Simulating Data with SAS

https://www.amazon.ca/Simulating-Data-SAS-Rick-Wicklin/dp/1612903320


Rick has written **many** articles on this topic, which you can find on his blog here. Free!

Can you provide more details, or show some data and code? Are you simulating univariate data or multivariate data?

When you are simulating from non-normal data, you can fit the data to a flexible distribution with several parameters, then simulate from that distribution. If you provide more details about your problem, we can probably steer you in the right direction.

It will also be helpful to know what products you have access to. Do you have SAS/IML or SAS/ETS?
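
To make the "fit a flexible distribution, then simulate from it" idea concrete, here is a minimal sketch of the simulation half only. It assumes, purely for illustration, that follow-up time in each treatment-arm-by-event-status group has already been summarized by a Weibull distribution; the Weibull choice, the data set name `SimTime`, the array names, and every probability, shape, and scale value are made-up placeholders, not values from this thread.

```sas
/* Sketch only: all numeric values below are hypothetical placeholders */
data SimTime;
   call streaminit(54321);                          /* arbitrary seed */
   /* P(event) for arm 0 and arm 1 */
   array pEvent[2] _temporary_ (0.20 0.30);
   /* Weibull parameters indexed by [arm+1, event+1], filled row by row */
   array wShape[2,2] _temporary_ (1.2 0.9  1.5 1.1);
   array wScale[2,2] _temporary_ (400 150  380 200);
   do id = 1 to 1000;
      arm   = rand("Bernoulli", 0.5);               /* treatment arm, 0 or 1 */
      event = rand("Bernoulli", pEvent[arm+1]);     /* binary outcome */
      /* follow-up time drawn from the group-specific distribution */
      days  = ceil( rand("Weibull", wShape[arm+1, event+1],
                                    wScale[arm+1, event+1]) );
      output;
   end;
run;

/* Compare simulated moments by group against the target moments */
proc means data=SimTime n mean std min max;
   class arm event;
   var days;
run;
```

If the group moments from PROC MEANS do not resemble the target moments, the shape and scale placeholders (or the choice of distribution itself) would need to be revisited.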

Thank you for your questions.

The data are multivariate: a combination of binary, categorical, and continuous baseline variables, plus time-to-follow-up data (number of days until the event) for an outcome whose incidence is defined both as binary and as cumulative.

Below are some examples of the code I used to simulate the baseline variables (I replaced the real variable names with "var"). I am at a loss for how to simulate the number of days to event, because the distributions differ by event status and treatment arm. Please let me know what additional information would be helpful.

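%let N = 1; /* number of simulated observations */

%let mu_var = 72.8;

%let sigma_var = 10.5;

%let norm_var = rand('normal', &mu_var, &sigma_var);

data sim; /* data set name is a placeholder */

call streaminit(12345); /* arbitrary seed, added so runs are reproducible */

array a[6] _temporary_ (0.02 0.146 0.066 0.0033 0.02 0.745); /* category probabilities */

do x = 1 to &N;

varnew = &norm_var; /* continuous variable */

if rand("uniform") < 0.666 then var = 0; else var = 1; /* binary variable */

var = rand("Table", of a[*]); /* categorical variable, 6 levels */

output;

end;

run;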

I am using SAS 9.4. I am not familiar with SAS/IML or SAS/ETS, but if you think these platforms are necessary, I can inquire if my institute has access to them.

Thank you.

I have to say I wonder about this bit of code:

If rand("uniform") < 0.666 then var = 0; else var = 1; var = rand("Table", of a[*]);

You conditionally set var to 0 or 1, then immediately overwrite it with an integer from 1 to 6, so 0 is no longer possible. Why bother with the rand("uniform") test at all?

Thank you for your attention. I replaced the real variable names with 'var' for example only. So the first line is to generate a binary variable and the second is to generate a categorical variable with 6 levels. I hope that makes sense?

The way to proceed is this:

1. Simulate the explanatory variables.

2. Simulate the response variable as a STATISTICAL MODEL of the explanatory variables. I haven't blogged about survival models, although they are discussed in Chapter 12 of my book (pp. 242-247). To get an idea of what it means to simulate a response from a model, start with the more familiar linear regression models:

- Linear regression models with categorical covariates
- Logistic regression models
- Generalized linear models with link functions

For survival models specifically, you can see examples on the web or read Gibbs and Kiernan (2020), "Simulating Data for Complex Linear Models." The survival example is on pp. 17-18.
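
As a rough illustration of the two steps above for a time-to-event response, here is a minimal sketch. It assumes an exponential proportional-hazards model with administrative censoring, which is only one possible choice and is not taken from the book or the paper cited above. The data set name `SimSurv`, the variables `trt` and `covar`, and every coefficient, rate, and cutoff are hypothetical placeholders.

```sas
/* Sketch only: all parameter values are made-up placeholders */
%let N = 500;                          /* number of simulated subjects */

data SimSurv;
   call streaminit(2020);              /* arbitrary seed */
   baseRate  = 0.002;                  /* assumed baseline hazard per day */
   beta_trt  = -0.5;                   /* assumed log hazard ratio, treatment */
   beta_cov  = 0.02;                   /* assumed log hazard ratio, covariate */
   maxFollow = 365;                    /* assumed end of follow-up, in days */
   do id = 1 to &N;
      /* Step 1: simulate the explanatory variables */
      trt   = rand("Bernoulli", 0.5);
      covar = rand("Normal", 0, 1);
      /* Step 2: simulate the response from a model of those variables */
      rate   = baseRate * exp(beta_trt*trt + beta_cov*covar);
      tEvent = rand("Exponential") / rate;     /* exponential time with this rate */
      event  = (tEvent <= maxFollow);          /* 1 = event seen, 0 = censored */
      days   = ceil( min(tEvent, maxFollow) ); /* observed follow-up in whole days */
      output;
   end;
   keep id trt covar days event;
run;

/* Optional check: the fitted coefficients should be near the assumed values */
proc phreg data=SimSurv;
   model days*event(0) = trt covar;
run;
```

Because the event indicator and the follow-up time come from the same latent event time, the follow-up distribution automatically differs by event status and by treatment arm, which is the behavior the original question was after.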
