Data Simulation for Repeated Measures Design
🔒 This topic is **solved** and **locked**.

Posted 02-10-2020 04:01 PM
(1823 views)

I want to run a data simulation to help me determine the sample size for a study with a repeated measures design in which each subject is measured at two time points. (I've been working through the book *Simulating Data with SAS*, but I'm not very familiar with PROC IML, so I have a learning curve ahead of me.)

The subjects complete a survey whose ordinal outcomes are highly right-skewed, and I am trying to determine whether I have enough subjects to detect an effect of an intervention. In the code below, I simulate two years of data from discrete distributions: the 2019 distribution is based on the observed 2019 data, and the 2020 distribution reflects the assumed effect of the intervention. I want to measure change in 2020 in a group of matched individuals. The simulation builds two independent data sets and then samples from each (with replacement) to get 10 subjects per year. Ten is the size of the group I'm studying; the full study will have many groups, but for now I'm simulating a study with only one group.

I don't think it's correct to simply concatenate the data sets and assign the integers 1-10 as subject IDs within each year, because an individual is probably more likely to move to an adjacent category than to one three or four points away. How does a data simulation account for this with a discrete distribution? Does anyone know of SAS white papers that address this topic?

```
data TimeFirst (keep=Y Group Year);
call streaminit(4321);
array p[7] (0.6 0.35 0.01 0.01 0.01 0.01 0.01); /* probabilities for current distribution */
do i = 1 to 100000;
Y = rand("Table", of p[*]);
Group = 'Rural';
Year = '2019';
output;
end;
run;
data TimeSecond (keep=Y Group Year);
call streaminit(1234); /* use a different seed than TimeFirst; the same seed would reuse the same random stream */
array p[7] (0.5 0.45 0.01 0.01 0.01 0.01 0.01); /* probabilities based on assumption of effect of intervention */
do i = 1 to 100000;
Y = rand("Table", of p[*]);
Group = 'Rural';
Year = '2020';
output;
end;
run;
data All;
set Time:;
run;
proc sort data=All;
by Group Year;
run;
proc surveyselect data=All out=All_sample method=urs n=10;
strata Group Year; /*I am sampling from my simulations*/
run;
/*The problem here is that I need subjects to be matched on a variable like ID, but this creates a sample data set of independent observations. My goal is to be able to run an analysis of the form:
proc mixed data=all_sample;
class year ID;
model Y = year;
repeated Year / Subject = ID type = cs;
run;
*/
```

Accepted Solution

You can make each subject's 2020 response conditional on that subject's 2019 response, this way:

```
data all;
call streaminit(4321);
array p[7,7] _temporary_ /* conditional probabilities for the 2020 response: row = 2019 response, column = 2020 response */
(0.5 0.2 0.1 0.05 0.05 0.05 0.05
0.1 0.5 0.1 0.1 0.1 0.05 0.05
0.1 0.1 0.5 0.1 0.1 0.05 0.05
0.05 0.1 0.1 0.5 0.1 0.1 0.05
0.05 0.05 0.1 0.1 0.5 0.1 0.1
0.05 0.05 0.1 0.1 0.1 0.5 0.1
0.05 0.05 0.05 0.05 0.1 0.2 0.5
);
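set timeFirst;
ID = _N_; /* subject ID shared by the 2019 and 2020 records, needed for SUBJECT=ID in PROC MIXED */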
output;
Y = rand("Table", p[y,1], p[y,2], p[y,3], p[y,4], p[y,5], p[y,6], p[y,7]);
year = '2020';
output;
run;
```

PG
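Because the rows of the transition matrix above are illustrative, it may help to check what they imply before running a simulation. The following PROC IML sketch checks that each row sums to 1 and computes the 2020 distribution the matrix implies (the 2019 row vector multiplied by the transition matrix), so it can be compared with the 2020 distribution assumed in the question. With the rows as posted, the implied probability of category 1 in 2020 is about 0.34, well below the 0.5 assumed, so the rows would need tuning to reproduce the intended intervention effect. The matrix names p2019, Trans, and Target are illustrative choices, not part of the original answer.

```sas
proc iml;
/* 2019 distribution from the question */
p2019 = {0.6 0.35 0.01 0.01 0.01 0.01 0.01};
/* transition matrix from the accepted answer: row i = 2019 category, column j = 2020 category */
Trans = {0.5  0.2  0.1  0.05 0.05 0.05 0.05,
         0.1  0.5  0.1  0.1  0.1  0.05 0.05,
         0.1  0.1  0.5  0.1  0.1  0.05 0.05,
         0.05 0.1  0.1  0.5  0.1  0.1  0.05,
         0.05 0.05 0.1  0.1  0.5  0.1  0.1,
         0.05 0.05 0.1  0.1  0.1  0.5  0.1,
         0.05 0.05 0.05 0.05 0.1  0.2  0.5};
RowSums = Trans[ , +];      /* each row should sum to 1 */
p2020 = p2019 * Trans;      /* implied 2020 distribution: sum over i of p2019[i]*Trans[i,j] */
/* 2020 distribution assumed in the question, for comparison */
Target = {0.5 0.45 0.01 0.01 0.01 0.01 0.01};
print RowSums, p2020[format=6.4], Target;
quit;
```

If the implied distribution does not match the target, adjust the rows (for example, put more mass on staying in or moving up one category from category 1) and rerun the check.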


I don't fully understand your design, but as PGStats says, you might want to generate both years at once. First, generate the 2019 value for a subject, then generate the 2020 value based on a random deviation (and treatment group effect?) from the 2019 value.

My advice is to write out the mixed model that you are trying to fit, and then simulate from that model.
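As a sketch, assuming the model is the one in the PROC MIXED call from the question (a fixed Year effect with a compound-symmetric covariance within subject), it can be written as follows. Here $x_j$ is an indicator for 2020, and $\sigma_1$ (the within-subject covariance) and $\sigma^2$ (the residual variance) are values you would choose for the simulation:

```latex
\[
Y_{ij} = \beta_0 + \beta_1 x_j + \varepsilon_{ij},
\qquad i = 1,\dots,n,\quad j = 1, 2,
\qquad x_j = \begin{cases} 0 & \text{2019} \\ 1 & \text{2020} \end{cases}
\]
\[
\operatorname{Var}\begin{pmatrix}\varepsilon_{i1} \\ \varepsilon_{i2}\end{pmatrix}
= \begin{pmatrix}\sigma_1 + \sigma^2 & \sigma_1 \\ \sigma_1 & \sigma_1 + \sigma^2\end{pmatrix}
\]
```

The effect of interest is $\beta_1$, and power is the probability of rejecting $H_0\colon \beta_1 = 0$. Simulating from this model means drawing each subject's two values jointly, so the years are correlated through $\sigma_1$. Since the actual outcome is ordinal and skewed, this normal-theory model is only an approximation.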

For an example of this kind of simulation study for power, see Psioda (2012). The paper is very good except for pp. 9-10. You can also ignore the NumIterPer parameter in his study and simply set it equal to the Iterations parameter; the added complexity isn't worth the time savings.
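Following that advice, here is a minimal sketch of a power simulation in SAS, in the spirit of (not copied from) that approach: generate many matched samples of 10 subjects in one DATA step, drawing each subject's 2019 value and then a 2020 value that differs by a random deviation of at most one category; fit the question's PROC MIXED model to every sample with BY-group processing; and count how often the Year effect is significant at the 0.05 level. The change probabilities in array d (for a change of -1, 0, or +1 category) and the number of samples are hypothetical values chosen for illustration, not estimates from any data.

```sas
%let NumSamples = 1000;   /* number of simulated studies (hypothetical choice) */
%let N = 10;              /* subjects per group, as in the question */

data Sim(keep=SampleID ID Year Y);
   call streaminit(4321);
   array p[7] (0.6 0.35 0.01 0.01 0.01 0.01 0.01); /* 2019 distribution from the question */
   array d[3] (0.1 0.7 0.2);   /* HYPOTHETICAL P(change = -1, 0, +1 category) from 2019 to 2020 */
   do SampleID = 1 to &NumSamples;
      do ID = 1 to &N;
         Y2019 = rand("Table", of p[*]);
         /* RAND("Table") returns 1, 2, or 3; subtract 2 to get -1, 0, +1, then keep the result in 1-7 */
         Y2020 = min(max(Y2019 + rand("Table", of d[*]) - 2, 1), 7);
         Year = '2019'; Y = Y2019; output;
         Year = '2020'; Y = Y2020; output;
      end;
   end;
run;

/* Fit the question's model to every simulated study without printing each fit */
ods graphics off;
ods exclude all;
proc mixed data=Sim;
   by SampleID;
   class Year ID;
   model Y = Year;
   repeated Year / subject=ID type=cs;
   ods output Tests3=Tests;
run;
ods exclude none;

/* Estimated power = proportion of studies whose Year test has p < 0.05.
   Fits that fail (missing ProbF) stay missing and are excluded by PROC FREQ. */
data Power;
   set Tests;
   if not missing(ProbF) then Reject = (ProbF < 0.05);
run;

proc freq data=Power;
   tables Reject / nocum;
run;
```

The percentage of studies with Reject = 1 estimates the power for 10 subjects; rerun with a different value of N to see how power changes with sample size. The missing-value check matters because in SAS a missing value compares as less than any number, so without it a failed fit would be counted as a rejection.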
