Solved: Data Simulation for Repeated Measures Design

svh · Posted 02-10-2020 04:01 PM

I am interested in conducting a data simulation that will help me understand the sample size for a study with a repeated measures design in which the subjects are measured at two time points. (I've been working with the book Simulating Data in SAS, but I'm not that familiar with PROC IML, so I have a learning curve ahead of me.)

The subjects are administered a survey in which the ordinal outcomes are highly right-skewed, and I am trying to understand whether I have enough subjects to detect an effect of an intervention. In my code below, I simulate two years of data from a discrete distribution based on the observed 2019 distribution. I want to measure change in 2020 in a group of matched individuals. The simulation builds two independent data sets, from which I randomly sample (with replacement) 10 subjects per year. (This is the size of the group I'm studying; my full study will have many groups, but for now I'm trying to simulate a study with a single group.)

I don't think it's correct to just concatenate the data sets and assign the IDs 1-10 within each year, because an individual is probably more likely to move to an adjacent category than to one three or four points away. How does a data simulation take this into account with a discrete distribution? Does anyone know of SAS white papers that cover this topic?

data TimeFirst (keep=Y Group Year);
   call streaminit(4321);
   array p[7] (0.6 0.35 0.01 0.01 0.01 0.01 0.01); /* probabilities for current distribution */
   do i = 1 to 100000;
      Y = rand("Table", of p[*]);   /* category 1-7 drawn with probabilities p */
      Group = 'Rural';
      Year = '2019';
      output;
   end;
run;

data TimeSecond (keep=Y Group Year);
   call streaminit(8765);   /* different seed: reusing 4321 would replay the same random stream as TimeFirst */
   array p[7] (0.5 0.45 0.01 0.01 0.01 0.01 0.01); /* probabilities based on assumption of effect of intervention */
   do i = 1 to 100000;
      Y = rand("Table", of p[*]);
      Group = 'Rural';
      Year = '2020';
      output;
   end;
run;

data All;
   set Time:;
   run;
proc sort data=All;
   by Group Year;
   run;
proc surveyselect data=All out=All_sample method=urs n=10 outhits; /* outhits: one output row per draw, so repeated selections are kept */
   strata Group Year; /*I am sampling from my simulations*/
   run;
/*The problem here is that I need subjects to be matched on a variable like ID, and this produces a sample data set of independent observations.  My goal is to be able to run an analysis of the form:
proc mixed data=all_sample;
   class year ID;
   model Y = year;
   repeated  Year / Subject = ID type = cs;
   run;
   
*/

PGStats · Posted 02-10-2020 11:53 PM

You can make your 2020 dataset conditional on the 2019 data, this way:


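data all;
   call streaminit(1234);   /* seed differs from the 4321 used for TimeFirst, so these draws do not replay that stream */
   array p[7,7] _temporary_ /* conditional probabilities for second year response: row = 2019 value, column = 2020 value */
      (0.5  0.2  0.1  0.05 0.05 0.05 0.05
       0.1  0.5  0.1  0.1  0.1  0.05 0.05
       0.1  0.1  0.5  0.1  0.1  0.05 0.05
       0.05 0.1  0.1  0.5  0.1  0.1  0.05
       0.05 0.05 0.1  0.1  0.5  0.1  0.1
       0.05 0.05 0.1  0.1  0.1  0.5  0.1
       0.05 0.05 0.05 0.05 0.1  0.2  0.5
      );
   set timeFirst;
   output;                  /* 2019 row for this subject */
   Y = rand("Table", p[y,1], p[y,2], p[y,3], p[y,4], p[y,5], p[y,6], p[y,7]);  /* 2020 value drawn from row y of p */
   year = '2020';
   output;                  /* 2020 row for the same subject */
run;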

PG
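
One way to get from this output to the matched analysis in the original question is to attach a subject ID to each pair of rows. The sketch below assumes the data set all created above, in which each subject contributes a 2019 row followed by a 2020 row; the names Matched and ID are only illustrative. Because the simulated subjects are independent draws, keeping the first 10 pairs should be equivalent to sampling 10 subjects at random.

```sas
/* Assumes ALL from the step above: rows come in pairs (2019, then 2020) per subject */
data Matched;
   set all;
   ID = ceil(_N_ / 2);        /* rows 1-2 -> ID 1, rows 3-4 -> ID 2, ... */
   if ID > 10 then stop;      /* keep only the first 10 simulated subjects */
run;

/* The repeated measures model from the original question */
proc mixed data=Matched;
   class Year ID;
   model Y = Year;
   repeated Year / subject=ID type=cs;
run;
```

With only 10 subjects and an outcome this skewed, a single sample may show little or no variation, so PROC MIXED may issue notes or warnings for some samples.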

svh · Posted 02-12-2020 01:01 PM

I think I see what's happening here--the matrix is a set of conditional probabilities, which I would alter based on what I think the change could be at the next time point.
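
To see what a given matrix implies overall, the 2020 marginal distribution can be computed from the 2019 probabilities and the matrix: P(Y2020 = k) is the sum over j of P(Y2019 = j) * p[j,k]. A minimal sketch, using the 2019 probabilities from the original post and the matrix from PGStats' reply; the data set and variable names (Implied2020, Prob2019, Prob2020) are made up for illustration.

```sas
data Implied2020 (keep=Category Prob2019 Prob2020);
   /* 2019 marginal probabilities from the original post */
   array p0[7] _temporary_ (0.6 0.35 0.01 0.01 0.01 0.01 0.01);
   /* conditional probabilities from the reply: row = 2019 value, column = 2020 value */
   array p[7,7] _temporary_
      (0.5  0.2  0.1  0.05 0.05 0.05 0.05
       0.1  0.5  0.1  0.1  0.1  0.05 0.05
       0.1  0.1  0.5  0.1  0.1  0.05 0.05
       0.05 0.1  0.1  0.5  0.1  0.1  0.05
       0.05 0.05 0.1  0.1  0.5  0.1  0.1
       0.05 0.05 0.1  0.1  0.1  0.5  0.1
       0.05 0.05 0.05 0.05 0.1  0.2  0.5);
   do Category = 1 to 7;
      Prob2019 = p0[Category];
      Prob2020 = 0;
      do j = 1 to 7;
         Prob2020 = Prob2020 + p0[j] * p[j, Category];   /* sum over 2019 categories */
      end;
      output;
   end;
run;

proc print data=Implied2020 noobs;
run;
```

Comparing Prob2020 with the distribution you expect after the intervention shows whether the matrix encodes the effect you intend. If it does not, adjust the rows, keeping each row summing to 1.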

Rick_SAS · Posted 02-11-2020 02:05 PM

I don't fully understand your design, but as PGStats says, you might want to generate both years at once. First, generate the 2019 value for a subject, then generate the 2020 value based on a random deviation (and treatment group effect?) from the 2019 value.

My advice is to write out the mixed model that you are trying to fit. You then will simulate from that model.

For an example of this kind of simulation study for power, see Psioda (2012). The paper is very good except for pp. 9-10. You can also ignore the NumIterPer parameter in his study and just set it equal to the Iterations parameter. The added complexity isn't worth the time savings.
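
As a rough illustration of the simulate-then-fit loop (not the design used in the Psioda paper), the sketch below reuses the data set all from PGStats' reply, splits its simulated subjects into studies of 10, fits the model from the original question to each study, and estimates power as the share of studies in which the Year effect is significant. The names Sim, SimID, Tests, and Reject, the choice of 1,000 studies, and the 0.05 level are assumptions for illustration.

```sas
/* Assumes ALL from PGStats' reply: 100,000 subjects, two rows each (2019, then 2020) */
data Sim;
   set all;
   ID    = ceil(_N_ / 2);      /* subject number */
   SimID = ceil(ID / 10);      /* every 10 subjects form one simulated study */
   if SimID > 1000 then stop;  /* hypothetical choice: 1,000 simulated studies */
run;

ods exclude all;               /* suppress printed output; ODS OUTPUT still works */
proc mixed data=Sim;
   by SimID;
   class Year ID;
   model Y = Year;
   repeated Year / subject=ID type=cs;
   ods output Tests3=Tests;    /* Type 3 F test; Year is the only effect in the model */
run;
ods exclude none;

data Reject;
   set Tests;
   Reject = (. < ProbF < 0.05);   /* a missing p-value is counted as "not rejected" */
run;

proc means data=Reject n mean;
   var Reject;                 /* the mean of Reject estimates power at alpha = 0.05 */
run;
```

Studies for which PROC MIXED cannot produce a test contribute no row to Tests, so the N reported by PROC MEANS is worth checking against the number of simulated studies.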

svh · Posted 02-12-2020 01:04 PM

In my design, I am actually testing for change in an outcome variable over time (the outcome is the perception of mistreatment in the educational setting). However, I have a random effect because participants are clustered in educational programs. I'm first trying to wrap my mind around how to conduct the simulation with one group. In reality, mistreatment can vary across programs for various reasons, so I will need to scale up to simulating the random effect of program.

Rick_SAS · Posted 02-13-2020 08:45 AM

For a random effect, sample the effect from N(0, sigma) once for each cluster. That value is used for all measurements within the cluster.
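
A minimal sketch of that mechanic, assuming a continuous response for simplicity (mapping the result onto the seven ordinal categories is not shown). Every parameter value and name below (sigma_b, sigma_u, sigma_e, beta, Program, Clustered) is hypothetical. Note that the third argument of RAND("Normal", ...) is a standard deviation, not a variance.

```sas
/* All parameter values are hypothetical, chosen only to show the mechanics */
data Clustered (keep=Program ID Year Y);
   call streaminit(24680);
   sigma_b = 0.5;    /* SD of the program (cluster) effect */
   sigma_u = 0.7;    /* SD of the subject effect */
   sigma_e = 1.0;    /* residual SD */
   beta    = -0.3;   /* assumed change from 2019 to 2020 */
   do Program = 1 to 20;
      b = rand("Normal", 0, sigma_b);       /* drawn ONCE per program */
      do Subj = 1 to 10;
         ID = 10*(Program - 1) + Subj;      /* unique subject ID */
         u = rand("Normal", 0, sigma_u);    /* drawn once per subject */
         do Year = 2019, 2020;
            /* b is shared by everyone in the program, u by both years of a subject */
            Y = 2 + b + u + beta*(Year = 2020) + rand("Normal", 0, sigma_e);
            output;
         end;
      end;
   end;
run;

proc mixed data=Clustered;
   class Program ID Year;
   model Y = Year;
   random intercept / subject=Program;      /* program-level random effect */
   repeated Year / subject=ID type=cs;      /* within-subject correlation */
run;
```

The key point is the placement of the draws: b sits outside the subject loop, so one value is reused for every measurement in that program.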
