Data Simulation for Repeated Measures Design

svh — Mon, 10 Feb 2020 21:01:31 GMT

I am interested in conducting a data simulation that will help me understand the sample size for a study with a repeated measures design in which the subjects are measured at two time points. (I've been working with the book Simulating Data in SAS, but I'm not that familiar with PROC IML, so I have a learning curve ahead of me.)

The subjects are administered a survey in which the ordinal outcomes are highly right-skewed, and I am trying to understand whether I have enough subjects to detect an effect of an intervention. In my code below, I'm simulating two years of data from a discrete distribution based on the distribution from 2019. I am looking to measure change in 2020 in a group of matched individuals. The simulation builds two independent data sets, from which I randomly sample (with replacement) to get 10 subjects per year. This is the size of the group I'm studying; my full study will have many groups, but for now I'm trying to simulate a study with a single group.

I don't think it's correct to just concatenate the data sets and assign the integers 1-10 in each year because it is probably more likely for an individual to move to an adjacent category and not three or four points away. How does a data simulation take this into account with a discrete distribution? Does anyone know of SAS white papers that get to this topic?

data TimeFirst (keep=Y Group Year);
   call streaminit(4321);
   array p[7] (0.6 0.35 0.01 0.01 0.01 0.01 0.01); /* probabilities for current distribution */
   do i = 1 to 100000;
      Y = rand("Table", of p[*]);
      Group = 'Rural';
      Year = '2019';
      output;
   end;
run;

data TimeSecond (keep=Y Group Year);
   call streaminit(4321);
   array p[7] (0.5 0.45 0.01 0.01 0.01 0.01 0.01); /* probabilities based on assumption of effect of intervention */
   do i = 1 to 100000;
      Y = rand("Table", of p[*]);
      Group = 'Rural';
      Year = '2020';
      output;
   end;
run;

data All;
   set Time:;
   run;
proc sort data=All;
   by Group Year;
   run;
proc surveyselect data=All out=All_sample method=urs n=10;
   strata Group Year; /*I am sampling from my simulations*/
   run;
/* The problem here is that I need subjects to be matched on a variable like ID, and this makes a sample data set of independent observations. My goal is to be able to run an analysis of the form:
proc mixed data=all_sample;
   class year ID;
   model Y = year;
   repeated  Year / Subject = ID type = cs;
   run;
   
*/

Re: Data Simulation for Repeated Measures Design

PGStats — Tue, 11 Feb 2020 04:53:53 GMT

You can make your 2020 data conditional on the 2019 data, like this:


data all;
   call streaminit(4321);
   /* conditional probabilities for second year response:   */
   /* row = 2019 category, column = 2020 category           */
   array p[7,7] _temporary_
      (0.5  0.2  0.1  0.05 0.05 0.05 0.05
       0.1  0.5  0.1  0.1  0.1  0.05 0.05
       0.1  0.1  0.5  0.1  0.1  0.05 0.05
       0.05 0.1  0.1  0.5  0.1  0.1  0.05
       0.05 0.05 0.1  0.1  0.5  0.1  0.1
       0.05 0.05 0.1  0.1  0.1  0.5  0.1
       0.05 0.05 0.05 0.05 0.1  0.2  0.5
      );
   set timeFirst;
   output;                /* write the 2019 record */
   /* draw the 2020 response from the row selected by the 2019 response */
   Y = rand("Table", p[y,1], p[y,2], p[y,3], p[y,4], p[y,5], p[y,6], p[y,7]);
   year = '2020';
   output;                /* write the matched 2020 record */
run;
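
Because the step above writes each subject's 2019 and 2020 records as consecutive rows, a subject identifier can be derived from the row number. The following is only a minimal sketch, assuming the data set all created above; the names ID and all_sample are illustrative (all_sample is borrowed from the original post). Since the simulated subjects are independent draws, the first 10 pairs already form a random sample, so no extra sampling step is used here.

```sas
/* Sketch: keep the first 10 matched subjects from the paired data.       */
/* Assumes rows in ALL alternate 2019, 2020 for each simulated subject.   */
data all_sample;
   set all;
   ID = ceil(_N_ / 2);        /* rows 1-2 -> ID 1, rows 3-4 -> ID 2, ... */
   if ID > 10 then stop;      /* stop before writing the 11th subject    */
run;

/* The analysis from the original post, now with matched subjects */
proc mixed data=all_sample;
   class Year ID;
   model Y = Year;
   repeated Year / subject=ID type=cs;
run;
```

With only 10 subjects and a response this skewed, some samples may show little or no variation, so convergence notes in the log would not be surprising.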

Re: Data Simulation for Repeated Measures Design

Rick_SAS — Tue, 11 Feb 2020 19:05:56 GMT

I don't fully understand your design, but as PGStats says, you might want to generate both years at once. First, generate the 2019 value for a subject, then generate the 2020 value based on a random deviation (and treatment group effect?) from the 2019 value.

My advice is to write out the mixed model that you are trying to fit. You then will simulate from that model.

For an example of this kind of simulation study for power, see Psioda (2012). The paper is very good except for pp. 9-10. You can also ignore the NumIterPer parameter in his study and just set it equal to the Iterations parameter. The added complexity isn't worth the time savings.
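
As a rough illustration of what such a power simulation might look like in this thread's setting (this is not the code from the paper), the sketch below splits the paired data set all from the earlier reply into many small studies, fits the same model to each one with a BY statement, and estimates power as the fraction of studies in which the Year effect is significant. The macro variables nSubj and nSim, the 0.05 level, and the data set names sim, tests, and power are all assumptions chosen for illustration.

```sas
%let nSubj = 10;      /* subjects per simulated study (assumed)   */
%let nSim  = 1000;    /* number of simulated studies (assumed)    */

/* Assign each matched pair to a subject ID and each block of     */
/* nSubj subjects to one simulated study                          */
data sim;
   set all;
   ID       = ceil(_N_ / 2);
   SampleID = ceil(ID / &nSubj);
   if SampleID > &nSim then stop;
run;

ods exclude all;      /* suppress printed output for the BY-group fits */
proc mixed data=sim;
   by SampleID;
   class Year ID;
   model Y = Year;
   repeated Year / subject=ID type=cs;
   ods output Tests3=tests;   /* Type 3 test of Year for each study */
run;
ods exclude none;

/* Estimated power = proportion of studies with p < 0.05 */
data power;
   set tests;
   Reject = (ProbF < 0.05);
run;

proc means data=power mean;
   var Reject;
run;
```

Studies in which the model fails to fit contribute no row to tests, so it is worth comparing the number of rows in tests with nSim before trusting the estimate.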

Re: Data Simulation for Repeated Measures Design

svh — Wed, 12 Feb 2020 18:01:15 GMT

I think I see what's happening here--the matrix is a set of conditional probabilities, which I would alter based on what I think the change could be at the next time point.
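
One way to check what a given conditional matrix implies is to multiply the 2019 probabilities by it and look at the resulting 2020 marginal distribution. The sketch below does that for the 2019 probabilities from the original post and the matrix from PGStats's reply; the data set name implied2020 and the variable names are illustrative. With these particular numbers the implied 2020 probability of category 1 works out to about 0.338, well below the 0.5 assumed in the original post, so the matrix would need tuning to match an intended effect.

```sas
/* Sketch: implied 2020 marginal = (2019 marginal) x (transition matrix) */
data implied2020 (keep=Category Prob2019 Prob2020);
   array p0[7] _temporary_ (0.6 0.35 0.01 0.01 0.01 0.01 0.01);
   array p[7,7] _temporary_
      (0.5  0.2  0.1  0.05 0.05 0.05 0.05
       0.1  0.5  0.1  0.1  0.1  0.05 0.05
       0.1  0.1  0.5  0.1  0.1  0.05 0.05
       0.05 0.1  0.1  0.5  0.1  0.1  0.05
       0.05 0.05 0.1  0.1  0.5  0.1  0.1
       0.05 0.05 0.1  0.1  0.1  0.5  0.1
       0.05 0.05 0.05 0.05 0.1  0.2  0.5
      );
   do Category = 1 to 7;
      Prob2019 = p0[Category];
      Prob2020 = 0;
      do i = 1 to 7;
         /* P(2020 = Category) = sum over i of P(2019 = i) * P(Category | i) */
         Prob2020 = Prob2020 + p0[i] * p[i, Category];
      end;
      output;
   end;
run;

proc print data=implied2020 noobs;
run;
```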

Re: Data Simulation for Repeated Measures Design

svh — Wed, 12 Feb 2020 18:04:18 GMT

In my design, I am actually testing for change in an outcome variable over time (the outcome is the perception of mistreatment in the educational setting). However, I have a random effect because participants are clustered in educational programs. I'm first trying to wrap my mind around how to conduct the simulation with one group. In reality, mistreatment can vary across programs for various reasons, so I will need to scale up to simulating the random effect of program.

Re: Data Simulation for Repeated Measures Design

Rick_SAS — Thu, 13 Feb 2020 13:45:54 GMT

For a random effect, sample the effect from N(0, sigma) once for each cluster. That value is used for all measurements within the cluster.
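
A minimal sketch of that idea, under assumptions that go beyond this thread: the response is treated as continuous (as PROC MIXED does) rather than ordinal, and every parameter value and name below (nProg, nSubj, sigmaProg, sigmaSubj, sigmaErr, beta, the intercept of 2, and the data set clustered) is hypothetical. Note that rand("Normal", mu, sigma) takes a standard deviation, not a variance.

```sas
/* Sketch: one random effect per program, shared by all of its measurements */
data clustered (keep=Program ID Year Y);
   call streaminit(4321);
   nProg = 20;  nSubj = 10;          /* programs, subjects per program (assumed) */
   sigmaProg = 0.5;                  /* SD of program effect (assumed)           */
   sigmaSubj = 0.7;                  /* SD of subject effect (assumed)           */
   sigmaErr  = 1;                    /* residual SD (assumed)                    */
   beta      = -0.3;                 /* change from 2019 to 2020 (assumed)       */
   do Program = 1 to nProg;
      u = rand("Normal", 0, sigmaProg);        /* drawn once per program */
      do j = 1 to nSubj;
         ID = (Program - 1) * nSubj + j;       /* unique subject ID      */
         s = rand("Normal", 0, sigmaSubj);     /* drawn once per subject */
         do Year = 2019, 2020;
            Y = 2 + beta * (Year = 2020) + u + s + rand("Normal", 0, sigmaErr);
            output;
         end;
      end;
   end;
run;

/* A model matching the way the data were generated */
proc mixed data=clustered;
   class Program ID Year;
   model Y = Year;
   random intercept / subject=Program;
   repeated Year / subject=ID(Program) type=cs;
run;
```

For the ordinal outcome discussed earlier, the same program-level draw could instead shift the category probabilities, but that would require a different model than the one sketched here.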