I am interested in conducting a data simulation that will help me understand the sample size for a study with a repeated measures design in which the subjects are measured at two time points. (I've been working with the book Simulating Data in SAS, but I'm not that familiar with PROC IML, so I have a learning curve ahead of me.) The subjects are administered a survey in which the ordinal outcomes are highly right skewed, and I am trying to understand whether I have enough subjects to detect an effect of an intervention. In my code below, I'm simulating two years of data from a discrete distribution based on the distribution from 2019. I am looking to measure change in 2020 in a group of matched individuals. In the following code, the simulation builds two independent data sets from which I randomly sample (with replacement) to get a sample size of 10 subjects per year (this is the size of the group I'm studying; my full study will have many groups, but I'm trying to simulate a study with one level of group for now). I don't think it's correct to just concatenate the data sets and assign the integers 1-10 in each year because it is probably more likely for an individual to move to an adjacent category and not three or four points away. How does a data simulation take this into account with a discrete distribution? Does anyone know of SAS white papers that get to this topic? data TimeFirst (keep=Y Group Year);
call streaminit(4321);
array p[7] (0.6 0.35 0.01 0.01 0.01 0.01 0.01); /* probabilities for current distribution */
do i = 1 to 100000;
Y = rand("Table", of p[*]); ;
Group = 'Rural';
Year = '2019';
output;
end;
run;
data TimeSecond (keep=Y Group Year);
call streaminit(4321);
array p[7] (0.5 0.45 0.01 0.01 0.01 0.01 0.01); /* probabilities based on assumption of effect of intervention */
do i = 1 to 100000;
Y = rand("Table", of p[*]);
Group = 'Rural';
Year = '2020';
output;
end;
run;
data All;
set Time:;
run;
proc sort data=All;
by Group Year;
run;
proc surveyselect data=All out=All_sample method=urs n=10;
strata Group Year; /*I am sampling from my simulations*/
run;
/*The problem here is I need subjects to be matched on a variable like ID, and this makes a sample data set of independent observations. My goal is to be able to run an anlaysis of the form:
proc mixed data=all_sample;
class year ID;
model Y = year;
repeated Year / Subject = ID type = cs;
run;
*/
... View more