03-13-2017 04:59 PM
I want to simulate a dataset with two treatment groups for the endpoint percentage adherence to the drug, so that I can investigate the effect of different missing-data strategies on how the endpoint is calculated. Percentage adherence is derived from a daily binary (yes/no) variable recorded over a 28-day period: it is the percentage of the 28 days on which the medication was taken within the correct (24-hour) time interval. The mean percentage adherence is assumed to be 40% in the comparator arm, and an absolute difference of 20 percentage points is anticipated in the new treatment arm. Percentage adherence will be analysed as a continuous, normally distributed variable, with an assumed standard deviation of 30%. My question is: how can I simulate the repeated binary data at the individual level whilst still making sure that percentage adherence within each arm follows a normal distribution with mean mu1 or mu2 and a standard deviation of 30%?
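For reference, the endpoint described above can be written as a formula. Here B_{id} is a hypothetical name (not from the post) for the yes/no indicator that patient i took the medication within the correct interval on day d:

```latex
Y_i = \frac{100}{28} \sum_{d=1}^{28} B_{id}, \qquad B_{id} \in \{0, 1\}
```

Because Y_i can only take the 29 values 0, 100/28, ..., 100, it can be approximately but never exactly normal. As a rough check, a normal distribution with mean 40 and SD 30 would place about 9% of patients below 0% and about 2% above 100%.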
Many thanks in advance for any help you can offer on this.
I'm using SAS 9.3.
03-14-2017 06:10 AM
It sounds like you want to simulate a binary response with a single two-level explanatory variable (treatment group). Look at the article "Simulating data for a logistic regression model" to get started. The main idea is that you use RAND("Bernoulli", p), where p is the probability of taking the medicine on a given day. All you need to do is generate the data for n1 patients with p1=0.4 and n2 patients with p2=0.2 (or p2=0.6, if the 20-point difference is an increase in adherence).
A statistical clarification: It sounds like you are looking for a PROPORTION, not a mean. You want the PROPORTION to be 0.4 for the control group and 0.2 for the new treatment group. The adherence of individual subjects will be BINOMIALLY distributed, not normally distributed (although you can use the normal approximation for large samples). If that is correct, then you don't get to choose the standard deviation: in a binomial experiment with n trials (here, n = 28 days per patient), the standard deviation of the number of successes is sqrt(n*p*(1-p)), so the standard deviation of the proportion is sqrt(p*(1-p)/n). Both are determined by the proportion and the number of trials.
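Dividing the count by n gives the standard deviation of the proportion, sqrt(p(1-p)/n). Plugging in the thread's numbers (n = 28 days per patient, independent days, a common p) gives the values below; p = 0.6 gives the same result as p = 0.4.

```latex
\begin{aligned}
\mathrm{SD}(X) &= \sqrt{n p (1-p)} = \sqrt{28 \times 0.4 \times 0.6} \approx 2.59 \text{ days} \\
\mathrm{SD}\!\left(\frac{X}{n}\right) &= \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.24}{28}} \approx 0.093 \quad (p = 0.4) \\
\mathrm{SD}\!\left(\frac{X}{n}\right) &= \sqrt{\frac{0.16}{28}} \approx 0.076 \quad (p = 0.2)
\end{aligned}
```

That is about 9.3 and 7.6 percentage points, well below the 30% assumed in the question, so independent daily trials with a common p cannot reach that target.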
Try something like this:
%let p1 = 0.4; %let n1 = 50;   /* control: daily probability, number of patients */
%let p2 = 0.2; %let n2 = 50;   /* experimental */

data Study;
   length Treatment $12;          /* long enough to hold "Experimental" */
   format PatientID Z3.;
   call streaminit(1234);         /* seed the random number stream */
   Treatment = "Control";
   do i = 1 to &n1;
      PatientID + 1;              /* sum statement: retained patient counter */
      do day = 1 to 28;
         TookMed = rand("Bernoulli", &p1);   /* 1 = took medication that day */
         output;
      end;
   end;
   Treatment = "Experimental";
   do i = 1 to &n2;
      PatientID + 1;
      do day = 1 to 28;
         TookMed = rand("Bernoulli", &p2);
         output;
      end;
   end;
   drop i;
run;

proc freq data=Study;
   tables TookMed*Treatment / nocum norow;
run;

proc means data=Study;
   class Treatment;
   var TookMed;
run;

proc sgpanel data=Study;
   where PatientID in (10 20 30 40 60 70 80 90);
   panelby PatientID Treatment / columns=4 onepanel;
   scatter x=day y=TookMed;
run;
03-28-2017 08:46 AM
03-28-2017 09:57 AM
Thank you for the clarification. Based on your explanation, I withdraw my statement that the response is binomially distributed. Even though 0/3, 1/3, 2/3, and 3/3 are the possible outcomes, this is not a sequence of independent random trials with a constant probability of success.
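One way to follow up on this point, not proposed in the thread itself, is a beta-binomial model: each patient gets an individual daily probability p_i drawn from a Beta(a, b) distribution with mean mu, and each of the N = 28 days is then Bernoulli(p_i). With rho = 1/(a+b+1), the variance of a patient's adherence proportion is mu(1-mu)[1+(N-1)rho]/N, so a and b can be solved for from a target SD, provided that SD lies strictly between sqrt(mu(1-mu)/N) and sqrt(mu(1-mu)). The sketch below assumes this model; the dataset and macro variable names, the seed, 200 patients per arm, and mu2 = 0.6 (reading the 20-point difference as an increase) are all illustrative assumptions. The simulated adherence should have roughly the target mean and SD in each arm, but it is still bounded and discrete and, with these parameters, strongly skewed rather than normal.

```sas
/* Sketch (assumption, not from the thread): beta-binomial simulation.  */
/* p_i ~ Beta(a, b) per patient, then TookMed ~ Bernoulli(p_i) per day. */
%let nDays = 28;          /* days per patient                       */
%let sd    = 0.30;        /* target SD of the adherence proportion  */
%let mu1   = 0.4;  %let n1 = 200;   /* control arm                 */
%let mu2   = 0.6;  %let n2 = 200;   /* assumed: +20 points         */

data BBStudy;
   length Treatment $12;
   format PatientID Z3.;
   call streaminit(4321);
   do arm = 1 to 2;
      if arm = 1 then do; Treatment = "Control";      mu = &mu1; n = &n1; end;
      else            do; Treatment = "Experimental"; mu = &mu2; n = &n2; end;
      /* Solve mu(1-mu)[1+(N-1)rho]/N = sd**2 for rho = 1/(a+b+1) */
      rho = (&nDays * &sd**2 / (mu*(1-mu)) - 1) / (&nDays - 1);
      if rho <= 0 or rho >= 1 then do;
         put "ERROR: target SD is not attainable for mu=" mu;
         stop;
      end;
      s = 1/rho - 1;                 /* s = a + b */
      a = mu*s;
      b = (1-mu)*s;
      do i = 1 to n;
         PatientID + 1;
         p = rand("Beta", a, b);     /* patient-level daily probability */
         do day = 1 to &nDays;
            TookMed = rand("Bernoulli", p);
            output;
         end;
      end;
   end;
   keep Treatment PatientID day TookMed;
run;

/* Percentage adherence per patient = 100 * (days taken) / nDays */
proc means data=BBStudy noprint nway;
   class Treatment PatientID;
   var TookMed;
   output out=Adherence(drop=_type_ _freq_) mean=PropTaken;
run;

data Adherence;
   set Adherence;
   PctAdherence = 100*PropTaken;
run;

/* Arm-level mean and SD of percentage adherence */
proc means data=Adherence n mean std min max;
   class Treatment;
   var PctAdherence;
run;
```

Larger target SDs correspond to a larger rho, that is, stronger correlation between a patient's days. As the target SD shrinks toward sqrt(mu(1-mu)/N), rho approaches 0 and the model approaches independent daily trials with a common p, as in the program earlier in the thread.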