New Contributor

How to simulate percentage data

Hello,

I want to simulate a dataset with two treatment groups for the endpoint "percentage adherence to the drug", so that I can investigate how different missing-data strategies affect the calculation of the endpoint. Percentage adherence is derived from a daily binary (yes/no) variable over a 28-day period: it is the percentage of the 28 days on which the medication was taken within the correct (24-hour) time interval. The mean percentage adherence is assumed to be 40% for the comparator arm, and an absolute difference of 20 percentage points is anticipated for the new treatment arm. Percentage adherence will be analysed as a continuous, normally distributed variable with an assumed standard deviation of 30%. My question is: how can I simulate the repeated binary data at the individual level while still ensuring that the percentage adherence within each arm follows a normal distribution with mean mu1 or mu2 and a standard deviation of 30%?

I'm using SAS 9.3.

Super User

Re: How to simulate percentage data

It would be better to post this in the IML forum, since it is about data simulation.
It would also help to give an example that describes your question.
SAS Super FREQ

Re: How to simulate percentage data

It sounds like you want to simulate a binary response with a two-level explanatory variable (the treatment group). Look at the article "Simulating data for a logistic regression model" to get started. The main idea is that you use RAND("Bernoulli", p), where p is the probability of taking the medicine. All you need to do is generate the data for n1 patients with p1=0.4 and n2 patients with p2=0.2.

A statistical clarification: It sounds like you are looking for a PROPORTION, not a mean. You want the PROPORTION to be 0.4 for the control group and 0.2 for the new treatment group. The adherence of individual subjects will be BINOMIALLY distributed, not normally distributed (although you can use the normal approximation for large samples). If that is correct, then you don't get to choose the standard deviation: in a binomial experiment with n trials, the standard deviation of the number of successes is sqrt(n*p*(1-p)), so the standard deviation of the proportion is sqrt(p*(1-p)/n). It is determined by the proportion and the number of trials (here, n = 28 days).
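As a worked check of that point, assume each of the 28 days is an independent Bernoulli trial with the same probability p for every patient in an arm. Patient i's adherence proportion, written here as \hat{p}_i (notation introduced only for this illustration), then has the binomial standard deviation:

```latex
\begin{aligned}
\hat{p}_i &= \frac{1}{28}\sum_{d=1}^{28} Y_{id}, \qquad Y_{id} \overset{\text{iid}}{\sim} \mathrm{Bernoulli}(p) \\
\operatorname{SD}(\hat{p}_i) &= \sqrt{\frac{p(1-p)}{28}} \\
p = 0.4:\quad \operatorname{SD}(\hat{p}_i) &= \sqrt{0.24/28} \approx 0.093 \\
p = 0.2:\quad \operatorname{SD}(\hat{p}_i) &= \sqrt{0.16/28} \approx 0.076
\end{aligned}
```

So under this model the between-patient SD of percentage adherence is about 9.3 percentage points in the control arm and about 7.6 in the other arm, well below the assumed 30%.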

Try something like this:

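``````
/* Simulation parameters: daily probability of taking the medicine
   and number of patients in each arm */
%let p1 = 0.4;
%let n1 = 50;
%let p2 = 0.2;
%let n2 = 50;

data Study;
   format PatientID Z3.;
   length Treatment $12;            /* long enough for "Experimental" */
   call streaminit(1234);           /* fixed seed for reproducibility */

   /* Control arm: one record per patient per day for 28 days */
   Treatment = "Control";
   do i = 1 to &n1;
      PatientID + 1;                /* sum statement: retained counter */
      do day = 1 to 28;
         TookMed = rand("Bernoulli", &p1);   /* 1 = took medicine that day */
         output;
      end;
   end;

   /* Experimental arm */
   Treatment = "Experimental";
   do i = 1 to &n2;
      PatientID + 1;
      do day = 1 to 28;
         TookMed = rand("Bernoulli", &p2);
         output;
      end;
   end;
   drop i;
run;

/* Proportion of patient-days with TookMed=1 in each arm */
proc freq data=Study;
   tables TookMed*Treatment / nocum norow;
run;

proc means data=Study;
   class Treatment;
   var TookMed;
run;

/* Daily adherence profiles for a few patients from each arm */
proc sgpanel data=Study;
   where PatientID in (10 20 30 40 60 70 80 90);
   panelby PatientID Treatment / columns=4 onepanel;
   scatter x=day y=TookMed;
run;
``````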
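The PROC MEANS step in the code above averages TookMed over all patient-days in each arm, which gives the proportion but not the spread of the per-patient endpoint. A minimal sketch that keeps one record per patient with the 28-day percentage adherence might look like the following. It assumes, as above, independent days with a common probability per arm; the data set PctAdh and the variables DaysTaken and PctAdherence are hypothetical names chosen for this illustration.

```sas
/* Sketch: one record per patient with the 28-day percentage adherence.
   Assumes independent daily Bernoulli trials with a common probability
   per arm. PctAdh, DaysTaken and PctAdherence are hypothetical names. */
%let p1 = 0.4;
%let n1 = 50;
%let p2 = 0.2;
%let n2 = 50;
%let ndays = 28;

data PctAdh;
   length Treatment $12;
   call streaminit(1234);
   do PatientID = 1 to &n1 + &n2;
      /* first &n1 patients are controls, the rest are experimental */
      if PatientID <= &n1 then do;
         Treatment = "Control";
         p = &p1;
      end;
      else do;
         Treatment = "Experimental";
         p = &p2;
      end;
      /* count the days on which the medicine was taken */
      DaysTaken = 0;
      do day = 1 to &ndays;
         DaysTaken = DaysTaken + rand("Bernoulli", p);
      end;
      PctAdherence = 100 * DaysTaken / &ndays;
      output;
   end;
   keep Treatment PatientID DaysTaken PctAdherence;
run;

/* Mean and SD of the per-patient percentage adherence in each arm */
proc means data=PctAdh n mean std;
   class Treatment;
   var PctAdherence;
run;
```

With this model the SD reported for each arm should land near the binomial value sqrt(p*(1-p)/28), expressed in percentage points, rather than near 30.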
New Contributor

Re: How to simulate percentage data

Hi Rick,

Thanks for your helpful response. I came back to this problem yesterday,
and the simulated data helped me a great deal to visualise the problem.

To clarify why we used mean adherence: in one treatment group, patients are
asked to take multiple drugs in a 24-hour period, so daily adherence yields a
percentage rather than a 0 or 1. For example, if a patient is required to take
3 drugs in a 24-hour period and takes one of them, the daily adherence is
33.33%; if they take two, it is 66.67%, and so on. The interest lies in
calculating the mean daily adherence for each patient, which is why we
originally approached the problem as a continuous variable.

Regards,

Lynn
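A minimal SAS sketch of the multiple-drug scenario described above follows. It rests on hypothetical assumptions not stated in the thread: each patient must take 3 doses per day, each dose is taken independently with the same probability p (0.4 here) on every day, and the arm has 50 patients. RAND("Binomial", p, n) gives the number of doses taken on a day, the daily percentage is that count divided by 3, and PROC MEANS then averages the daily percentages over the 28 days for each patient. The names MultiDose, PatientAdh, DailyPct and MeanDailyPct are illustrative only.

```sas
/* Hypothetical settings: 3 required doses per day, each taken
   independently with probability &p on every day, 50 patients */
%let ndoses = 3;
%let ndays  = 28;
%let p      = 0.4;
%let npat   = 50;

data MultiDose;
   call streaminit(4321);
   do PatientID = 1 to &npat;
      do day = 1 to &ndays;
         DosesTaken = rand("Binomial", &p, &ndoses);  /* 0, 1, 2 or 3 */
         DailyPct   = 100 * DosesTaken / &ndoses;    /* 0, 33.33, 66.67 or 100 */
         output;
      end;
   end;
run;

/* Mean daily adherence for each patient over the 28 days */
proc means data=MultiDose noprint nway;
   class PatientID;
   var DailyPct;
   output out=PatientAdh(drop=_type_ _freq_) mean=MeanDailyPct;
run;

/* Distribution of the per-patient endpoint */
proc means data=PatientAdh n mean std min max;
   var MeanDailyPct;
run;
```

Because every day requires the same number of doses, each patient's mean daily adherence equals the total doses taken divided by the total doses required (84). Under these independence assumptions the between-patient SD is still fixed by p, so reaching a target SD such as 30% would need extra patient-level variation that this sketch does not model.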