How to simulate percentage data

03-13-2017 04:59 PM

Hello,

I want to simulate a dataset with two treatment groups for the endpoint "percentage adherence to drug", so that I can investigate the effect of missing-data strategies on the endpoint calculation. Percentage adherence is derived from a daily binary (yes/no) variable recorded over a 28-day period: it is the percentage of the 28 days on which the medication was taken within the correct time interval (24 hours). The mean percentage adherence is assumed to be 40% in the comparator arm, and an absolute difference of 20% is anticipated in the new treatment arm. Percentage adherence will be analysed as a continuous, normally distributed variable with an assumed standard deviation of 30%. My question is: how can I simulate the repeated binary data at the individual level whilst still making sure that the percentage adherence within each arm follows a normal distribution with mean mu1 or mu2 and a standard deviation of 30%?

Many thanks in advance for any help you can offer on this.

I'm using SAS 9.3.
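As a worked illustration of the endpoint described above (the symbols X_id and Y_i are introduced here for illustration and are not from the original post): if X_id is patient i's yes/no indicator for day d, the percentage adherence and the stated design targets can be written as

```latex
\begin{aligned}
Y_i &= \frac{100}{28}\sum_{d=1}^{28} X_{id}, \qquad X_{id}\in\{0,1\},\\
\mathrm{E}(Y_i) &= \mu_1 = 40 \text{ (comparator)}, \qquad |\mu_2-\mu_1| = 20, \qquad \mathrm{SD}(Y_i) = 30.
\end{aligned}
```

For example, a patient who took the medication on 14 of the 28 days has Y_i = 100 × 14/28 = 50%. Because Y_i can only take the 29 values 0, 100/28, ..., 100, a normal distribution for it can only be an approximation.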


03-14-2017 01:14 AM

You might get a better response by posting this in the IML forum, since it is about data simulation. It would also help to give an example that describes your question.


03-14-2017 06:10 AM

It sounds like you want to simulate a binary response with a two-level explanatory variable (the treatment group). Look at the article "Simulating data for a logistic regression model" to get started. The main idea is to use RAND("Bernoulli", p), where p is the probability of taking the medicine. All you need to do is generate the data for n1 patients with p1=0.4 and n2 patients with p2=0.2.

A statistical clarification: it sounds like you are looking for a PROPORTION, not a mean. You want the PROPORTION to be 0.4 for the control group and 0.2 for the new treatment group. The adherence of individual subjects will be BINOMIALLY distributed, not normally distributed (although you can use the normal approximation for large samples). If that is correct, then you don't get to choose the standard deviation: in a binomial experiment with n trials, the standard deviation of the number of successes is sqrt(n*p*(1-p)). It is determined by the proportion and the sample size (number of trials).
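To see what this implies here, take n = 28 daily trials per patient (an assumption matching the 28-day window in the question) and the two proportions above. K is the number of days on which the medication was taken; the second column rescales it to percentage adherence:

```latex
\begin{aligned}
\mathrm{SD}(K) &= \sqrt{n\,p\,(1-p)}, \qquad \mathrm{SD}\!\left(\frac{100K}{n}\right) = 100\sqrt{\frac{p\,(1-p)}{n}},\\
p = 0.4:&\quad \sqrt{28 \times 0.4 \times 0.6} = \sqrt{6.72} \approx 2.59 \text{ days} \;\Rightarrow\; 100 \times 2.59/28 \approx 9.3 \text{ percentage points},\\
p = 0.2:&\quad \sqrt{28 \times 0.2 \times 0.8} = \sqrt{4.48} \approx 2.12 \text{ days} \;\Rightarrow\; 100 \times 2.12/28 \approx 7.6 \text{ percentage points}.
\end{aligned}
```

Both values are far below the 30% standard deviation assumed in the question, so independent daily Bernoulli draws with a common p cannot by themselves produce that spread.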

Try something like this:

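```
%let p1 = 0.4;   /* daily probability of taking the medication: control      */
%let n1 = 50;    /* number of control patients                              */
%let p2 = 0.2;   /* daily probability of taking the medication: experimental */
%let n2 = 50;    /* number of experimental patients                         */

data Study;
   length Treatment $12;          /* long enough to hold "Experimental"     */
   format PatientID Z3.;
   call streaminit(1234);         /* fixed seed for reproducible results    */
   Treatment = "Control";
   do i = 1 to &n1;
      PatientID + 1;              /* sum statement: retained running ID     */
      do day = 1 to 28;           /* 28 daily yes/no outcomes per patient   */
         TookMed = rand("Bernoulli", &p1);
         output;
      end;
   end;
   Treatment = "Experimental";
   do i = 1 to &n2;
      PatientID + 1;
      do day = 1 to 28;
         TookMed = rand("Bernoulli", &p2);
         output;
      end;
   end;
   drop i;
run;

proc freq data=Study;
   tables TookMed*Treatment / nocum norow;
run;

proc means data=Study;
   class Treatment;
   var TookMed;
run;

/* plot the daily outcomes for a few patients from each group */
proc sgpanel data=Study;
   where PatientID in (10 20 30 40 60 70 80 90);
   panelby PatientID Treatment / columns=4 onepanel;
   scatter x=day y=TookMed;
run;
```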


03-28-2017 08:46 AM

Hi Rick,

Thanks for your helpful response. I came back to this problem again yesterday, and the simulated data very much helped me visualise the problem.

To clarify why we use mean adherence: in one treatment group, patients are asked to take multiple drugs in a 24-hour period, so each day's adherence is a percentage rather than a 0 or 1. For example, if a patient is required to take 3 drugs in a 24-hour period, taking one yields a daily percentage of 33.33%, taking two yields 66.67%, and so on. The interest lies in calculating the mean daily adherence for each patient. This is why we originally approached the problem as a continuous variable.

Regards,

Lynn
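A worked version of the daily-percentage calculation described above, for the 3-drug case. The notation (k_id for the number of the 3 drugs patient i took on day d, A_id for that day's adherence) is introduced here for illustration and is not from the original post:

```latex
\begin{aligned}
A_{id} &= \frac{k_{id}}{3}, \qquad k_{id}\in\{0,1,2,3\}, \qquad
\bar{A}_i = \frac{100}{28}\sum_{d=1}^{28} A_{id},\\
\text{e.g. } k_{id}=3 \text{ on 14 days}, \; k_{id}=1 \text{ on 14 days:}\quad
\bar{A}_i &= \frac{100}{28}\left(14 \times 1 + 14 \times \tfrac{1}{3}\right) = \frac{100 \times 56}{84} \approx 66.7\%.
\end{aligned}
```

For this hypothetical patient the mean daily adherence equals the total doses taken (56) divided by the total doses prescribed (84), which holds whenever the same number of doses is prescribed every day.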


03-28-2017 09:57 AM

Thank you for the clarification. Given your explanation, I withdraw my statement about the response being binomially distributed. Even though 0/3, 1/3, 2/3, and 3/3 are the possible outcomes, this is not a sequence of independent random trials with a constant probability of success.
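A short variance calculation may make this concrete. It assumes, for illustration only (none of this is stated in the thread), that each of the m doses on a given day is taken with the same probability p, that every pair of doses on that day has correlation ρ, and that K = X_1 + ... + X_m is the number of doses taken that day:

```latex
\begin{aligned}
\operatorname{Var}(K) &= \sum_{j=1}^{m}\operatorname{Var}(X_j) + \sum_{j\neq k}\operatorname{Cov}(X_j, X_k)\\
&= m\,p(1-p) + m(m-1)\,\rho\,p(1-p)\\
&= m\,p(1-p)\,\bigl[1 + (m-1)\rho\bigr].
\end{aligned}
```

With m = 3 the factor is 1 + 2ρ. Only when ρ = 0 does this reduce to the binomial variance 3p(1-p); positively correlated doses (for example, a patient who tends to take all or none of the day's drugs) give a larger variance.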