Programming the statistical procedures from SAS

Simulating Case Control data

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Simulating Case Control data

How do I simulate case control data in analyzing power for conditional logistic regression model? I have experience in simulating logistic regression models, but not sure how to generate data that's case control (1:1 ratio) design. Thank you!


Accepted Solutions
Solution
‎06-10-2018 05:14 PM
SAS Super FREQ
Posts: 4,237

Re: Simulating Case Control data

I'm glad you found these thoughts helpful. I wasn't actually "suggesting" an approach, I was sort of wondering out loud how you intended to get the 1:1 ratio. My understanding is that it is okay to recruit more "cases" than "controls," and that can improve the power, especially for rare diseases/events. I found this abstract of a Lancet article, although I have not read the paper.

 

Ultimately, a simulation should reflect the design of the experiment that you are intending to carry out. In a real study, how would you recruit the cases and controls? If you use random selection, then in your simulation you can output the first N cases and the first N controls to generate the case-control sample. Of course, you might need to generate many more than 2N observations, so you should probably use DO UNTIL logic. Although your intentions are not clear, perhaps the following simulation might be modified to your needs:

 

%let N = 15;             /* want N in each group */ 
data CaseControl;
length Group $7;
call streaminit(1234);                     /* set the random number seed */

NCase = 0; NControl = 0;
do until (NCase >= &N and NControl >= &N);  
   x1 = rand("Uniform");
   c1 = rand("Bernoulli", 0.3);
   eta = 1 + 0.8*x1 - 1.2*c1 + 0.4*c1*x1;  /* interaction model */
   mu = logistic(eta);
   Y = rand("Bernoulli", mu);
   Group = ifc(Y = 1, "Case", "Control");
   if Group = "Case" & NCase < &N then do;
      NCase + 1; output;
   end;
   if Group = "Control" & NControl < &N then do;
      NControl + 1; output;
   end;
end;

keep Y x1 c1 Group;
run;

proc logistic data=CaseControl plots(only)=effectplot;
class c1 ;
model Y(event='1') = x1 | c1;
run;

View solution in original post


All Replies
SAS Super FREQ
Posts: 4,237

Re: Simulating Case Control data

I think we need a few more details about your plans for the study, such as how you plan to achieve the 1:1 ratio. Typically case-control studies are observational, and the "case" group has the disease whereas the "control" group does not. Are you assuming that the half the population has the disease and half don't? Or do you plan to over/undersample the population to achieve the 1:1 ratio? Could you provide more details?

 

In general, all simulation studies, start with a model of the population from which you will sample. Read the articles "Simulating data for a logistic regression model" (DATA step) and "Simulate many samples from a logistic regression model" (SAS/IML) for the general framework. (You can also read an example about using simulation to estimate power.) During the simulation you can assign GROUP="Control" for subjects that have Y=0 and GROUP="Case" for subjects that have Y=1.  But in general these groups won't have a 1:1 ratio unless additional steps are taken.

New Contributor
Posts: 4

Re: Simulating Case Control data

Thank you for your reply!!!

 

This is actually very helpful. Yes I'm doing power for a case-control data (1:1 ratio) that aimed to test an interaction between a continuous variable and race. Generating logistic data using an interaction term is no problem for me, but I got stuck at making half the sample control, half case. Seems you are suggesting generating larger data and sample from it to get the 1:1 case-control data that I want? That is a great idea! Question though: Will sampling from the larger data set change the effect size I used to generate the larger data set?

 

Thank you! I'm so glad I posted the question after scratching my head in front of my computer for a whole day!!

Solution
‎06-10-2018 05:14 PM
SAS Super FREQ
Posts: 4,237

Re: Simulating Case Control data

I'm glad you found these thoughts helpful. I wasn't actually "suggesting" an approach, I was sort of wondering out loud how you intended to get the 1:1 ratio. My understanding is that it is okay to recruit more "cases" than "controls," and that can improve the power, especially for rare diseases/events. I found this abstract of a Lancet article, although I have not read the paper.

 

Ultimately, a simulation should reflect the design of the experiment that you are intending to carry out. In a real study, how would you recruit the cases and controls? If you use random selection, then in your simulation you can output the first N cases and the first N controls to generate the case-control sample. Of course, you might need to generate many more than 2N observations, so you should probably use DO UNTIL logic. Although your intentions are not clear, perhaps the following simulation might be modified to your needs:

 

%let N = 15;             /* want N in each group */ 
data CaseControl;
length Group $7;
call streaminit(1234);                     /* set the random number seed */

NCase = 0; NControl = 0;
do until (NCase >= &N and NControl >= &N);  
   x1 = rand("Uniform");
   c1 = rand("Bernoulli", 0.3);
   eta = 1 + 0.8*x1 - 1.2*c1 + 0.4*c1*x1;  /* interaction model */
   mu = logistic(eta);
   Y = rand("Bernoulli", mu);
   Group = ifc(Y = 1, "Case", "Control");
   if Group = "Case" & NCase < &N then do;
      NCase + 1; output;
   end;
   if Group = "Control" & NControl < &N then do;
      NControl + 1; output;
   end;
end;

keep Y x1 c1 Group;
run;

proc logistic data=CaseControl plots(only)=effectplot;
class c1 ;
model Y(event='1') = x1 | c1;
run;
New Contributor
Posts: 4

Re: Simulating Case Control data

Thank you, Rich for your help! I went with our original thought, generated larger data from the model I'm testing, used Proc Surveyselect to select a 1:1 ratio case/control, and achieved what I wanted. The solution your sent is definitely more elegant. I'm saving it for the future. Thank you for your prompt replies, suggestions, and help!

SAS Super FREQ
Posts: 4,237

Re: Simulating Case Control data

Great! The advantage of the PROC SURVEYSELECT approach is that is equally valid for real or simulated data,
New Contributor
Posts: 4

Re: Simulating Case Control data

Thank you, Rick, for all your help!

Just so you know, I ran the simulations using your program just to make
sure mine is correct. Results are very close with yours is consistently a
little bit more conservative than mine (1000 simulations, if mine has 81.9%
power, yours would be 80.5). I ended up using results from your program to
be safe.

I really appreciate your expertise and help!!!
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 371 views
  • 6 likes
  • 2 in conversation