BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
liliu
Fluorite | Level 6

How do I simulate case control data in analyzing power for conditional logistic regression model? I have experience in simulating logistic regression models, but not sure how to generate data that's case control (1:1 ratio) design. Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

I'm glad you found these thoughts helpful. I wasn't actually "suggesting" an approach, I was sort of wondering out loud how you intended to get the 1:1 ratio. My understanding is that it is okay to recruit more "cases" than "controls," and that can improve the power, especially for rare diseases/events. I found this abstract of a Lancet article, although I have not read the paper.

 

Ultimately, a simulation should reflect the design of the experiment that you are intending to carry out. In a real study, how would you recruit the cases and controls? If you use random selection, then in your simulation you can output the first N cases and the first N controls to generate the case-control sample. Of course, you might need to generate many more than 2N observations, so you should probably use DO UNTIL logic. Although your intentions are not clear, perhaps the following simulation might be modified to your needs:

 

%let N = 15;             /* want N in each group */ 
data CaseControl;
length Group $7;
call streaminit(1234);                     /* set the random number seed */

NCase = 0; NControl = 0;
do until (NCase >= &N and NControl >= &N);  
   x1 = rand("Uniform");
   c1 = rand("Bernoulli", 0.3);
   eta = 1 + 0.8*x1 - 1.2*c1 + 0.4*c1*x1;  /* interaction model */
   mu = logistic(eta);
   Y = rand("Bernoulli", mu);
   Group = ifc(Y = 1, "Case", "Control");
   if Group = "Case" & NCase < &N then do;
      NCase + 1; output;
   end;
   if Group = "Control" & NControl < &N then do;
      NControl + 1; output;
   end;
end;

keep Y x1 c1 Group;
run;

proc logistic data=CaseControl plots(only)=effectplot;
class c1 ;
model Y(event='1') = x1 | c1;
run;

View solution in original post

9 REPLIES 9
Rick_SAS
SAS Super FREQ

I think we need a few more details about your plans for the study, such as how you plan to achieve the 1:1 ratio. Typically case-control studies are observational, and the "case" group has the disease whereas the "control" group does not. Are you assuming that the half the population has the disease and half don't? Or do you plan to over/undersample the population to achieve the 1:1 ratio? Could you provide more details?

 

In general, all simulation studies, start with a model of the population from which you will sample. Read the articles "Simulating data for a logistic regression model" (DATA step) and "Simulate many samples from a logistic regression model" (SAS/IML) for the general framework. (You can also read an example about using simulation to estimate power.) During the simulation you can assign GROUP="Control" for subjects that have Y=0 and GROUP="Case" for subjects that have Y=1.  But in general these groups won't have a 1:1 ratio unless additional steps are taken.

liliu
Fluorite | Level 6

Thank you for your reply!!!

 

This is actually very helpful. Yes I'm doing power for a case-control data (1:1 ratio) that aimed to test an interaction between a continuous variable and race. Generating logistic data using an interaction term is no problem for me, but I got stuck at making half the sample control, half case. Seems you are suggesting generating larger data and sample from it to get the 1:1 case-control data that I want? That is a great idea! Question though: Will sampling from the larger data set change the effect size I used to generate the larger data set?

 

Thank you! I'm so glad I posted the question after scratching my head in front of my computer for a whole day!!

Rick_SAS
SAS Super FREQ

I'm glad you found these thoughts helpful. I wasn't actually "suggesting" an approach, I was sort of wondering out loud how you intended to get the 1:1 ratio. My understanding is that it is okay to recruit more "cases" than "controls," and that can improve the power, especially for rare diseases/events. I found this abstract of a Lancet article, although I have not read the paper.

 

Ultimately, a simulation should reflect the design of the experiment that you are intending to carry out. In a real study, how would you recruit the cases and controls? If you use random selection, then in your simulation you can output the first N cases and the first N controls to generate the case-control sample. Of course, you might need to generate many more than 2N observations, so you should probably use DO UNTIL logic. Although your intentions are not clear, perhaps the following simulation might be modified to your needs:

 

%let N = 15;             /* want N in each group */ 
data CaseControl;
length Group $7;
call streaminit(1234);                     /* set the random number seed */

NCase = 0; NControl = 0;
do until (NCase >= &N and NControl >= &N);  
   x1 = rand("Uniform");
   c1 = rand("Bernoulli", 0.3);
   eta = 1 + 0.8*x1 - 1.2*c1 + 0.4*c1*x1;  /* interaction model */
   mu = logistic(eta);
   Y = rand("Bernoulli", mu);
   Group = ifc(Y = 1, "Case", "Control");
   if Group = "Case" & NCase < &N then do;
      NCase + 1; output;
   end;
   if Group = "Control" & NControl < &N then do;
      NControl + 1; output;
   end;
end;

keep Y x1 c1 Group;
run;

proc logistic data=CaseControl plots(only)=effectplot;
class c1 ;
model Y(event='1') = x1 | c1;
run;
liliu
Fluorite | Level 6

Thank you, Rich for your help! I went with our original thought, generated larger data from the model I'm testing, used Proc Surveyselect to select a 1:1 ratio case/control, and achieved what I wanted. The solution your sent is definitely more elegant. I'm saving it for the future. Thank you for your prompt replies, suggestions, and help!

Rick_SAS
SAS Super FREQ
Great! The advantage of the PROC SURVEYSELECT approach is that is equally valid for real or simulated data,
liliu
Fluorite | Level 6
Thank you, Rick, for all your help!

Just so you know, I ran the simulations using your program just to make
sure mine is correct. Results are very close with yours is consistently a
little bit more conservative than mine (1000 simulations, if mine has 81.9%
power, yours would be 80.5). I ended up using results from your program to
be safe.

I really appreciate your expertise and help!!!
tritringuyen
Quartz | Level 8

Hi @Rick_SAS, you are definitely an expert in simulation. Could you please help me with my issue:

 

https://communities.sas.com/t5/SAS-Procedures/Simulation-Sampling-and-OLS-regression-in-SAS/m-p/4867...

 

Thank you very much! Best regards, Thierry

JacobSimonsen
Barite | Level 11

The simulation method @Rick_SAS's suggestion is ok. But the analysis should be conditional logistic regression due to the design. Therefore, there has to be a strata statement in proc logistic.

 

 

data simulation;

array p{0:1}; *vector of probabilities of exposure in case and control group;

array odds{0:1};

or=1.5; *the odds ratio, that are common for all strata;

do strata=1 to 100000;

p[0]=rand('uniform'); *the probability of exposure in the control group;

*now some  simple calculations to calculate probability in the case group;

odds[0]=p[0]/(1-p[0]);

odds[1]=or*odds[0];

p[1]=odds[1]/(1+odds[1]);

do case=0 to 1;

exposure=rand('bernoulli',p[case]);

output;

end;

end;

keep strata case exposure;

run;

 

*The strata statement in proc logistic condition that due to the design there is one case per stratum;

proc logistic data=simulation;

class exposure(ref="0");

model case(event="1")=exposure;

strata strata/nosummary;

run;

liliu
Fluorite | Level 6

Thank you for your reply - yes, conditional model should be used for matched data. I liked your simulation part, too!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 3573 views
  • 7 likes
  • 4 in conversation