Hi @Anav0416,
Thanks for the clarification (esp. about what you meant by "sufficiently large").
@Anav0416 wrote:
Question: I agree with your implementation but am wondering whether the new procedure actually produces fewer records than intended?
I don't think that the new procedure produces systematically smaller sample sizes than your previous approach. It depends on the data. Assuming that your dataset case_strata contained the rates 64.0, 23.6, 12.4 (percent) as shown in your initial post, your approach applied to my example data (i.e. with rates 60% for stratum 1 and 40% for stratum 2) would always result in 5 observations: 0.6*4=2.4, rounded up to 3, plus 0.4*4=1.6, rounded up to 2 (see documentation). With the new procedure the sample size is a random variable. Its expected value in this example is approx. 6.35 (according to my calculation involving the negative binomial distribution), hence greater than 5.
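To make the arithmetic for the deterministic case explicit, here is a minimal Python sketch (not SAS, and not your actual code). It assumes, as described above, that each stratum's sample size is its rate times the requested total, rounded up. The function name `rounded_up_allocation` is just an illustrative name of mine.

```python
from fractions import Fraction
from math import ceil

def rounded_up_allocation(rates, n):
    """Per-stratum sample sizes when each rate * n is rounded up.

    Assumption: this mirrors the rounding-up behaviour described in the post.
    Fractions avoid floating-point surprises before rounding.
    """
    return [ceil(Fraction(str(r)) * n) for r in rates]

# Example data from this thread: rates 60% and 40%, requested total 4
sizes = rounded_up_allocation([0.6, 0.4], 4)
total = sum(sizes)
print(sizes, total)  # [3, 2] 5
```

Because every stratum is rounded up, the total can exceed the requested 4, but it is the same 5 on every run, unlike the random sample size of the new procedure.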
@Anav0416 wrote:
Will applying a cutoff point help in this case?
Yes, it seems plausible to me that you'll increase the total sample size (on average) by using such a cutoff. However, the proportions of the strata in the controls will then be less similar to those in the cases. Without a cutoff the algorithm tends to produce similar proportions (of course varying because it's not deterministic). So, if moderate deviations are acceptable, you could give it a try.