Hello,
I have a complicated sampling strategy that I am trying to program in SAS 9.4 and am hoping someone can help me.
My dataset includes ~150,000 patients at 1,230 facilities, each of whom received one of two treatments: ~4,000 received treatment A and ~146,000 received treatment B. I need to generate a dataset with no more than 15 patients selected from each facility while keeping as many of the ~4,000 A patients as I can. For example, if a facility had 5 A patients, I would keep all 5 and then draw a random sample of 10 B patients from that facility to get n=15. If there are no A patients at a facility, I would take a random sample of 15 B patients. Likewise, if there are exactly 15 A patients at a facility, I would keep all 15 of them and no B patients.
A few facilities have more than 15 A patients, so for these I would need to draw a random sample of 15 A patients. This means I would lose a few A patients from my overall sample.
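In case it makes the rule clearer, here is the selection logic written out as a rough Python sketch. This is not SAS (which is what I actually need), and the function name, the (patient_id, facility_id, treatment) tuple layout, and the seed are just assumptions made up for illustration.

```python
import random
from collections import defaultdict

CAP = 15  # maximum number of patients kept per facility


def sample_facilities(patients, cap=CAP, seed=12345):
    """Pick up to `cap` patients per facility, giving priority to treatment A.

    `patients` is an iterable of (patient_id, facility_id, treatment) tuples,
    where treatment is assumed to be exactly "A" or "B" (hypothetical layout).
    Returns a dict mapping facility_id -> list of selected patient_ids.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group patient ids by facility and treatment.
    by_facility = defaultdict(lambda: {"A": [], "B": []})
    for patient_id, facility_id, treatment in patients:
        by_facility[facility_id][treatment].append(patient_id)

    selected = {}
    for facility_id, groups in by_facility.items():
        a_ids, b_ids = groups["A"], groups["B"]
        # Keep every A patient, or a random 15 if there are more than 15.
        keep_a = rng.sample(a_ids, min(len(a_ids), cap))
        # Fill the remaining slots with a random sample of B patients.
        keep_b = rng.sample(b_ids, min(len(b_ids), cap - len(keep_a)))
        selected[facility_id] = keep_a + keep_b
    return selected
```

So a facility with 5 A and 100 B patients would end up with all 5 A patients plus 10 randomly chosen B patients, and a facility with fewer than 15 patients in total would simply keep everyone.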
Because the number of patients (A+B) at each facility ranges from 0 to >1000, I don't know whether I can use probability proportional to size (PPS) sampling or weighting in this algorithm, or whether I will need to adjust for sampling weights in my analysis later.
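My tentative understanding is that, if the draw within each facility and treatment group is a simple random sample without replacement, each patient's selection probability is (number kept) / (number available) in that group, and a design weight would be its inverse. A small Python sketch of that arithmetic follows; the function names are hypothetical, and whether these weights are actually appropriate for my analysis is exactly what I am unsure about.

```python
def inclusion_probabilities(n_a, n_b, cap=15):
    """Selection probability for A and B patients at one facility.

    Assumes simple random sampling without replacement within each
    treatment group, with A patients filling the `cap` slots first.
    Returns (p_a, p_b); a value is None when that group has no patients.
    """
    kept_a = min(n_a, cap)
    kept_b = min(n_b, cap - kept_a)
    p_a = kept_a / n_a if n_a else None
    p_b = kept_b / n_b if n_b else None
    return p_a, p_b


def design_weight(p):
    """Inverse-probability weight; None if the group could not be sampled."""
    if not p:  # covers both None (no patients) and 0.0 (no slots left)
        return None
    return 1.0 / p
```

For a facility with 5 A and 100 B patients this gives probabilities of 1.0 and 0.1, so each sampled B patient would stand for 10 B patients. At a facility with 15 or more A patients, the B patients have zero chance of selection, so no weight can represent them.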
I have attached a small sample dataset.
Any help would be greatly appreciated!
Thanks,
Laura