07-21-2017 09:43 AM
I have a complicated sampling strategy that I am trying to program in SAS 9.4 and am hoping someone can help me.
My dataset of ~150,000 patients at 1,230 facilities includes two treatment groups: ~4,000 patients received treatment A and ~146,000 received treatment B. I need to generate a dataset with no more than 15 patients from each facility while keeping as many of the ~4,000 A patients as I can. For example, if a facility had 5 A patients, I would keep all 5 and then draw a random sample of 10 B patients from that facility to reach n = 15. If a facility has no A patients, I would take a random sample of 15 B patients; likewise, if a facility has exactly 15 A patients, I would keep all 15 of them.
A few facilities have more than 15 A patients, so for these I would need to draw a random sample of 15 A patients. As a result, a few A patients would be lost from the overall sample.
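The allocation rule described above boils down to simple per-facility arithmetic: A patients fill the 15 slots first, and B patients fill whatever is left. Below is a minimal sketch in Python rather than SAS, just to pin down the logic. It assumes each facility's A and B counts are already known; the function name `facility_quota` and the `cap` parameter are hypothetical, for illustration only.

```python
CAP = 15  # maximum number of patients kept per facility (from the post)


def facility_quota(n_a: int, n_b: int, cap: int = CAP) -> tuple[int, int]:
    """Return (k_a, k_b): how many A and B patients to keep at one facility.

    A patients fill the cap first; B patients fill whatever slots remain.
    A facility with fewer than `cap` patients in total keeps everyone.
    """
    k_a = min(n_a, cap)        # keep every A, or sample `cap` of them if there are more
    k_b = min(n_b, cap - k_a)  # B fills the remaining slots, if any
    return k_a, k_b


if __name__ == "__main__":
    # The cases from the post (the counts of B, and the 22 A, are made-up examples)
    print(facility_quota(5, 200))   # 5 A kept, 10 B sampled
    print(facility_quota(0, 200))   # 15 B sampled
    print(facility_quota(22, 40))   # 15 of the 22 A sampled, no B
```

Once these per-facility sizes are known, the actual random draws are two ordinary stratified simple random samples (A by facility, then B by facility).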
Because the total number of patients (A + B) per facility ranges from 0 to more than 1,000, I don't know whether I can use PPS (probability proportional to size) sampling or weighting in this algorithm, or whether I will need to adjust for sampling weights later in my analysis.
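On the weighting question, one common approach (not PPS) treats each facility-by-treatment cell as its own stratum and draws a simple random sample without replacement inside it. A selected patient then gets a design weight equal to the inverse of the selection probability, n / k, where n is the cell size and k is the number sampled from it. This is a sketch under that assumption. The function name `cell_weights` is hypothetical, and the cap-first-for-A rule is restated inside it so that the block runs on its own.

```python
from fractions import Fraction

CAP = 15  # maximum number of patients kept per facility (from the post)


def cell_weights(n_a: int, n_b: int, cap: int = CAP) -> dict[str, Fraction]:
    """Design weights (inverse selection probabilities) for one facility.

    Assumes simple random sampling without replacement within each
    facility-by-treatment cell, with A filling the cap before B.
    A cell from which nobody is sampled gets no entry.
    """
    k_a = min(n_a, cap)
    k_b = min(n_b, cap - k_a)
    weights: dict[str, Fraction] = {}
    if k_a > 0:
        weights["A"] = Fraction(n_a, k_a)  # each sampled A stands for n_a / k_a patients
    if k_b > 0:
        weights["B"] = Fraction(n_b, k_b)  # each sampled B stands for n_b / k_b patients
    return weights


if __name__ == "__main__":
    print(cell_weights(5, 1000))  # small A cell kept whole; large B cell heavily weighted
    print(cell_weights(22, 40))   # A cell sampled; no B slots left
```

One consequence worth keeping in mind: B patients at a facility with 15 or more A patients have zero chance of selection, so no weight can make them represented in the sample.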
I have attached a small sample dataset.
Any help would be greatly appreciated!
07-21-2017 09:48 AM
Have you already tried using SURVEYSELECT for this sampling, or are you not familiar with that procedure?
07-21-2017 10:08 AM
I have used SURVEYSELECT, but I need a way to draw all of the A patients at each facility before sampling the B patients. If I first remove the A patients from the dataset, I then need to select from each facility only as many B patients as 15 minus that facility's number of A patients. I think I need some sort of macro that cycles through the observations at each facility and applies these conditions, but I don't know how to write it. I'm actually less worried about PPS than about getting the correct number of A and B patients per facility.
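The two-stage idea described here (take the A patients first, then fill the remaining slots with B) does not necessarily need a loop over observations. Grouping by facility is enough. Here is a minimal Python sketch of that logic, not SAS code. The record layout `(patient_id, facility, treatment)` and the function name `two_stage_sample` are hypothetical, and it is not based on the attached file. In SAS, the same per-facility sizes could likely be fed to SURVEYSELECT as stratum sample sizes, but check the SAMPSIZE= documentation before relying on that.

```python
import random
from collections import defaultdict

CAP = 15  # maximum number of patients kept per facility (from the post)


def two_stage_sample(patients, cap=CAP, seed=None):
    """Select at most `cap` patients per facility, taking A patients first.

    `patients` is an iterable of (patient_id, facility, treatment) tuples,
    with treatment "A" or "B" (a hypothetical layout).
    Stage 1: keep every A patient, or a random `cap` of them if there are more.
    Stage 2: fill the remaining slots with a simple random sample of B patients.
    """
    rng = random.Random(seed)  # seeded for a reproducible draw
    by_facility = defaultdict(lambda: {"A": [], "B": []})
    for pid, facility, treatment in patients:
        by_facility[facility][treatment].append(pid)

    selected = []
    for facility in sorted(by_facility):
        a, b = by_facility[facility]["A"], by_facility[facility]["B"]
        picked_a = a if len(a) <= cap else rng.sample(a, cap)  # stage 1
        k_b = min(len(b), cap - len(picked_a))                 # slots left for B
        picked_b = rng.sample(b, k_b)                          # stage 2
        selected += [(pid, facility, "A") for pid in picked_a]
        selected += [(pid, facility, "B") for pid in picked_b]
    return selected


if __name__ == "__main__":
    # Made-up toy data: facility 1 (5 A, 30 B), facility 2 (8 B), facility 3 (20 A, 10 B)
    toy = (
        [(f"f1a{i}", 1, "A") for i in range(5)] + [(f"f1b{i}", 1, "B") for i in range(30)]
        + [(f"f2b{i}", 2, "B") for i in range(8)]
        + [(f"f3a{i}", 3, "A") for i in range(20)] + [(f"f3b{i}", 3, "B") for i in range(10)]
    )
    counts = defaultdict(lambda: {"A": 0, "B": 0})
    for _, facility, treatment in two_stage_sample(toy, seed=2017):
        counts[facility][treatment] += 1
    for facility in sorted(counts):
        print(facility, counts[facility])
    # 1 {'A': 5, 'B': 10}
    # 2 {'A': 0, 'B': 8}
    # 3 {'A': 15, 'B': 0}
```

The counts per facility are the same for any seed. Only the identities of the sampled patients change.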
07-22-2017 10:59 AM
Post some data here as text, not as an Excel file. No one wants to download it from a website.
And don't forget to post your output as well.