Hello,
I have a complicated sampling strategy that I am trying to program in SAS 9.4 and am hoping someone can help me.
My dataset includes ~150,000 patients at 1230 facilities who received one of two types of treatment: ~4000 received treatment A while ~146,000 received treatment B. I need to generate a dataset with no more than 15 patients selected from each facility, but keep as many of the 4000 A patients as I can. So for example, if a facility had 5 A patients I would keep all 5 A patients and then draw a random sample of 10 B patients from that facility to get n=15. If there are no A patients at the facility then I would take a random sample of 15 B patients. Likewise, if there are 15 A patients at the facility I would keep all 15 of them and take no B patients.
There are a few facilities where there are more than 15 A patients, so I would need to draw a random sample of 15 A patients for these facilities. I would then lose a few A patients from my overall sample.
Because the number of patients A+B from each facility ranges from 0 to >1000, I don't know if I can use PPS or weighting in this algorithm or if I will need to adjust for sampling weights in my analysis later.
I have attached a small sample dataset.
Any help would be greatly appreciated!
Thanks,
Laura
Have you already attempted to use SURVEYSELECT for this sampling, or do you not know about that procedure?
The doc includes an example of PPS stratified sampling.
I have used SURVEYSELECT, but I need a way to keep all of the A patients at each facility before sampling the B patients. If I remove the A patients from the dataset first, I then need to select only (15 minus the number of A patients) B patients from each facility. I think I need some sort of macro that will cycle through the observations at each facility and apply the conditions, but I don't know how to do this. I'm actually less worried about the PPS than about getting the correct number of A and B patients per facility.
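One possible way to get the "all A first, then fill up with B" rule without a macro is to give every patient a random number, sort each facility so that A patients come first in random order followed by B patients in random order, and keep the first 15 rows per facility. This is only a sketch under assumptions: the dataset name PATIENTS, the variables FACILITY and TREATMENT (character, with 'A' for treatment A), the output names RANKED and SAMPLE, and the seed are all made up for illustration and would need to match the real data.

```sas
/* Assumed input: PATIENTS with FACILITY and TREATMENT ('A' or 'B'). */
/* All dataset and variable names here are hypothetical.             */
data ranked;
    set patients;
    call streaminit(12345);        /* arbitrary fixed seed so the draw is reproducible */
    is_b = (treatment ne 'A');     /* 0 for A, 1 for anything else, so A sorts first   */
    u = rand('uniform');           /* random order within facility and treatment       */
run;

proc sort data=ranked;
    by facility is_b u;
run;

data sample;
    set ranked;
    by facility;
    if first.facility then n_kept = 0;  /* restart the counter for each facility */
    n_kept + 1;                         /* sum statement: value is retained      */
    if n_kept <= 15;                    /* keep at most 15 patients per facility */
    drop is_b u n_kept;
run;
```

With this ordering, a facility with 5 A patients keeps all 5 plus 10 randomly chosen B patients, a facility with more than 15 A patients keeps a random 15 of them, and a facility with fewer than 15 patients in total keeps everyone. Because the chance of selection then differs by facility and treatment, a later analysis would probably need weights such as (number of patients in the facility and treatment group) divided by (number sampled from that group).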
Post some data here, not as an Excel file. No one wants to download it from a website.
And don't forget to post your output too.