Hi, I have a case data set with 70 hospitals and 5 disease types. Some hospitals have fewer than 5 cases for the year, and some have many. Some hospitals have all 5 disease types, and some do not.

I am trying to write code to randomly sample 5 cases per hospital, but there are rules for how the cases can be chosen:

If a hospital has all 5 disease types, I want to randomly select one case of each disease type.
If a hospital has more than 5 cases and only 4 disease types, I want one case from each of the 4 types, plus a fifth case of any disease type.
If a hospital has only 3 cases, I want to capture all 3 (random sampling doesn't really matter here anymore).
And so on and so forth.

The sampling is only for concordance purposes, so I don't truly need a randomized sample. However, there are hospitals with more than 100 cases, with more than 15 of each disease type, and I'd like those selections to be random (not just the first case SAS reads).

Ultimately, the number of cases per hospital determines whether I can even sample 5 (if there are fewer, I'll take all of them). Then I want to select at least one case of each disease type per hospital, and then fill up to 5 cases for each hospital. With 70 hospitals, if every hospital had at least 5 cases, I'd have an end sample size of 350. However, that is not always the situation, because some hospitals have fewer than 5 cases.

Example below:

hosp_id  disease_type
A        1
A        1
A        2
A        5
B        1
B        2
B        3
B        3
B        4
B        5
...
C        1
C        1
C        1
C        2
C        2
C        2
C        4
C        5
...

Any help greatly appreciated! 🙂
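To pin down the selection rules described above, here is a rough sketch of the logic. It is written in Python rather than SAS, purely to make the steps explicit, and it is one possible reading of the rules, not a definitive implementation. The names `sample_cases`, `per_hospital`, `seed`, and `case_no` are hypothetical and do not come from the data set; `case_no` is just the row position, used to tell otherwise identical cases apart.

```python
import random
from collections import defaultdict


def sample_cases(cases, per_hospital=5, seed=None):
    """Pick up to `per_hospital` cases per hospital.

    `cases` is a list of (hosp_id, disease_type) tuples.
    Returns a list of (hosp_id, disease_type, case_no) tuples.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible

    # Group rows by hospital, remembering each row's position.
    by_hospital = defaultdict(list)
    for case_no, (hosp_id, disease_type) in enumerate(cases):
        by_hospital[hosp_id].append((case_no, disease_type))

    selected = []
    for hosp_id, rows in by_hospital.items():
        if len(rows) <= per_hospital:
            # Too few cases to sample from: take all of them.
            picked = list(rows)
        else:
            # Step 1: one random case from each disease type present.
            by_type = defaultdict(list)
            for row in rows:
                by_type[row[1]].append(row)
            picked = [rng.choice(group) for group in by_type.values()]

            # Assumption: if there were ever more types than slots,
            # keep a random subset of those picks.
            if len(picked) > per_hospital:
                picked = rng.sample(picked, per_hospital)

            # Step 2: fill the remaining slots at random from the
            # cases not yet chosen, ignoring disease type.
            remaining = [row for row in rows if row not in picked]
            picked += rng.sample(remaining, per_hospital - len(picked))

        selected.extend(
            (hosp_id, disease_type, case_no)
            for case_no, disease_type in picked
        )
    return selected


# Illustrative rows only (the visible part of the example table).
example = (
    [("A", t) for t in (1, 1, 2, 5)]
    + [("B", t) for t in (1, 2, 3, 3, 4, 5)]
    + [("C", t) for t in (1, 1, 1, 2, 2, 2, 4, 5)]
)
for row in sample_cases(example, seed=1):
    print(row)
```

With these rows, hospital A would contribute all 4 of its cases, while B and C would each contribute 5, covering every disease type they have. In SAS the same two-step idea (one random case per hospital and disease type, then a random top-up from the leftover cases) would need to be expressed with whatever sampling tools are available there.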