Of course, it would be best to find a consulting statistician who can take some time with you to fully understand the nature of the hospital process you want to model and to determine the best model and methods. Having said that, it sounds to me like you could fit a model to the LOS response observed on your patients. The input data set would have one observation (row) for each patient and contain variables (columns) that hold the patient's LOS value and the values of all of the relevant predictors that you propose for LOS. There is also the question of how the LOS response is distributed in any single population. Clearly, it is positive in value and presumably has many small values, so it is likely right-skewed rather than approximately normal. If LOS is discrete, such as a number of days, then you might consider using the Poisson or negative binomial distribution. Or, you could consider using a continuous distribution, like the gamma or inverse Gaussian, whether LOS is a discrete or a continuous measure. There is also the question of whether the LOS values might be correlated in clusters. For example, are the LOS values from patients with the same physician, or maybe in the same group, considered to be more alike? If so, the model should take this clustering into account so that the standard errors and tests are appropriate.
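Before settling on a distribution, it can help to look at the LOS values directly. The following is only a minimal sketch: the data set name myLOSdata and the variable names LOS and group are made up, so substitute your own. It assumes all LOS values are strictly positive, which the gamma fit requires.

```sas
/* Overlay a fitted gamma density on the LOS histogram to judge the skew */
/* (myLOSdata, LOS, and group are hypothetical names)                    */
proc univariate data=myLOSdata;
   var LOS;
   histogram LOS / gamma;
run;

/* Per-group summaries: compare how the variance changes with the mean */
proc means data=myLOSdata n mean var min max;
   class group;
   var LOS;
run;
```

If the variance tends to grow with the mean across groups, that is more in line with a skewed distribution such as the gamma than with the normal.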
With all this information, you could fit a model in a procedure like GLIMMIX, GEE, or GENMOD. For example, assuming that LOS values are considered to be correlated within groups but independent across groups, and that LOS can be considered to have a gamma distribution, a model like the following might be a reasonable start, but that is something you will have to explore and decide on. Any categorical predictors should appear in both the CLASS and MODEL statements. I've made up some variable names based on your description. The LSMEANS statement will produce estimated LOS values for the groups and pairwise tests among them. Note that those estimates are adjusted for all of the predictors in the model. So, in particular, they are adjusted for group size because of its inclusion in the model. But which variables are in the model, and the form of the model (such as whether it contains interactions), is something you would need to explore.
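proc gee data=myLOSdata;
   /* categorical predictors go in both CLASS and MODEL */
   class group medicare insurance;
   /* gamma-distributed response; variable names are made up */
   model LOS = group groupsize risklevel medicare insurance age severity / dist=gamma;
   /* exchangeable correlation among patients within the same group */
   repeated subject=group / type=exch;
   /* adjusted group means on the LOS scale (ILINK), pairwise differences, confidence limits */
   lsmeans group / diff ilink cl;
run;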
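If LOS is recorded as a whole number of days, the same structure could be tried with a count distribution instead. This is only a sketch under that assumption, reusing the same made-up variable names. It uses DIST=POISSON; the negative binomial mentioned above is the other candidate to compare.

```sas
/* Same GEE setup, but treating LOS as a count of days (an assumption) */
proc gee data=myLOSdata;
   class group medicare insurance;
   model LOS = group groupsize risklevel medicare insurance age severity / dist=poisson;
   /* exchangeable working correlation within group, as in the gamma model */
   repeated subject=group / type=exch;
   /* ILINK reports the group estimates as expected days rather than on the log scale */
   lsmeans group / diff ilink cl;
run;
```

Comparing the LSMEANS estimates from the two fits is one informal way to see how sensitive the group comparisons are to the choice of distribution.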