11-27-2013 08:30 AM
I have a doubt that I believe has an easy solution but need your help to find it!
I have a table with 8 columns. the first is my patientID, the next 6 are profile patient information with categorical values (eg: location, gender, age, disease, etc) and the last column is a continuous vAriable that tells me how much they paid in the hospital. I need to perform two pieces of analysis:
1) identify those groups (based on all the possible combinations) that maximize, by patient, the total spent
2) same as 1) but creating a rule stating that i only want groups with more than x patients.
how can i do this in sas eg/em/base?
11-27-2013 10:10 AM
First, let's define what the problem asks for. I would interpret this as finding the groups with the largest average spent. If that's not right, you'll have to explain what you mean by "maximize, by patient, the total spent".
Second, this is an easy task for SAS in theory. You can get the average spent for every possible group easily:
proc summary data=have missing;
class location gender age disease /* plus 2 more variables not named in the problem */;
output out=stats mean=avg_paid;
The output data set STATS will even contain _FREQ_, holding the number of patients in the group. So applying rules about minimum group size is easy.
The trick will be whether your machine has enough memory to compute statistics for all groups at the same time. If you don't run out of memory, the continuation is easy:
proc sort data=stats;
by descending avg_paid;
proc print data=stats (obs=50);
You will need to learn a few things about the CLASS statement: how to translate from _TYPE_ to the group definition, and how the CLASS statement handles missing values (assuming that your data actually contains some missing values).
If you do run out of memory, a more complex strategy would be necessary. But this is a good place to start.