09-14-2016 09:23 AM
I have a dataset with many categorical variables and one continuous numeric variable. I am trying to make a table summarizing descriptive statistics for the continuous variable across the levels of each of the categorical variables. I used the following proc tabulate code:
proc tabulate data=mydata;
class categ_var1 categ_var2 categ_var3 categ_var4 categ_var5;
table categ_var1 categ_var2 categ_var3 categ_var4 categ_var5, continouous_var*(median etc.);
This creates the table I want:
The issue is that I would like to have a subtotal row for each of the categorical variables giving the values of the descriptive stastistics for everyone with a value for that caetgorical variable, like this:
I tried inserting "all" after each categ_var, and this added a row called all to the end of each categ_var section, but the numbers in each of the all segments are the same, so I think it's calculating some kind of total rather than a subtotal specific to each categorical var.
Any help is much appreciated.
09-14-2016 09:36 AM
PROC TABULATE gives you identical statistics for your ALL categories because of the way that it handles CLASS variables. By default, when any CLASS variable has a missing value, PROC TABULATE throws it out of the calculations. So when CATEG_VAR1 is missing, that observation is thrown out, even if CATEG_VAR2 is nonmissing.
You can change that behavior by adding the MISSING option, but that would not solve your original problem. That would just include all the missing values in the calculations, which is what you are trying to avoid.
For many statistics, there are relatively simple solutions. For example, calculate the statistics and store them in a SAS data set. Then modify the SAS data set to get your subtotals. But for the MEDIAN that's not going to work. You might need to process each CATEG variable separately, then put them all together for the final report.
09-14-2016 10:51 AM
Just checking but do any of your categorical variables have missing values? You results may not be as needed if so as Proc Tabulate drops any record with with missing CLASS variables unless the option /missing is included for the class variable.
As your current table looks the "subtotal" will be the same for each categorical variable which gets to be a tad redundant. You could add an ALL to get the overall total.
This code demonstrates that you get duplicate values:
proc tabulate data=sashelp.class; class sex age; var weight; table sex all='All' age all='Overall', weight*mean; ; run;
If you were NESTING values that might be different as Var1 *(Var2 All) would give a a summary overall for Var1 at the end of Var2.
Here's an example:
proc tabulate data=sashelp.class; class sex age; var weight; table sex * ( age all='All sex') All='All Data', weight*mean; ; run;