BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

 

I have a dataset with many categorical variables and one continuous numeric variable. I am trying to make a table summarizing descriptive statistics for the continuous variable across the levels of each of the categorical variables. I used the following proc tabulate code:

 

proc tabulate data=mydata;

var continuous_var;

class categ_var1 categ_var2 categ_var3 categ_var4 categ_var5;

table categ_var1 categ_var2 categ_var3 categ_var4 categ_var5, continouous_var*(median etc.);

run;

 

This creates the table I want:

 

                   Continuous_var

                   Median...etc.

Categ_var1

Level1          123

Level2          345

Categ_var2

Level1

etc.

 

The issue is that I would like to have a subtotal row for each of the categorical variables giving the values of the descriptive stastistics for everyone with a value for that caetgorical variable, like this:

                  Continuous_var

                   Median...etc.

Categ_var1

Level1          123

Level2          345

Subtotal       234

Categ_var2

etc.

 

I tried inserting "all" after each categ_var, and this added a row called all to the end of each categ_var section, but the numbers in each of the all segments are the same, so I think it's calculating some kind of total rather than a subtotal specific to each categorical var.

 

Any help is much appreciated.

2 REPLIES 2
Astounding
PROC Star

PROC TABULATE gives you identical statistics for your ALL categories because of the way that it handles CLASS variables.  By default, when any CLASS variable has a missing value, PROC TABULATE throws it out of the calculations.  So when CATEG_VAR1 is missing, that observation is thrown out, even if CATEG_VAR2 is nonmissing.

 

You can change that behavior by adding the MISSING option, but that would not solve your original problem.  That would just include all the missing values in the calculations, which is what you are trying to avoid.

 

For many statistics, there are relatively simple solutions.  For example, calculate the statistics and store them in a SAS data set.  Then modify the SAS data set to get your subtotals.  But for the MEDIAN that's not going to work.  You might need to process each CATEG variable separately, then put them all together for the final report.

ballardw
Super User

Just checking but do any of your categorical variables have missing values? You results may not be as needed if so as Proc Tabulate drops any record with with missing CLASS variables unless the option /missing is included for the class variable.

 

As your current table looks the "subtotal" will be the same for each categorical variable which gets to be a tad redundant. You could add an ALL to get the overall total.

This code demonstrates that you get duplicate values:

proc tabulate data=sashelp.class;
   class sex age;
   var weight;
   table sex all='All' age all='Overall',
         weight*mean;
   ;
run;

If you were NESTING values that might be different as Var1 *(Var2 All) would give a a summary overall for Var1 at the end of Var2.

 

Here's an example:


proc tabulate data=sashelp.class;
   class sex age;
   var weight;
   table sex * ( age all='All sex') All='All Data',
         weight*mean;
   ;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 3999 views
  • 0 likes
  • 3 in conversation