PROC TABULATE: totals across multiple class variables

Curt · Posted 04-10-2024 11:52 AM

Hi everyone,

I would like to make a table which displays a total of all non-missing values across multiple class row variables, for each level of the column variable.

My idea was this code:

PROC TABULATE data=have missing;
class columnvar rowvar1-rowvar10;
table rowvar1-rowvar10 all, columnvar;
run;

but unfortunately it doesn't work. Instead of displaying the total of all non-missing values across all the row variables in the ALL variable, it just provides a total of the last rowvariable 10.

Many thanks in advance,

Curt

PaigeMiller · Posted 04-10-2024 11:58 AM

Show us a portion of the data, as working SAS data step code (examples and instructions).

Show us a screen capture of the output, using the "Insert Photos" icon to include the screen capture; do not attach files.

--
Paige Miller

ballardw · Posted 04-10-2024 01:47 PM

@Curt wrote:

Hi everyone,

I would like to make a table which displays a total of all non-missing values across multiple class row variables, for each level of the column variable.

My idea was this code:

PROC TABULATE data=have missing;
class columnvar rowvar1-rowvar10;
table rowvar1-rowvar10 all, columnvar;
run;

but unfortunately it doesn't work. Instead of displaying the total of all non-missing values across all the row variables in the ALL variable, it just provides a total of the last rowvariable 10.

Many thanks in advance,

Curt

This statement you made is technically incorrect: it just provides a total of the last rowvariable 10.

All, used the way you have, summarized the entire table. You can check that by running the following. The ALL row will show the same result.

PROC TABULATE data=have missing;
class columnvar rowvar1-rowvar10;
table  all, columnvar;
run;

Proc tabulate in general does not combine multiple class variables for any purpose as you are contemplating.

Without example data and expected output for that given data I won't suggest a specific solution but it may mean restructuring your data and/or adding variable(s) to use as VAR variables.

Curt · Posted 04-13-2024 08:45 AM

Hi ballardw,

thank you for your answer.

Unfortunately, I can't show the data because it is confidential.

However, what I can say is that all the row variables describe diagnoses (ICD-Codes) and the column variable describes the exact location from where those diagnoses are from.

So if proc tabulate doesn't work in this case, is there another possibility to display the data in the way I described?

Thank you,

Curt

ballardw · Posted 04-13-2024 11:08 AM

Data does not have to be exactly your data, just something that behaves in the same way.

And if the data you share only included ICD codes and the column variable replace the "exact location" with something like A, B, and C .

It wouldn't even have to have the same number of variables, 3 of your "row" variables would likely be sufficient.

However it would have to have enough values so that you can indicate what the desired "total" looks like.

It also is not at all clear what "a total of all non-missing values" actually means. Is this a count non-missing, or do you want to display each actual value (ICD code) and its count?

Here is one way to get one sort of count but the reshape step to work with all rows is likely going to be common:

data example;
   input col $ row1 $ row2 $ row3 $;
datalines;
A   A1  .  C5
A   B4  C5 D6
A   .   A1 B4
B   .   A2 B4
B   .   C5 A1
;
data reshape;
   set example;
   array r(*) row: ;
   do i=1 to dim(r);
      rowname = vname(r[i]);
      rowVal  = r[i];
      output;
   end;
   keep col rowname rowval;
run;


proc tabulate data=reshape;
   class rowval col;
   table rowval,
         col
   ;
run;

Note that if there are other variables that need to be used in this report then you would need to keep them in the reshape step but exactly how may have some details as nothing about them has been discussed.

Curt · Posted 04-21-2024 08:54 AM

Hello ballardw,

thank you for your help. I just tried out your code and it displayed exactly what I wanted: a total count of all the non-missing values per column and a count of every specific icd-code within a column. I just needed to add “all“ behind “table rowval“ to get a total.

However, I found out that the reshape step has an influence on numeric variables written in the keep statement by increasing their N:

data example;
   input col $ row1 $ row2 $ row3 $ numvar;
datalines;
A   A1  .  C5 1
A   B4  C5 D6 2
A   .   A1 B4 3
B   .   A2 B4 4
B   .   C5 A1 5
;

proc means data=example;
   var numvar;
   run;

data reshape;
   set example;
   array r(*) row: ;
   do i=1 to dim(r);
      rowname = vname(r[i]);
      rowVal  = r[i];
      output;
   end;
   keep col rowname rowval numvar;
run;

/*N of variable numvar gets increased by three times after the reshape step:*/
proc means data=reshape;
   var numvar;
   run;

This for example results in wrong sums etc. for those numeric variables.

Is it possible to do something about it?

Thank you,

Curt

ballardw · Posted 04-21-2024 04:12 PM

@Curt wrote:

Hello ballardw,

thank you for your help. I just tried out your code and it displayed exactly what I wanted: a total count of all the non-missing values per column and a count of every specific icd-code within a column. I just needed to add “all“ behind “table rowval“ to get a total.

However, I found out that the reshape step has an influence on numeric variables written in the keep statement by increasing their N:
data example;
   input col $ row1 $ row2 $ row3 $ numvar;
datalines;
A   A1  .  C5 1
A   B4  C5 D6 2
A   .   A1 B4 3
B   .   A2 B4 4
B   .   C5 A1 5
;

proc means data=example;
   var numvar;
   run;

data reshape;
   set example;
   array r(*) row: ;
   do i=1 to dim(r);
      rowname = vname(r[i]);
      rowVal  = r[i];
      output;
   end;
   keep col rowname rowval numvar;
run;

/*N of variable numvar gets increased by three times after the reshape step:*/
proc means data=reshape;
   var numvar;
   run;
This for example results in wrong sums etc. for those numeric variables.

Is it possible to do something about it?

Thank you,

Curt

What "numeric variables". Your discussion only mentions class variables in

class columnvar rowvar1-rowvar10;

Did you read the part of my response that says?

Note that if there are other variables that need to be used in this report then you would need to keep them in the reshape step but exactly how may have some details as nothing about them has been discussed.

Your comment sounds a whole lot like there are other variables that you did not mention at all in your question. So an example of something that looks like Your data is needed.

The reshape, if N is affected, would require only outputting the values once per grouping variable combination. Which would mean set them missing after the first the first OUTPUT statement in the Do loop.

Curt

Hello ballardw,

What "numeric variables". Your discussion only mentions class variables

Your comment sounds a whole lot like there are other variables that you did not mention at all in your question

Yes that's true. The reason for this was that the ICD variables were the only ones I wanted to have reshaped. There are other variables that I would like to include in the table, but I didn't know that their N would be affected by the reshaping of the ICD variables; that is why I didn't mention them before.

Did you read the part of my response that says?
Note that if there are other variables that need to be used in this report then you would need to keep them in the reshape step but exactly how may have some details as nothing about them has been discussed.

Yes, I read it, but I misunderstood it. I thought you just meant that I need to include all the other variables that I need in addition to that in the keep statement of the reshape step.

Other variables which I also want to display in the table are the numerical one "years_spent" and the categorical one "gender".

I would like a table that looks like this, but with an unaffected N for "years_spent" and "gender":

data have;
   input exact_location $ icdcategory_1 $ icdcategory_2 $ icdcategory_3 $ gender $ years_spent;
datalines;
A   A1  .  C5 m 20
A   B4  C5 D6 f 20
A   .   A1 B4 m 20
B   .   A2 B4 m .
C   .   C5 A1 . 40
;
data reshape;
   set have;
   array r(*) icdcategory: ;
   do i=1 to dim(r);
      icdcatname = vname(r[i]);
      icdcatvalues  = r[i];
      output;
   end;
   keep years_spent gender exact_location icdcatname icdcatvalues;
run;

proc tabulate data=reshape;
   class exact_location gender icdcatvalues;
   var years_spent; 
   table gender years_spent icdcatvalues all,
         exact_location;
run;

Cynthia_sas · Posted 04-13-2024 11:09 AM

Hi, Same suggestion as @ballardw -- you can make fake data to test with the posted code. My sense from your description is that your data structure may not work without restructuring.
Cynthia

mkeintz

The problem is that what you want is to double count in the ALL row (actually "up to ten" count), for which tabulate does not provide a mechanism.

I suggest a data set VIEW with a new variable, to be included in the proc tabulate.


data vneed / view=vneed;
  set have (keep=columnvar rowvar1-rowvar10);
  row_count=n(of rowvar:);
run;

proc tabulate data=vneed;
  class columnvar rowvar1-rowvar2;
  var row_count;
  table  rowvar1-rowvar2
         row_count*sum='Col Total'
         ,columnvar;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

PROC TABULATE: totals across multiple class variables

Re: PROC TABULATE: totals across multiple class variables

Re: PROC TABULATE: totals across multiple class variables

Re: PROC TABULATE: totals across multiple class variables

Re: PROC TABULATE: totals across multiple class variables

Re: PROC TABULATE: totals across multiple class variables

Re: PROC TABULATE: totals across multiple class variables

Re: PROC TABULATE: totals across multiple class variables

Re: PROC TABULATE: totals across multiple class variables

Re: PROC TABULATE: totals across multiple class variables

Classroom Training Available!