Solved: Re: PROC TABULATE: totals across multiple class variables

Curt · Posted 04-10-2024 11:52 AM

Hi everyone,

I would like to make a table which displays a total of all non-missing values across multiple class row variables, for each level of the column variable.

My idea was this code:

PROC TABULATE data=have missing;
class columnvar rowvar1-rowvar10;
table rowvar1-rowvar10 all, columnvar;
run;

but unfortunately it doesn't work. Instead of displaying the total of all non-missing values across all the row variables in the ALL variable, it just provides a total of the last rowvariable 10.

Many thanks in advance,

Curt

ballardw · Posted 04-13-2024 11:08 AM

Data does not have to be exactly your data, just something that behaves in the same way.

And if the data you share only included ICD codes and the column variable replace the "exact location" with something like A, B, and C .

It wouldn't even have to have the same number of variables, 3 of your "row" variables would likely be sufficient.

However it would have to have enough values so that you can indicate what the desired "total" looks like.

It also is not at all clear what "a total of all non-missing values" actually means. Is this a count non-missing, or do you want to display each actual value (ICD code) and its count?

Here is one way to get one sort of count but the reshape step to work with all rows is likely going to be common:

data example;
   input col $ row1 $ row2 $ row3 $;
datalines;
A   A1  .  C5
A   B4  C5 D6
A   .   A1 B4
B   .   A2 B4
B   .   C5 A1
;
data reshape;
   set example;
   array r(*) row: ;
   do i=1 to dim(r);
      rowname = vname(r[i]);
      rowVal  = r[i];
      output;
   end;
   keep col rowname rowval;
run;


proc tabulate data=reshape;
   class rowval col;
   table rowval,
         col
   ;
run;

Note that if there are other variables that need to be used in this report then you would need to keep them in the reshape step but exactly how may have some details as nothing about them has been discussed.

View solution in original post

PaigeMiller · Posted 04-10-2024 11:58 AM

Show us a portion of the data, as working SAS data step code (examples and instructions).

Show us a screen capture of the output, using the "Insert Photos" icon to include the screen capture; do not attach files.

--
Paige Miller

ballardw · Posted 04-10-2024 01:47 PM

@Curt wrote:

Hi everyone,

I would like to make a table which displays a total of all non-missing values across multiple class row variables, for each level of the column variable.

My idea was this code:

PROC TABULATE data=have missing;
class columnvar rowvar1-rowvar10;
table rowvar1-rowvar10 all, columnvar;
run;

but unfortunately it doesn't work. Instead of displaying the total of all non-missing values across all the row variables in the ALL variable, it just provides a total of the last rowvariable 10.

Many thanks in advance,

Curt

This statement you made is technically incorrect: it just provides a total of the last rowvariable 10.

All, used the way you have, summarized the entire table. You can check that by running the following. The ALL row will show the same result.

PROC TABULATE data=have missing;
class columnvar rowvar1-rowvar10;
table  all, columnvar;
run;

Proc tabulate in general does not combine multiple class variables for any purpose as you are contemplating.

Without example data and expected output for that given data I won't suggest a specific solution but it may mean restructuring your data and/or adding variable(s) to use as VAR variables.

Curt · Posted 04-13-2024 08:45 AM

Hi ballardw,

thank you for your answer.

Unfortunately, I can't show the data because it is confidential.

However, what I can say is that all the row variables describe diagnoses (ICD-Codes) and the column variable describes the exact location from where those diagnoses are from.

So if proc tabulate doesn't work in this case, is there another possibility to display the data in the way I described?

Thank you,

Curt

ballardw · Posted 04-13-2024 11:08 AM

Data does not have to be exactly your data, just something that behaves in the same way.

And if the data you share only included ICD codes and the column variable replace the "exact location" with something like A, B, and C .

It wouldn't even have to have the same number of variables, 3 of your "row" variables would likely be sufficient.

However it would have to have enough values so that you can indicate what the desired "total" looks like.

It also is not at all clear what "a total of all non-missing values" actually means. Is this a count non-missing, or do you want to display each actual value (ICD code) and its count?

Here is one way to get one sort of count but the reshape step to work with all rows is likely going to be common:

data example;
   input col $ row1 $ row2 $ row3 $;
datalines;
A   A1  .  C5
A   B4  C5 D6
A   .   A1 B4
B   .   A2 B4
B   .   C5 A1
;
data reshape;
   set example;
   array r(*) row: ;
   do i=1 to dim(r);
      rowname = vname(r[i]);
      rowVal  = r[i];
      output;
   end;
   keep col rowname rowval;
run;


proc tabulate data=reshape;
   class rowval col;
   table rowval,
         col
   ;
run;

Note that if there are other variables that need to be used in this report then you would need to keep them in the reshape step but exactly how may have some details as nothing about them has been discussed.

Curt · Posted 04-21-2024 08:54 AM

Hello ballardw,

thank you for your help. I just tried out your code and it displayed exactly what I wanted: a total count of all the non-missing values per column and a count of every specific icd-code within a column. I just needed to add “all“ behind “table rowval“ to get a total.

However, I found out that the reshape step has an influence on numeric variables written in the keep statement by increasing their N:

data example;
   input col $ row1 $ row2 $ row3 $ numvar;
datalines;
A   A1  .  C5 1
A   B4  C5 D6 2
A   .   A1 B4 3
B   .   A2 B4 4
B   .   C5 A1 5
;

proc means data=example;
   var numvar;
   run;

data reshape;
   set example;
   array r(*) row: ;
   do i=1 to dim(r);
      rowname = vname(r[i]);
      rowVal  = r[i];
      output;
   end;
   keep col rowname rowval numvar;
run;

/*N of variable numvar gets increased by three times after the reshape step:*/
proc means data=reshape;
   var numvar;
   run;

This for example results in wrong sums etc. for those numeric variables.

Is it possible to do something about it?

Thank you,

Curt

ballardw · Posted 04-21-2024 04:12 PM

@Curt wrote:

Hello ballardw,

thank you for your help. I just tried out your code and it displayed exactly what I wanted: a total count of all the non-missing values per column and a count of every specific icd-code within a column. I just needed to add “all“ behind “table rowval“ to get a total.

However, I found out that the reshape step has an influence on numeric variables written in the keep statement by increasing their N:
data example;
   input col $ row1 $ row2 $ row3 $ numvar;
datalines;
A   A1  .  C5 1
A   B4  C5 D6 2
A   .   A1 B4 3
B   .   A2 B4 4
B   .   C5 A1 5
;

proc means data=example;
   var numvar;
   run;

data reshape;
   set example;
   array r(*) row: ;
   do i=1 to dim(r);
      rowname = vname(r[i]);
      rowVal  = r[i];
      output;
   end;
   keep col rowname rowval numvar;
run;

/*N of variable numvar gets increased by three times after the reshape step:*/
proc means data=reshape;
   var numvar;
   run;
This for example results in wrong sums etc. for those numeric variables.

Is it possible to do something about it?

Thank you,

Curt

What "numeric variables". Your discussion only mentions class variables in

class columnvar rowvar1-rowvar10;

Did you read the part of my response that says?

Note that if there are other variables that need to be used in this report then you would need to keep them in the reshape step but exactly how may have some details as nothing about them has been discussed.

Your comment sounds a whole lot like there are other variables that you did not mention at all in your question. So an example of something that looks like Your data is needed.

The reshape, if N is affected, would require only outputting the values once per grouping variable combination. Which would mean set them missing after the first the first OUTPUT statement in the Do loop.

Curt · Posted 05-04-2024 05:54 PM

Hello ballardw,

What "numeric variables". Your discussion only mentions class variables

Your comment sounds a whole lot like there are other variables that you did not mention at all in your question

Yes that's true. The reason for this was that the ICD variables were the only ones I wanted to have reshaped. There are other variables that I would like to include in the table, but I didn't know that their N would be affected by the reshaping of the ICD variables; that is why I didn't mention them before.

Did you read the part of my response that says?
Note that if there are other variables that need to be used in this report then you would need to keep them in the reshape step but exactly how may have some details as nothing about them has been discussed.

Yes, I read it, but I misunderstood it. I thought you just meant that I need to include all the other variables that I need in addition to that in the keep statement of the reshape step.

Other variables which I also want to display in the table are the numerical one "years_spent" and the categorical one "gender".

I would like a table that looks like this, but with an unaffected N for "years_spent" and "gender":

data have;
   input exact_location $ icdcategory_1 $ icdcategory_2 $ icdcategory_3 $ gender $ years_spent;
datalines;
A   A1  .  C5 m 20
A   B4  C5 D6 f 20
A   .   A1 B4 m 20
B   .   A2 B4 m .
C   .   C5 A1 . 40
;
data reshape;
   set have;
   array r(*) icdcategory: ;
   do i=1 to dim(r);
      icdcatname = vname(r[i]);
      icdcatvalues  = r[i];
      output;
   end;
   keep years_spent gender exact_location icdcatname icdcatvalues;
run;

proc tabulate data=reshape;
   class exact_location gender icdcatvalues;
   var years_spent; 
   table gender years_spent icdcatvalues all,
         exact_location;
run;

ballardw · Posted 06-06-2024 12:37 PM

@Curt wrote:

I would like a table that looks like this, but with an unaffected N for "years_spent" and "gender":

I don't know what "unaffected N for "years_spent" and "gender": means.

IF you are going to make an attempt to use proc tabulate you have to have a very clear understanding of what a Class and Var variable are. If you want a "count" of class variable if repeated the count will be multiplied.

If the class variable is not on an output observation then by default the observation is excluded. WHICH is why I said the bit about any other variables.

So your addition of a character variable Gender pretty much means that PROC Tabulate is not going to work for the report you want.

Speaking of which, you have yet to actually show an example of what the report you want looks like for a given example data set.

Curt · Posted 06-08-2024 10:35 AM

Hello ballardw,

the report which I would like to have looks exactly like the table generated on the example data in my previous post. The only problem is that the numbers produced for N are wrong.

What I mean by an affected N is that the total N for "gender" should be 4, as there are only four entries for gender in the example data (3 m and 1 f) , but after performing the reshape step, N for "gender" at location A alone totals already 7 (5 m, 3 f).

Same is true for the numerical variable "years_spent" wich also gets increased after the reshape step.

So your addition of a character variable Gender pretty much means that PROC Tabulate is not going to work for the report you want.

So if it is not possible to create a table like this with proc tabulate, is there an other method or is it not possible to do this with SAS in general?

Many thanks,

Curt

ballardw · Posted 06-09-2024 02:45 PM

@Curt wrote:

So your addition of a character variable Gender pretty much means that PROC Tabulate is not going to work for the report you want.

So if it is not possible to create a table like this with proc tabulate, is there an other method or is it not possible to do this with SAS in general?

Many thanks,

Curt

No. I just means a different approach is needed.

It may mean summarizing data, possibly several times depending on what is wanted, and recombining the data.

Look at the Report Writing Interface documentation in the data step. You can create cells/rows/columns that merge/combine/span that are not possible in Proc Tabulate or Report but then you are responsible for creating the data.

I do not consider a statement like : " I would like to have looks exactly like the table generated on the example data in my previous post. The only problem is that the numbers produced for N are wrong." a clear definition of what is wanted for a report table appearance. As a minimum I can't tell which post. Or which data, since you have been very reluctant to provide any.

Curt · Posted 06-12-2024 10:26 AM

Hello ballardw,

many thanks for your answer.

As you recommended, I will take a look at the Report Writing Interface documentation.

I do not consider a statement like : " I would like to have looks exactly like the table generated on the example data in my previous post. The only problem is that the numbers produced for N are wrong." a clear definition of what is wanted for a report table appearance. As a minimum I can't tell which post. Or which data, since you have been very reluctant to provide any.

The example data/table I was referring to is this one:

data have;
   input exact_location $ icdcategory_1 $ icdcategory_2 $ icdcategory_3 $ gender $ years_spent;
datalines;
A   A1  .  C5 m 20
A   B4  C5 D6 f 20
A   .   A1 B4 m 20
B   .   A2 B4 m .
C   .   C5 A1 . 40
;
data reshape;
   set have;
   array r(*) icdcategory: ;
   do i=1 to dim(r);
      icdcatname = vname(r[i]);
      icdcatvalues  = r[i];
      output;
   end;
   keep years_spent gender exact_location icdcatname icdcatvalues;
run;

proc tabulate data=reshape;
   class exact_location gender icdcatvalues;
   var years_spent; 
   table years_spent gender all icdcatvalues all,
         exact_location;
run;

Tom · Posted 06-12-2024 10:50 AM

Why did you exclude the cases with missing CLASS variables?

Include the MISSING option on your CLASS statement.

class exact_location gender icdcatvalues / missing;

Result

Curt · Posted 06-13-2024 09:36 AM

Hello Tom,

thanks for your suggestion.

The problem with the missing option is that the count for “all N” then also includes the missing values. But I need a count of all non-missing values.

Luckily, my (real) data doesn't have any missing values for "gender", so the column for C will not be excluded from the table.

But is there a way to include the missing option and still only count the non-missing values?

Cynthia_sas · Posted 04-13-2024 11:09 AM

Hi, Same suggestion as @ballardw -- you can make fake data to test with the posted code. My sense from your description is that your data structure may not work without restructuring.
Cynthia

Registration is open

SAS Training: Just a Click Away