Solved: Display Value Frequencies with PROC TABULATE

arielme · Posted 07-15-2020 05:56 PM

I'm trying to write SAS code that outputs grouped frequency tables for a number of related variables. I have about 70 variables in about 7 groups and each group should have its own table. These vars are named for their group and number, so group one is f1q1 to f1q10, group two is f2q1 to f2q8, and so on. Each var has numeric values 1-5, labeled Strongly Disagree, Disagree, etc.

I would like the tables to look like the following, with var labels in the leftmost rows and values as columns, including hiding responses with n<=6, if possible (though this last step could be done manually after table export):

I'm using the colon operator and proc tabulate (rather than proc freq) to try to speed up this process.

proc tabulate data = survey;
	class f1q: ;
	table (f1q:), (n pctn = "&") ; 
run;

As you can see below, this creates the labels and counts successfully but puts both as rows. I would like the values (Agree, Disagree, etc) as columns each with their own N and % (there is a typo in the screenshot and & should be %).

Is this possible with proc tabulate? Is there another procedure that would let me automate this process more easily? I have a number of surveys, each with many question groups, that I need to process this way. Thanks!

ballardw · Posted 07-15-2020 07:57 PM

With multiple variables to be displayed in the same row/column in either Proc Tabulate Report you need to reshape the data a bit so that the name of the variable becomes a variable. Then you have the choice of either creating a custom format to display your "group" information or use the values to create a different variable to hold the group value. Then you have the value (1 to 5) or whatever in a single variable for the categorical processing.

However your specified table layout has some strikes against doing it with Proc tabulate easily. The first is getting a value like (n=xx) in the body. Not impossible but is going to be another custom format.

Second is you have two different statistics in a column. That requires having the statistic as part of the row row dimension and not show a statistic row label. Again, no problem until you come to:

Third you have another column that is a mean that would be crossed with the row value (group) and one of either the N or rowpctN statistic. And Tabulate will not allow crossing statistics quite that way. It would also require having a an additional variable with the value of the response.

And a last strike: your suppression of responses if less than 6 (or any number). I can force a VALUE of less than 6 to display as a blank but that single asterisk for TWO values (percent and n) isn't quite as easy because the percent value could be 100 percent when n=1 .

That suppression of n should be at a question or topic level, not individual cells, at least in surveys I work with. We don't want any values reported if the overall sample size or response group is too small. Or does that exclusion come from a different example?

This is the closest I could get with basic features of Proc Tabulate. Another format could assign the text for the Classval variable.

data example;
   input v1_1 v1_2 v1_3 v2_1 v2_2 v2_3 v2_4;
datalines;
1 1 2 2 1 . 4 
2 3 2 3 2 3 2
5 5 5 5 5 5 5
1 1 1 1 1 1 1
;

data table;
   set example;
   array vv v1: v2: ;
   length name $ 10;
   do i=1 to dim(vv);
      name = vname(vv[i]);
      classval= vv[i];
      varval  = vv[i];
      output;
   end;
   keep name classval varval;
run;

proc format;
value $namegrp
'v1_1', 'v1_2', 'v1_3'         = 'Group 1'
'v2_1', 'v2_2', 'v2_3', 'v2_4' = 'Group 2'
;
run;

proc tabulate data=table;
   class name classval;
   format name $namegrp.;
   var varval ;
   table name=' ' ,
         classval=' '*(rowpctn = '%' n= 'n') varval='mean response'*mean=' '
         
         /misstext=' '
   ;
run;

Some games can be played with style elements to remove selected cell borders between % and N. Coupled with use of horizontal justification you plus that picture format for n you could something that looks like

43.3 (n=17) | 16.4 (n=5) | <other pairs of values>| 3.33 <mean at the end>

or no vertical cell boundaries at all.

For education try moving the classvar to the row dimension to get the % over n appearance:

proc tabulate data=table;
   class name classval;
   format name $namegrp.;
   var varval ;
   table name=' '  *(rowpctn = '%' n= 'n'),
         classval=' ' varval='mean response'*mean=' '
         
         /misstext=' '
   ;
run;

which will have errors. Then remove the " varval =" to the end of the line to get stacked percent and n.

The other options get into pre-calculatiing everthing, making character values and such.

Or search this forum for a "demographic report" pdf which may have other approaches.

View solution in original post

ballardw · Posted 07-15-2020 07:57 PM

With multiple variables to be displayed in the same row/column in either Proc Tabulate Report you need to reshape the data a bit so that the name of the variable becomes a variable. Then you have the choice of either creating a custom format to display your "group" information or use the values to create a different variable to hold the group value. Then you have the value (1 to 5) or whatever in a single variable for the categorical processing.

However your specified table layout has some strikes against doing it with Proc tabulate easily. The first is getting a value like (n=xx) in the body. Not impossible but is going to be another custom format.

Second is you have two different statistics in a column. That requires having the statistic as part of the row row dimension and not show a statistic row label. Again, no problem until you come to:

Third you have another column that is a mean that would be crossed with the row value (group) and one of either the N or rowpctN statistic. And Tabulate will not allow crossing statistics quite that way. It would also require having a an additional variable with the value of the response.

And a last strike: your suppression of responses if less than 6 (or any number). I can force a VALUE of less than 6 to display as a blank but that single asterisk for TWO values (percent and n) isn't quite as easy because the percent value could be 100 percent when n=1 .

That suppression of n should be at a question or topic level, not individual cells, at least in surveys I work with. We don't want any values reported if the overall sample size or response group is too small. Or does that exclusion come from a different example?

This is the closest I could get with basic features of Proc Tabulate. Another format could assign the text for the Classval variable.

data example;
   input v1_1 v1_2 v1_3 v2_1 v2_2 v2_3 v2_4;
datalines;
1 1 2 2 1 . 4 
2 3 2 3 2 3 2
5 5 5 5 5 5 5
1 1 1 1 1 1 1
;

data table;
   set example;
   array vv v1: v2: ;
   length name $ 10;
   do i=1 to dim(vv);
      name = vname(vv[i]);
      classval= vv[i];
      varval  = vv[i];
      output;
   end;
   keep name classval varval;
run;

proc format;
value $namegrp
'v1_1', 'v1_2', 'v1_3'         = 'Group 1'
'v2_1', 'v2_2', 'v2_3', 'v2_4' = 'Group 2'
;
run;

proc tabulate data=table;
   class name classval;
   format name $namegrp.;
   var varval ;
   table name=' ' ,
         classval=' '*(rowpctn = '%' n= 'n') varval='mean response'*mean=' '
         
         /misstext=' '
   ;
run;

Some games can be played with style elements to remove selected cell borders between % and N. Coupled with use of horizontal justification you plus that picture format for n you could something that looks like

43.3 (n=17) | 16.4 (n=5) | <other pairs of values>| 3.33 <mean at the end>

or no vertical cell boundaries at all.

For education try moving the classvar to the row dimension to get the % over n appearance:

proc tabulate data=table;
   class name classval;
   format name $namegrp.;
   var varval ;
   table name=' '  *(rowpctn = '%' n= 'n'),
         classval=' ' varval='mean response'*mean=' '
         
         /misstext=' '
   ;
run;

which will have errors. Then remove the " varval =" to the end of the line to get stacked percent and n.

The other options get into pre-calculatiing everthing, making character values and such.

Or search this forum for a "demographic report" pdf which may have other approaches.

Display Value Frequencies with PROC TABULATE

Re: Display Value Frequencies with PROC TABULATE

Re: Display Value Frequencies with PROC TABULATE

Display Value Frequencies with PROC TABULATE

Re: Display Value Frequencies with PROC TABULATE

Re: Display Value Frequencies with PROC TABULATE

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!