Solved: Re: Formatted frequency Table

rajd1 · Posted 06-02-2020 03:04 PM

Hi all,

My problem is relatively simple but i am confused as to what procedure to use. I tried proc freq and proc tabulate but could not accomplish it

Here is my data set:

A	B	C	D
1	0	1	0
1	0	1	1
1	0	1	0
0	0	1	0
0	1	0	1

This is the table i need:

Variable	N(0)	0%	N(1)	1%
A
B
C
D

Here is my code:

proc tabulate data= gg NOSEPS format=comma11.2;
class A B C D / missing;
table (A B C D), (all pctn)/rts=25;
keylabel
pctn = '%';
run;

thanks!!

mkeintz · Posted 06-02-2020 04:06 PM

I presume in your percent columns you want row percentages - i.e. percentages in each row sum to 100%.

I suspect this can be done in proc report, but to do it in proc tabulate requires creating an intermediate data set, You want a column for zeroes, and for ones. across A B C and D. So basically you need two new variables ZERO and ONE which you can cross against their occurences in A, B, C, and D.

data have;
  input A B C D;
datalines;
1 0 1 0
1 0 1 1
1 0 1 0
0 0 1 0
0 1 0 1
run;

data Vneed (keep=vname zero one total) / VIEW=VNEED;    
  set have;
  array var  a b c d;
  do over var;
    vname=vname(var);
    call missing(zero,one);     /*Correction added */
    if var=0 then zero=1; else
    if var=1 then one=1;
    total=1;
    output;
  end;
run;
proc tabulate data=Vneed missing ;
  class vname ;
  var zero one total;
  tables vname=' ', (zero='0' one='1')*(sum='N'*f=3. pctsum<total>='%'*f=5.1) /rts=10;
run;

This create 4 records for each obs in HAVE. It also creates a new variable TOTAL. I did that to use it in the PCTSUM operator in proc tabulate. For each variables A through D, TOTAL sums to the number of rows in the original data set, which is what I need in the denominator of each requested percentage.

Other notes, all concerning appearance and formatting of the table.

the vname=' ', instead of vname, is just a way to take the label "VNAME" away from the row title box.
the "/rts=10" tell tabulate to give only 10 columns to the row title.
the zero='0' and one='1' and sum='N' and pctsum<…>='%' just puts desired text in place of ZERO, ONE, SUM, and PCTSUM in the table.
the "f=" component dictate the format of the corresponding cells

Edit note: I added the CALL MISSING to make sure an observation was not double counted (i.e. both ZERO and ONE =1).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

PaigeMiller · Posted 06-02-2020 03:41 PM

Please explain the four different columns in the output table.

--
Paige Miller

rajd1 · Posted 06-02-2020 03:44 PM

Sorry, it is just Frequency and percentage of 0's and 1's by each variable.

ballardw · Posted 06-02-2020 04:14 PM

To get the columns you show you need another variable and a different data structure for Tabulate to work with.

data have;
 input A	B	C	D;
datalines;
1	0	1	0
1	0	1	1
1	0	1	0
0	0	1	0
0	1	0	1
;

data need;
  set have;
  array v A B C D;
  do i= 1 to dim(v);
      name = vname(v[i]);
      value=v[i];
      output;
  end;
  keep name value;
run;

proc tabulate data=need;
   class name value;
   table name=' ',
         value=' '*(n pctn='%')
         /box='Name'
   ;
run;

Is as close as I can get.

Other approaches depending on how the column labels must appear would involve summarizing the data, probably the Need data set and then further reshaping.

rajd1 · Posted 06-02-2020 04:16 PM

This is awesome! Thanks so much!

rajd1 · Posted 06-02-2020 07:45 PM

Here is the SAS output: The counts are good but the percentages are off. For example the category 0 for variable A should be 40% and category 1 should be 50%.

Name	0	1
	N	%	N	%
A	2	10	3	15
B	4	20	1	5
C	1	5	4	20
D	3	15	2	10

mkeintz · Posted 06-02-2020 04:06 PM

I presume in your percent columns you want row percentages - i.e. percentages in each row sum to 100%.

I suspect this can be done in proc report, but to do it in proc tabulate requires creating an intermediate data set, You want a column for zeroes, and for ones. across A B C and D. So basically you need two new variables ZERO and ONE which you can cross against their occurences in A, B, C, and D.

data have;
  input A B C D;
datalines;
1 0 1 0
1 0 1 1
1 0 1 0
0 0 1 0
0 1 0 1
run;

data Vneed (keep=vname zero one total) / VIEW=VNEED;    
  set have;
  array var  a b c d;
  do over var;
    vname=vname(var);
    call missing(zero,one);     /*Correction added */
    if var=0 then zero=1; else
    if var=1 then one=1;
    total=1;
    output;
  end;
run;
proc tabulate data=Vneed missing ;
  class vname ;
  var zero one total;
  tables vname=' ', (zero='0' one='1')*(sum='N'*f=3. pctsum<total>='%'*f=5.1) /rts=10;
run;

This create 4 records for each obs in HAVE. It also creates a new variable TOTAL. I did that to use it in the PCTSUM operator in proc tabulate. For each variables A through D, TOTAL sums to the number of rows in the original data set, which is what I need in the denominator of each requested percentage.

Other notes, all concerning appearance and formatting of the table.

the vname=' ', instead of vname, is just a way to take the label "VNAME" away from the row title box.
the "/rts=10" tell tabulate to give only 10 columns to the row title.
the zero='0' and one='1' and sum='N' and pctsum<…>='%' just puts desired text in place of ZERO, ONE, SUM, and PCTSUM in the table.
the "f=" component dictate the format of the corresponding cells

Edit note: I added the CALL MISSING to make sure an observation was not double counted (i.e. both ZERO and ONE =1).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

rajd1 · Posted 06-02-2020 04:15 PM

This is a great solution. THANK YOU! Just seems like PROC TABULATE might use one extra step to get to the solution. I have been using proc tabulate quite a bit lately.

ballardw · Posted 06-02-2020 07:47 PM

@rajd1 wrote:
This is a great solution. THANK YOU! Just seems like PROC TABULATE might use one extra step to get to the solution. I have been using proc tabulate quite a bit lately.

Depends on the data whether you need to reshape things.

I create lots of 0/1 coded variables but I do no report on the 0 and 1 counts. All of the information is contained in the N statistic (number of values), the Sum (is the number of 1's ) and the Mean( is the percentage of 1's). If someone needs the number of 0's then is N - the Sum, if you need the percent of 0's then it is 1- mean.

Here's an example using a modified data set to have some missing values to see what things might look like:

data have;
 input A	B	C	D;
datalines;
1	0	1	0
1	0	1	1
1	0	1	0
0	0	1	0
0	1	0	1
.  .  1  0
1  .  1  0
;

proc tabulate data=have;
   var A B C D;
   table A B C D,
         n='# Values' sum='# 1' mean='%  1'*f=percent8.1
         /box='Variable'
   ;
run;

rajd1 · Posted 06-02-2020 07:40 PM

I went back and ran this and noticed that the count numbers are off. The counts and percentages for variables B C and D are off. Below is the SAS output:


	N	%	N	%
A	2	10	3	15
B	4	20	1	5
C	1	5	4	20
D	3	15	2	10

mkeintz · Posted 06-02-2020 08:32 PM

I corrected the counts.

But, I kept the percentages as percentage of row totals, not grand totals. Is that what you want?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

rajd1 · Posted 06-02-2020 08:45 PM

Yes, that works. I did need the row totals. Thanks so much. I am applying this for 100's of variables. Not sure if this the most efficient way but it works.

ballardw · Posted 06-03-2020 07:12 PM

@rajd1 wrote:

Yes, that works. I did need the row totals. Thanks so much. I am applying this for 100's of variables. Not sure if this the most efficient way but it works.

If you have 100's of such variables then I strongly suggest you look at my code that doesn't do separate columns for 0/1 but uses summary statistics. Why, you may ask? You can easily use a variable in the code. You haven't really provided any realistic example of your data but any of the list types in SAS will work both on the VAR statement or the the Table statement. Such as

Var a1-a100 ;

or

Var a: b: c: ; to use all of the variables starting with a, b or c.

If any of your variables have missing values I suggest you test @mkeintz's solution very carefully.

Consider this data set:

data have;
  input A B C D;
datalines;
1 0 1 0
1 0 1 1
1 0 1 0
. 0 1 0
. 1 0 1
run;

used with that accepted solution. The Row percentages are going to look odd at best. As will @Ksharp's though in a slightly different manner. Of course if you have NO missing values for any of the variables you may be okay.

rajd1 · Posted 06-03-2020 07:17 PM

That makes perfect sense! I will be using this code as well. Thanks a ton!

Ksharp · Posted 06-03-2020 07:36 AM

Make a macro to wrap the following code and go through all the variable name.

data have;
  input A B C D;
datalines;
1 0 1 0
1 0 1 1
1 0 1 0
0 0 1 0
0 1 0 1
;
proc sql;
select 'A' as variable,
       sum(A=0) as zero label='N(0)',100*calculated zero/count(*) as z_per label='0 %',
       sum(A=1) as one label='N(1)',100*calculated one/count(*) as z_per label='1 %'
   from have
union 
select 'B' as variable,
       sum(B=0) as zero label='N(0)',100*calculated zero/count(*) as z_per label='0 %',
       sum(B=1) as one label='N(1)',100*calculated one/count(*) as z_per label='1 %'
   from have
union 
select 'C' as variable,
       sum(C=0) as zero label='N(0)',100*calculated zero/count(*) as z_per label='0 %',
       sum(C=1) as one label='N(1)',100*calculated one/count(*) as z_per label='1 %'
   from have
union
select 'D' as variable,
       sum(D=0) as zero label='N(0)',100*calculated zero/count(*) as z_per label='0 %',
       sum(D=1) as one label='N(1)',100*calculated one/count(*) as z_per label='1 %'
   from have
;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!