BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rajd1
Quartz | Level 8

Hi all,

My problem is relatively simple but i am confused as to what procedure to use. I tried proc freq and proc tabulate but could not accomplish it

Here is my data set:

ABCD
1010
1011
1010
0010
0101

This is the table i need:

VariableN(0)0%N(1)1%
A    
B    
C    
D    

 

Here is my code:

proc tabulate data= gg NOSEPS format=comma11.2;
class A B C D / missing;
table (A B C D), (all pctn)/rts=25;
keylabel
pctn = '%';
run;

 

thanks!!

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I presume in your percent columns you want row percentages - i.e. percentages in each row sum to 100%.

 

I suspect this can be done in proc report, but to do it in proc tabulate requires creating an intermediate data set,   You want a column for zeroes, and for ones.  across A B C and D.   So basically you need two new variables ZERO and ONE which you can cross against their occurences in A, B, C, and D.

 

data have;
  input A B C D;
datalines;
1 0 1 0
1 0 1 1
1 0 1 0
0 0 1 0
0 1 0 1
run;

data Vneed (keep=vname zero one total) / VIEW=VNEED;    
  set have;
  array var  a b c d;
  do over var;
    vname=vname(var);
    call missing(zero,one);     /*Correction added */
    if var=0 then zero=1; else
    if var=1 then one=1;
    total=1;
    output;
  end;
run;
proc tabulate data=Vneed missing ;
  class vname ;
  var zero one total;
  tables vname=' ', (zero='0' one='1')*(sum='N'*f=3. pctsum<total>='%'*f=5.1) /rts=10;
run;

This create 4 records for each obs in HAVE.   It also creates a new variable TOTAL.   I did that to use it in the PCTSUM operator in proc tabulate.   For each variables A through D, TOTAL sums to the number of rows in the original data set, which is what I need in the denominator of each requested percentage.

 

Other notes, all concerning appearance and formatting of the table.

  1. the vname=' ', instead of vname, is just a way to take the label "VNAME" away from the row title box.
  2. the "/rts=10" tell tabulate to give only 10 columns to the row title.
  3. the zero='0' and one='1'  and sum='N' and pctsum<…>='%' just puts desired text in place of ZERO, ONE, SUM, and PCTSUM in the table.
  4. the "f=" component dictate the format of the corresponding cells

 

Edit note:  I added the CALL MISSING to make sure an observation was not double counted (i.e. both ZERO and ONE =1).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

14 REPLIES 14
PaigeMiller
Diamond | Level 26

Please explain the four different columns in the output table.

--
Paige Miller
rajd1
Quartz | Level 8
Sorry, it is just Frequency and percentage of 0's and 1's by each variable.
ballardw
Super User

To get the columns you show you need another variable and a different data structure for Tabulate to work with.

 

data have;
 input A	B	C	D;
datalines;
1	0	1	0
1	0	1	1
1	0	1	0
0	0	1	0
0	1	0	1
;

data need;
  set have;
  array v A B C D;
  do i= 1 to dim(v);
      name = vname(v[i]);
      value=v[i];
      output;
  end;
  keep name value;
run;

proc tabulate data=need;
   class name value;
   table name=' ',
         value=' '*(n pctn='%')
         /box='Name'
   ;
run;

Is as close as I can get.

Other approaches depending on how the column labels must appear would involve summarizing the data, probably the Need data set and then further reshaping.

rajd1
Quartz | Level 8
This is awesome! Thanks so much!
rajd1
Quartz | Level 8

Here is the SAS output: The counts are good but the percentages are off. For example the category 0 for variable A should be 40% and category 1 should be 50%.


Name01
 N%N%
A210315
B42015
C15420
D315210
mkeintz
PROC Star

I presume in your percent columns you want row percentages - i.e. percentages in each row sum to 100%.

 

I suspect this can be done in proc report, but to do it in proc tabulate requires creating an intermediate data set,   You want a column for zeroes, and for ones.  across A B C and D.   So basically you need two new variables ZERO and ONE which you can cross against their occurences in A, B, C, and D.

 

data have;
  input A B C D;
datalines;
1 0 1 0
1 0 1 1
1 0 1 0
0 0 1 0
0 1 0 1
run;

data Vneed (keep=vname zero one total) / VIEW=VNEED;    
  set have;
  array var  a b c d;
  do over var;
    vname=vname(var);
    call missing(zero,one);     /*Correction added */
    if var=0 then zero=1; else
    if var=1 then one=1;
    total=1;
    output;
  end;
run;
proc tabulate data=Vneed missing ;
  class vname ;
  var zero one total;
  tables vname=' ', (zero='0' one='1')*(sum='N'*f=3. pctsum<total>='%'*f=5.1) /rts=10;
run;

This create 4 records for each obs in HAVE.   It also creates a new variable TOTAL.   I did that to use it in the PCTSUM operator in proc tabulate.   For each variables A through D, TOTAL sums to the number of rows in the original data set, which is what I need in the denominator of each requested percentage.

 

Other notes, all concerning appearance and formatting of the table.

  1. the vname=' ', instead of vname, is just a way to take the label "VNAME" away from the row title box.
  2. the "/rts=10" tell tabulate to give only 10 columns to the row title.
  3. the zero='0' and one='1'  and sum='N' and pctsum<…>='%' just puts desired text in place of ZERO, ONE, SUM, and PCTSUM in the table.
  4. the "f=" component dictate the format of the corresponding cells

 

Edit note:  I added the CALL MISSING to make sure an observation was not double counted (i.e. both ZERO and ONE =1).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
rajd1
Quartz | Level 8
This is a great solution. THANK YOU! Just seems like PROC TABULATE might use one extra step to get to the solution. I have been using proc tabulate quite a bit lately.
ballardw
Super User

@rajd1 wrote:
This is a great solution. THANK YOU! Just seems like PROC TABULATE might use one extra step to get to the solution. I have been using proc tabulate quite a bit lately.

Depends on the data whether you need to reshape things.

I create lots of 0/1 coded variables but I do no report on the 0 and 1 counts. All of the information is contained in the N statistic (number of values), the Sum (is the number of 1's ) and the Mean( is the percentage of 1's). If someone needs the number of 0's then is N - the Sum, if you need the percent of 0's then it is 1- mean.

Here's an example using a modified data set to have some missing values to see what things might look like:

data have;
 input A	B	C	D;
datalines;
1	0	1	0
1	0	1	1
1	0	1	0
0	0	1	0
0	1	0	1
.  .  1  0
1  .  1  0
;

proc tabulate data=have;
   var A B C D;
   table A B C D,
         n='# Values' sum='# 1' mean='%  1'*f=percent8.1
         /box='Variable'
   ;
run;
rajd1
Quartz | Level 8

I went back and ran this and noticed that the count numbers are off. The counts and percentages for variables B C and D are off. Below is the SAS output:

 

   
 N%N%
A210315
B42015
C15420
D315210
  
mkeintz
PROC Star

I corrected the counts.

But, I kept the percentages as percentage of row totals, not grand totals.  Is that what you want?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
rajd1
Quartz | Level 8

Yes, that works. I did need the row totals. Thanks so much. I am applying this for 100's of variables. Not sure if this the most efficient way but it works.

ballardw
Super User

@rajd1 wrote:

Yes, that works. I did need the row totals. Thanks so much. I am applying this for 100's of variables. Not sure if this the most efficient way but it works.


If you have 100's of such variables then I strongly suggest you look at my code that doesn't do separate columns for 0/1 but uses summary statistics. Why, you may ask? You can easily use a variable in the code. You haven't really provided any realistic example of your data but any of the list types in SAS will work both on the VAR statement or the the Table statement. Such as

Var a1-a100 ;

or

Var a: b: c: ; to use all of the variables starting with a, b or c.

 If any of your variables have missing values I suggest you test @mkeintz's solution very carefully.

Consider this data set:

data have;
  input A B C D;
datalines;
1 0 1 0
1 0 1 1
1 0 1 0
. 0 1 0
. 1 0 1
run;

used with that accepted solution. The Row percentages are going to look odd at best. As will @Ksharp's though in a slightly different manner. Of course if you have NO missing values for any of the variables you may be okay.

rajd1
Quartz | Level 8
That makes perfect sense! I will be using this code as well. Thanks a ton!
Ksharp
Super User

Make a macro to wrap the following code and go through all the variable name.

 

data have;
  input A B C D;
datalines;
1 0 1 0
1 0 1 1
1 0 1 0
0 0 1 0
0 1 0 1
;
proc sql;
select 'A' as variable,
       sum(A=0) as zero label='N(0)',100*calculated zero/count(*) as z_per label='0 %',
       sum(A=1) as one label='N(1)',100*calculated one/count(*) as z_per label='1 %'
   from have
union 
select 'B' as variable,
       sum(B=0) as zero label='N(0)',100*calculated zero/count(*) as z_per label='0 %',
       sum(B=1) as one label='N(1)',100*calculated one/count(*) as z_per label='1 %'
   from have
union 
select 'C' as variable,
       sum(C=0) as zero label='N(0)',100*calculated zero/count(*) as z_per label='0 %',
       sum(C=1) as one label='N(1)',100*calculated one/count(*) as z_per label='1 %'
   from have
union
select 'D' as variable,
       sum(D=0) as zero label='N(0)',100*calculated zero/count(*) as z_per label='0 %',
       sum(D=1) as one label='N(1)',100*calculated one/count(*) as z_per label='1 %'
   from have
;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1264 views
  • 2 likes
  • 5 in conversation