Hi all,
My problem is relatively simple but i am confused as to what procedure to use. I tried proc freq and proc tabulate but could not accomplish it
Here is my data set:
A | B | C | D |
1 | 0 | 1 | 0 |
1 | 0 | 1 | 1 |
1 | 0 | 1 | 0 |
0 | 0 | 1 | 0 |
0 | 1 | 0 | 1 |
This is the table i need:
Variable | N(0) | 0% | N(1) | 1% |
A | ||||
B | ||||
C | ||||
D |
Here is my code:
proc tabulate data= gg NOSEPS format=comma11.2;
class A B C D / missing;
table (A B C D), (all pctn)/rts=25;
keylabel
pctn = '%';
run;
thanks!!
I presume in your percent columns you want row percentages - i.e. percentages in each row sum to 100%.
I suspect this can be done in proc report, but to do it in proc tabulate requires creating an intermediate data set, You want a column for zeroes, and for ones. across A B C and D. So basically you need two new variables ZERO and ONE which you can cross against their occurences in A, B, C, and D.
data have;
input A B C D;
datalines;
1 0 1 0
1 0 1 1
1 0 1 0
0 0 1 0
0 1 0 1
run;
data Vneed (keep=vname zero one total) / VIEW=VNEED;
set have;
array var a b c d;
do over var;
vname=vname(var);
call missing(zero,one); /*Correction added */
if var=0 then zero=1; else
if var=1 then one=1;
total=1;
output;
end;
run;
proc tabulate data=Vneed missing ;
class vname ;
var zero one total;
tables vname=' ', (zero='0' one='1')*(sum='N'*f=3. pctsum<total>='%'*f=5.1) /rts=10;
run;
This create 4 records for each obs in HAVE. It also creates a new variable TOTAL. I did that to use it in the PCTSUM operator in proc tabulate. For each variables A through D, TOTAL sums to the number of rows in the original data set, which is what I need in the denominator of each requested percentage.
Other notes, all concerning appearance and formatting of the table.
Edit note: I added the CALL MISSING to make sure an observation was not double counted (i.e. both ZERO and ONE =1).
Please explain the four different columns in the output table.
To get the columns you show you need another variable and a different data structure for Tabulate to work with.
data have; input A B C D; datalines; 1 0 1 0 1 0 1 1 1 0 1 0 0 0 1 0 0 1 0 1 ; data need; set have; array v A B C D; do i= 1 to dim(v); name = vname(v[i]); value=v[i]; output; end; keep name value; run; proc tabulate data=need; class name value; table name=' ', value=' '*(n pctn='%') /box='Name' ; run;
Is as close as I can get.
Other approaches depending on how the column labels must appear would involve summarizing the data, probably the Need data set and then further reshaping.
Here is the SAS output: The counts are good but the percentages are off. For example the category 0 for variable A should be 40% and category 1 should be 50%.
Name | 0 | 1 | ||
N | % | N | % | |
A | 2 | 10 | 3 | 15 |
B | 4 | 20 | 1 | 5 |
C | 1 | 5 | 4 | 20 |
D | 3 | 15 | 2 | 10 |
I presume in your percent columns you want row percentages - i.e. percentages in each row sum to 100%.
I suspect this can be done in proc report, but to do it in proc tabulate requires creating an intermediate data set, You want a column for zeroes, and for ones. across A B C and D. So basically you need two new variables ZERO and ONE which you can cross against their occurences in A, B, C, and D.
data have;
input A B C D;
datalines;
1 0 1 0
1 0 1 1
1 0 1 0
0 0 1 0
0 1 0 1
run;
data Vneed (keep=vname zero one total) / VIEW=VNEED;
set have;
array var a b c d;
do over var;
vname=vname(var);
call missing(zero,one); /*Correction added */
if var=0 then zero=1; else
if var=1 then one=1;
total=1;
output;
end;
run;
proc tabulate data=Vneed missing ;
class vname ;
var zero one total;
tables vname=' ', (zero='0' one='1')*(sum='N'*f=3. pctsum<total>='%'*f=5.1) /rts=10;
run;
This create 4 records for each obs in HAVE. It also creates a new variable TOTAL. I did that to use it in the PCTSUM operator in proc tabulate. For each variables A through D, TOTAL sums to the number of rows in the original data set, which is what I need in the denominator of each requested percentage.
Other notes, all concerning appearance and formatting of the table.
Edit note: I added the CALL MISSING to make sure an observation was not double counted (i.e. both ZERO and ONE =1).
@rajd1 wrote:
This is a great solution. THANK YOU! Just seems like PROC TABULATE might use one extra step to get to the solution. I have been using proc tabulate quite a bit lately.
Depends on the data whether you need to reshape things.
I create lots of 0/1 coded variables but I do no report on the 0 and 1 counts. All of the information is contained in the N statistic (number of values), the Sum (is the number of 1's ) and the Mean( is the percentage of 1's). If someone needs the number of 0's then is N - the Sum, if you need the percent of 0's then it is 1- mean.
Here's an example using a modified data set to have some missing values to see what things might look like:
data have; input A B C D; datalines; 1 0 1 0 1 0 1 1 1 0 1 0 0 0 1 0 0 1 0 1 . . 1 0 1 . 1 0 ; proc tabulate data=have; var A B C D; table A B C D, n='# Values' sum='# 1' mean='% 1'*f=percent8.1 /box='Variable' ; run;
I went back and ran this and noticed that the count numbers are off. The counts and percentages for variables B C and D are off. Below is the SAS output:
N | % | N | % | |
A | 2 | 10 | 3 | 15 |
B | 4 | 20 | 1 | 5 |
C | 1 | 5 | 4 | 20 |
D | 3 | 15 | 2 | 10 |
I corrected the counts.
But, I kept the percentages as percentage of row totals, not grand totals. Is that what you want?
Yes, that works. I did need the row totals. Thanks so much. I am applying this for 100's of variables. Not sure if this the most efficient way but it works.
@rajd1 wrote:
Yes, that works. I did need the row totals. Thanks so much. I am applying this for 100's of variables. Not sure if this the most efficient way but it works.
If you have 100's of such variables then I strongly suggest you look at my code that doesn't do separate columns for 0/1 but uses summary statistics. Why, you may ask? You can easily use a variable in the code. You haven't really provided any realistic example of your data but any of the list types in SAS will work both on the VAR statement or the the Table statement. Such as
Var a1-a100 ;
or
Var a: b: c: ; to use all of the variables starting with a, b or c.
If any of your variables have missing values I suggest you test @mkeintz's solution very carefully.
Consider this data set:
data have; input A B C D; datalines; 1 0 1 0 1 0 1 1 1 0 1 0 . 0 1 0 . 1 0 1 run;
used with that accepted solution. The Row percentages are going to look odd at best. As will @Ksharp's though in a slightly different manner. Of course if you have NO missing values for any of the variables you may be okay.
Make a macro to wrap the following code and go through all the variable name.
data have; input A B C D; datalines; 1 0 1 0 1 0 1 1 1 0 1 0 0 0 1 0 0 1 0 1 ; proc sql; select 'A' as variable, sum(A=0) as zero label='N(0)',100*calculated zero/count(*) as z_per label='0 %', sum(A=1) as one label='N(1)',100*calculated one/count(*) as z_per label='1 %' from have union select 'B' as variable, sum(B=0) as zero label='N(0)',100*calculated zero/count(*) as z_per label='0 %', sum(B=1) as one label='N(1)',100*calculated one/count(*) as z_per label='1 %' from have union select 'C' as variable, sum(C=0) as zero label='N(0)',100*calculated zero/count(*) as z_per label='0 %', sum(C=1) as one label='N(1)',100*calculated one/count(*) as z_per label='1 %' from have union select 'D' as variable, sum(D=0) as zero label='N(0)',100*calculated zero/count(*) as z_per label='0 %', sum(D=1) as one label='N(1)',100*calculated one/count(*) as z_per label='1 %' from have ; quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.