When producing demographics reports, we usually don't have a 100% match rate, so there are missing values. Let's choose a simple example: gender. In the code below, we are creating a SAS dataset with two variables, group and gender. We then use PROC TABULATE to show the percentage of females and males in groups A and B. Though I want to show the number of individuals for which I do not have a gender (the missings), I do not want the missings used in the calculation of percent females and percent males. Furthermore--and this is the more complicated part--I want the percentage with Unknown gender (again, the missings) to be shown as the percentage of non-missing, and the column total percent to still add up to 100% (the sum of the non-missing percentages).
The program below generates the following output:
However, I would like to modify my PROC TABULATE step to generate the following table.
How can this be accomplished?
Thanks much,
David Oesper
proc format;
value $ge
'F' = "Female"
'M' = "Male"
other = "Unknown";
run;
proc format;
picture pctfmt low-high='009.9%';
run;
data demos;
input @1 group $1.
@3 gender $1.
;
datalines;
A F
A F
A F
A F
A M
A M
A
B F
B F
B F
B M
B
B
;
run;
proc tabulate data=demos format=comma10. missing;
class gender group / order=data;
classlev gender group;
format gender $ge.;
table gender=' ' all, group=''*(n colpctn*f=pctfmt.) / box='Gender' misstext=' ';
keylabel all='TOTAL' n=' ' colpctn=' ';
title 'Gender';
run;
Thanks, PG! I think it was clever of you to use count(gender) in the PROC SQL step and omitting gender from the group by in order to apply the fraction each observation with known gender contributes towards the total of all known genders in each group (i.e. NOTE: The query requires remerging summary statistics back with the original data). I did not know (or had forgotten) that count(gender) counts only non-missing observations and count(*) counts both missing and non-missing. Kudos, I would not have thought of this solution and I have been a SAS programmer for a very long time.
I used your contribution and made some additional modifications to the code to get exactly what I am looking for (posted below). I hope others will find this code to be useful when they wish to report percentages in terms of total non-missing and yet include in the report missing as percent of non-missing to see coverage.
Thanks again!
David Oesper
proc format;
value $ge
'F' = "Female"
'M' = "Male"
other = "Unknown";
value pctfmt
low -< 1.0 = [percentn7.1]
1.0 - high = "100.0%";
run;
data demos;
input @1 grp $1.
@3 gender $1.
;
datalines;
A F
A F
A F
A F
A M
A M
A
B F
B F
B F
B M
B
B
;
run;
proc sql noprint;
create table demos2 as
select grp
,gender
,1/count(gender) as obspct
from demos
group by grp
order by missing(grp), grp
;quit;
proc tabulate data=demos2 format=comma10. missing;
class gender grp / order=data;
classlev gender grp;
format gender $ge.;
table gender=' ' all, grp=''*obspct=''*(n='' sum=''*format=pctfmt.) / box='Gender' misstext=' ';
keylabel all='TOTAL';
title 'Gender';
var obspct;
run;
All I could come up with using proc tabulate was this:
proc sql;
create table summ as
select group, gender, 1 / count(gender) as w
from demos
group by group;
quit;
proc format;
value cheat
low -< 1.0 = [percentn7.1]
1.0 - high = "100.0%";
run;
proc tabulate data=summ missing;
class group gender;
var w;
table gender all, group=""*w=""*(n=""*format=7.0 sum=""*format=cheat.);
run;
Good enough?
Thanks, PG! I think it was clever of you to use count(gender) in the PROC SQL step and omitting gender from the group by in order to apply the fraction each observation with known gender contributes towards the total of all known genders in each group (i.e. NOTE: The query requires remerging summary statistics back with the original data). I did not know (or had forgotten) that count(gender) counts only non-missing observations and count(*) counts both missing and non-missing. Kudos, I would not have thought of this solution and I have been a SAS programmer for a very long time.
I used your contribution and made some additional modifications to the code to get exactly what I am looking for (posted below). I hope others will find this code to be useful when they wish to report percentages in terms of total non-missing and yet include in the report missing as percent of non-missing to see coverage.
Thanks again!
David Oesper
proc format;
value $ge
'F' = "Female"
'M' = "Male"
other = "Unknown";
value pctfmt
low -< 1.0 = [percentn7.1]
1.0 - high = "100.0%";
run;
data demos;
input @1 grp $1.
@3 gender $1.
;
datalines;
A F
A F
A F
A F
A M
A M
A
B F
B F
B F
B M
B
B
;
run;
proc sql noprint;
create table demos2 as
select grp
,gender
,1/count(gender) as obspct
from demos
group by grp
order by missing(grp), grp
;quit;
proc tabulate data=demos2 format=comma10. missing;
class gender grp / order=data;
classlev gender grp;
format gender $ge.;
table gender=' ' all, grp=''*obspct=''*(n='' sum=''*format=pctfmt.) / box='Gender' misstext=' ';
keylabel all='TOTAL';
title 'Gender';
var obspct;
run;
Hi ,
I have the same problem but if I multiple variables in the same proc tabulate and they have missing observations for different variables. Ideally I would love to have proc tabulate to tweak the "missing" option. I added a variable pay to the dataset and would like to have the output below. Any suggestions? is there a instead of using proc sql ? Or i would need to learn proc report?
proc format;
value $ge
'F' = "Female"
'M' = "Male"
other = "Unknown";
value pctfmt
low -< 1.0 = [percentn7.1]
1.0 - high = "100.0%";
run;
data demos;
input @1 grp $1.
@3 gender $1.
@5 pay $1.
;
datalines;
A F M
A F D
A F M
A F D
A M
A M D
A D
B F M
B F D
B F D
B M
B D
B M
;
run;
Desired Output(in the same proc tabulate):
Gender | A | B | |||
Female | 4 | 66.70% | 3 | 75.00% | |
Male | 2 | 33.30% | 1 | 25.00% | |
Unknown | 1 | 16.70% | 2 | 50.00% | |
TOTAL | 7 | 100.00% | 6 | 100.00% | |
Pay |
|
|
|
| |
D | 4 | 66.67% | 3 | 60.00% | |
M | 2 | 33.33% | 2 | 40.00% | |
Unknown | 1 | 14.29% | 1 | 16.67% | |
TOTAL | 7 | 100.00% | 6 | 100.00% | |
Gender | A | B | ||
Female | 4 | 66.70% | 3 | 75.00% |
Male | 2 | 33.30% | 1 | 25.00% |
Unknown | 1 | 16.70% | 2 | 50.00% |
TOTAL | 7 | 100.00% | 6 | 100.00% |
Pay | ||||
D | 4 | 66.67% | 3 | 60.00% |
M | 2 | 33.33% | 2 | 40.00% |
Unknown | 1 | 14.29% | 1 | 16.67% |
TOTAL | 7 | 100.00% | 6 | 100.00% |
Suggestion: submit as a new topic.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.