BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
doesper
Obsidian | Level 7

When producing demographics reports, we usually don't have a 100% match rate, so there are missing values.  Let's choose a simple example: gender.  In the code below, we are creating a SAS dataset with two variables, group and gender.  We then use PROC TABULATE to show the percentage of females and males in groups A and B.  Though I want to show the number of individuals for which I do not have a gender (the missings), I do not want the missings used in the calculation of percent females and percent males.  Furthermore--and this is the more complicated part--I want the percentage with Unknown gender (again, the missings) to be shown as the percentage of non-missing, and the column total percent to still add up to 100% (the sum of the non-missing percentages).

 

The program below generates the following output:

 

PROC TABULATE Output

 

However, I would like to modify my PROC TABULATE step to generate the following table.

 

Desired PROC TABULATE Output

 

How can this be accomplished?

 

Thanks much,

 

David Oesper

 

proc format;
   value $ge
      'F' = "Female"
      'M' = "Male"
   other  = "Unknown";
run;

proc format;
   picture pctfmt low-high='009.9%';
run;

data demos;
   input @1 group  $1.
         @3 gender $1.
         ;
datalines;
A F
A F
A F
A F
A M
A M
A
B F
B F
B F
B M
B
B
;
run;

proc tabulate data=demos format=comma10. missing;        
   class gender group / order=data;                                  
   classlev gender group;
   format gender $ge.;

   table gender=' ' all, group=''*(n colpctn*f=pctfmt.) / box='Gender' misstext=' ';    
   keylabel all='TOTAL' n=' ' colpctn=' ';

   title 'Gender'; 
run;
1 ACCEPTED SOLUTION

Accepted Solutions
doesper
Obsidian | Level 7

Thanks, PG!  I think it was clever of you to use count(gender) in the PROC SQL step and omitting gender from the group by in order to apply the fraction each observation with known gender contributes towards the total of all known genders in each group (i.e. NOTE: The query requires remerging summary statistics back with the original data).  I did not know (or had forgotten) that count(gender) counts only non-missing observations and count(*) counts both missing and non-missing.  Kudos, I would not have thought of this solution and I have been a SAS programmer for a very long time.

I used your contribution and made some additional modifications to the code to get exactly what I am looking for (posted below).  I hope others will find this code to be useful when they wish to report percentages in terms of total non-missing and yet include in the report missing as percent of non-missing to see coverage.

 

Thanks again!

 

David Oesper

 

proc format;
   value $ge
      'F'    = "Female"
      'M'    = "Male"
      other  = "Unknown";
   value pctfmt
      low -< 1.0 = [percentn7.1]
      1.0 - high = "100.0%";
run;

data demos;
   input @1 grp    $1.
         @3 gender $1.
         ;
datalines;
A F
A F
A F
A F
A M
A M
A
B F
B F
B F
B M
B
B
;
run;

proc sql noprint;
   create table demos2 as
   select grp
         ,gender
         ,1/count(gender) as obspct
   from demos
   group by grp
   order by missing(grp), grp
;quit;

proc tabulate data=demos2 format=comma10. missing;
   class gender grp / order=data;
   classlev gender grp;
   format gender $ge.;

   table gender=' ' all, grp=''*obspct=''*(n='' sum=''*format=pctfmt.) / box='Gender' misstext=' ';
   keylabel all='TOTAL';

   title 'Gender';
   var obspct;
run;

View solution in original post

4 REPLIES 4
PGStats
Opal | Level 21

All I could come up with using proc tabulate was this:

 

proc sql;
create table summ as
select group, gender, 1 / count(gender) as w
from demos
group by group;
quit;

proc format;
value cheat
low -< 1.0 = [percentn7.1]
1.0 - high = "100.0%";
run;
 
proc tabulate data=summ missing;
class group gender;
var w;
table gender all, group=""*w=""*(n=""*format=7.0 sum=""*format=cheat.);
run; 

Good enough?

PG
doesper
Obsidian | Level 7

Thanks, PG!  I think it was clever of you to use count(gender) in the PROC SQL step and omitting gender from the group by in order to apply the fraction each observation with known gender contributes towards the total of all known genders in each group (i.e. NOTE: The query requires remerging summary statistics back with the original data).  I did not know (or had forgotten) that count(gender) counts only non-missing observations and count(*) counts both missing and non-missing.  Kudos, I would not have thought of this solution and I have been a SAS programmer for a very long time.

I used your contribution and made some additional modifications to the code to get exactly what I am looking for (posted below).  I hope others will find this code to be useful when they wish to report percentages in terms of total non-missing and yet include in the report missing as percent of non-missing to see coverage.

 

Thanks again!

 

David Oesper

 

proc format;
   value $ge
      'F'    = "Female"
      'M'    = "Male"
      other  = "Unknown";
   value pctfmt
      low -< 1.0 = [percentn7.1]
      1.0 - high = "100.0%";
run;

data demos;
   input @1 grp    $1.
         @3 gender $1.
         ;
datalines;
A F
A F
A F
A F
A M
A M
A
B F
B F
B F
B M
B
B
;
run;

proc sql noprint;
   create table demos2 as
   select grp
         ,gender
         ,1/count(gender) as obspct
   from demos
   group by grp
   order by missing(grp), grp
;quit;

proc tabulate data=demos2 format=comma10. missing;
   class gender grp / order=data;
   classlev gender grp;
   format gender $ge.;

   table gender=' ' all, grp=''*obspct=''*(n='' sum=''*format=pctfmt.) / box='Gender' misstext=' ';
   keylabel all='TOTAL';

   title 'Gender';
   var obspct;
run;
nrozario
Calcite | Level 5

Hi ,

 

I have the same problem but if I multiple variables in the same proc tabulate and they have missing observations for different variables. Ideally I would love to have proc tabulate to tweak the "missing" option. I added a variable pay to the dataset and would like to have the output below. Any suggestions? is there a instead of using proc sql ? Or i would need to learn proc report?

 

proc format;
value $ge
'F' = "Female"
'M' = "Male"
other = "Unknown";
value pctfmt
low -< 1.0 = [percentn7.1]
1.0 - high = "100.0%";
run;

data demos;
input @1 grp $1.
@3 gender $1.
@5 pay $1.
;
datalines;
A F M
A F D
A F M
A F D
A M
A M D
A     D
B F M
B F D
B F D
B M
B     D
B     M
;
run;

 

Desired Output(in the same proc tabulate):

 

Gender

A

 

B

Female

4

66.70%

3

75.00%

Male

2

33.30%

1

25.00%

Unknown

1

16.70%

2

50.00%

TOTAL

7

100.00%

6

100.00%

Pay

 

 

 

 

D

4

66.67%

3

60.00%

M

2

33.33%

2

40.00%

Unknown

1

14.29%

1

16.67%

TOTAL

7

100.00%

6

100.00%

      

 

GenderAB
Female466.70%375.00%
Male233.30%125.00%
Unknown116.70%250.00%
TOTAL7100.00%6100.00%
Pay    
D466.67%360.00%
M233.33%240.00%
Unknown114.29%116.67%
TOTAL7100.00%6100.00%
PGStats
Opal | Level 21

Suggestion: submit as a new topic.

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 3724 views
  • 0 likes
  • 3 in conversation