The SAS Output Delivery System and reporting techniques

How to create a summary table by a subset of values from a categorical variable

Reply
Occasional Contributor
Posts: 12

How to create a summary table by a subset of values from a categorical variable

hi all I have a dataset here 

data have;
	input id $ sex employment druguse analgesic;
	cards;
1 F  "Full Time" 1 1
2 F "Part Time" 0 1
3 M "Unemployed" 0 1
4 M "Full Time" 0 0 
;
run;

I want to create a summary table that looks like that following

Female

Percentage

Full Time

50

Part Time

50

Unemployed

0

Male

 

Full Time

50

Part Time

0

Unemployed

50

 

within each sex category, there's percentage makeup of all employment status. 

 

Also, i want to create another table that only report on drug use = 1 that looks like following (I don't want to include drug = 0)

Drug use

Percentage

M

0

F

25

Analgesic

 

M

33.33

F

66.66

Thanks! 

 

 

SAS Super FREQ
Posts: 9,252

Re: How to create a summary table by a subset of values from a categorical variable

[ Edited ]
Posted in reply to panpan1215

Hi:
Your first report seems like PROC TABULATE might do the trick for you. What code have you tried?
cynthia

Occasional Contributor
Posts: 12

Re: How to create a summary table by a subset of values from a categorical variable

Posted in reply to Cynthia_sas

I'm sorry I don't know anything about proc tabuleau. i'm very new to SAS. Can you please demonstrate for me? Thanks

SAS Super FREQ
Posts: 9,252

Re: How to create a summary table by a subset of values from a categorical variable

Posted in reply to panpan1215

Hi:

  PROC TABULATE is a SAS procedure that can create summary tables based on your data. You can request many statistics other than the N (count) and the SUM. But those are the default statistics that TABULATE generates for you based on whether your variables are character or numeric.

 

  TABULATE expects all your variables for the TABLE to be declared as either CLASS for making groups or VAR for performing analysis. You can then use a TABLE statement to instruct PROC TABULATE what kind of table structure you want and what statistics and crossings you want. Your summary table can be a one dimension table of only columns or it can be a two dimension table of columns and rows or it can be a three dimension table of columns, rows and pages.

 

  TABULATE uses table operators to arrange the structure of the table. The TABLE operators are comma (new dimension), space (stack or concatenate tables) and < or > symbols (specify a denominator for a custom percent). In addition, there is a special CLASS variable called ALL that allows you to generate subtotals and grand totals in any dimension.

 

  The statistics that you generate can be as simple as N or SUM, but there are many other statistics like MIN, MEAN, MAX, CSS, STD, MEDIAN, etc. Look in the TABULATE documentation for the full list.

 

  The code below shows only 1 and 2 dimension tables because that's all you probably need for what you want to do.

 

  SASHELP.CLASS has 19 rows and 5 variables: NAME, AGE, SEX, HEIGHT and WEIGHT. The following program has examples of different tables all produced with PROC TABULATE to provide you with an overview.

 

  You said you are new to TABULATE, but other than creating the data with your program, what other code have you tried? PROC FREQ? PROC UNIVARIATE? Data Step? If you have a program that creates data, did you try any other code. Did your program even work to make data? Because the way I interpret your posted example, your INPUT statement is incorrect because you don't have a $ for SEX or EMPLOYMENT variables and you don't specify a length for EMPLOYMENT. So I used SASHELP.CLASS instead of your data -- it gave me more variables and more rows for example purposes.

proc print data=sashelp.class;
title 'What data is in sashelp.class';
run;
 
proc tabulate data=sashelp.class;
title '1) One Dimension table';
  class age;
  table n*(age all);
  keylabel n='Count';
run;
 
proc tabulate data=sashelp.class;
title '2) Simple Frequency Example';
  class age ;
  table age all='Total',
        n='Count';
run;
 
proc tabulate data=sashelp.class;
title '3) Two Dimension table';
  class sex age ;
  table age all='Total',
        n='Count'*sex all='Total';
run;
 
proc tabulate data=sashelp.class;
title '4) Getting Statistics';
  class sex age ;
  var height weight;
  table age all='Overall',
        (n='Count')  mean*height*sex=' ' (mean css std)*weight*sex=' ';
run;
   
proc tabulate data=sashelp.class;
title '5) Tabulate Percent Example';
  class sex age;
  table sex*(age all) all,
        n pctn='% of Total' pctn<age all>='% of Group';
run;

  I hope this helps provide you with an understanding of PROC TABULATE.

 

cynthia

Occasional Contributor
Posts: 12

Re: How to create a summary table by a subset of values from a categorical variable

Posted in reply to Cynthia_sas

Thank you so much! But I still have one concern. The drug variable is binary (0,1) I only one the entries where drug = 1 and show the percentage of all. Do proc tabulate show all levels of a category variable?

 

Thanks. 

Occasional Contributor
Posts: 12

Re: How to create a summary table by a subset of values from a categorical variable

Posted in reply to Cynthia_sas

Also the way I have to create this exact table from the academic paper is that within drug variable, it's not it's level but age sections. How to accomplish that?

Super User
Posts: 12,994

Re: How to create a summary table by a subset of values from a categorical variable

Posted in reply to panpan1215

panpan1215 wrote:

Also the way I have to create this exact table from the academic paper is that within drug variable, it's not it's level but age sections. How to accomplish that?


I suggest that you go back to the example data and provide a few records more records with values for  employment druguse analgesic

I also suggest running the data step code you use and verify the contents of the resulting data set. Your code as provided generates:

1133
1134  data have;
1135     input id $ sex employment druguse analgesic;
1136     cards;

NOTE: Invalid data for sex in line 1137 3-3.
NOTE: Invalid data for employment in line 1137 6-10.
NOTE: Invalid data for druguse in line 1137 12-16.
RULE:       ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-
1137        1 F  "Full Time" 1 1
id=1 sex=. employment=. druguse=. analgesic=1 _ERROR_=1 _N_=1
NOTE: Invalid data for sex in line 1138 3-3.
NOTE: Invalid data for employment in line 1138 5-9.
NOTE: Invalid data for druguse in line 1138 11-15.
1138        2 F "Part Time" 0 1
id=2 sex=. employment=. druguse=. analgesic=0 _ERROR_=1 _N_=2
NOTE: Invalid data for sex in line 1139 3-3.
NOTE: Invalid data for employment in line 1139 5-16.
1139        3 M "Unemployed" 0 1
id=3 sex=. employment=. druguse=0 analgesic=1 _ERROR_=1 _N_=3
NOTE: Invalid data for sex in line 1140 3-3.
NOTE: Invalid data for employment in line 1140 5-9.
NOTE: Invalid data for druguse in line 1140 11-15.
1140        4 M "Full Time" 0 0
id=4 sex=. employment=. druguse=. analgesic=0 _ERROR_=1 _N_=4
NOTE: The data set USER.HAVE has 4 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.01 seconds


1141  ;
1142  run;

because you haven't told SAS that either sex or employment is character, how long employment is and that you need to read the quoted value as a single variable.

 

 

There are some tricks involved with any of the report procedures involved with showing a value that doesn't appear in your data such as you have no Female with Unemployed.

And in your example for the second table what are you using to calculate percentages? You say "only report on drug use = 1". There is only one record with druguse=1. You need to be a bit more explicit about your numerator and denominator definitions.

Occasional Contributor
Posts: 18

Re: How to create a summary table by a subset of values from a categorical variable

This is just mock data I generated so I can make it clear for my question. But again like I said I don't know much about sas, that's why I'm asking. I just want to know in general how to display the exact same table. 

Super User
Posts: 12,994

Re: How to create a summary table by a subset of values from a categorical variable

Posted in reply to Viveme789

Viveme789 wrote:

This is just mock data I generated so I can make it clear for my question. But again like I said I don't know much about sas, that's why I'm asking. I just want to know in general how to display the exact same table. 


Without knowing what the denominators and numerators are for any value, especially your second table, it is very hard to provide an answer. You used a phrase: "only report on drug use = 1". The typical meaning of that is to SUBSET data to records where the value of the drug use variable is exactly one.

 

Data that doesn't allow actually creating tables to demonstrate is also a bit of an obstacle.

data have;
   infile datalines dlm=' ' dsd;
   informat id sex $1. employment $25.;
	input id $ sex employment druguse analgesic;
	cards;
1 F "Full Time" 1 1
2 F "Part Time" 0 1
3 M "Unemployed" 0 1
4 M "Full Time" 0 0 
;
run;

proc tabulate data=have out=work.junk;
   class sex employment;
   tables sex ,
          employment* rowpctn
   ;
run;

will calculate the percentages but when you get to having two different values in the same column (sex and employment) AND interleaved that will require a bit more work.

 

 

You can get a better data step Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

Super User
Posts: 10,609

Re: How to create a summary table by a subset of values from a categorical variable

Posted in reply to panpan1215
data have;
   infile datalines dlm=' ' dsd;
   informat id sex $1. employment $25.;
	input id $ sex employment druguse analgesic;
	cards;
1 F "Full Time" 1 1
2 F "Part Time" 0 1
3 M "Unemployed" 0 1
4 M "Full Time" 0 0 
;
run;

proc sql ;
 create table junk as
  select sex,employment,count(*)/(select count(*) from have where sex=a.sex) as pct format=percent8.2
   from have as a
    group by sex,employment;
run;
proc format;
value $ fmt
 'F'='Female'
 'M'='Male';
run;
options missing='0';
proc report data=junk nowd completerows;
column sex employment pct;
define sex/group noprint;
define employment/group;
compute before sex;
 line @1 sex $fmt.;
endcomp;
run;
Ask a Question
Discussion stats
  • 9 replies
  • 418 views
  • 2 likes
  • 5 in conversation