Hello
I want to calculate summary statistics using proc tabulate.
I have 3 requests:
1-Using overlapping formats values (overlapping ranges)
2-Show also missing groups (in this example: for group '9-10' need to show it too with zero values)
3-columns 'PCT_Customers' and ' PCT_Sum_Y' will be displayed with % symbol
columns 'Sum_Y' and 'No_Customers' will be displayed with format comma18.
From 3 requests I know how to do the 1st one.
Please find the code.
Data ttbl;
input ID group Y;
cards;
1 3 10
2 4 15
3 3 20
4 4 25
5 5 30
6 8 35
7 11 40
8 7 45
9 11 50
10 3 55
;
run;
proc format;
value multiFmt (multilabel notsorted)
2,3='2-3'
4='4'
2-4='Total 2-4'
5-8='5-8'
9-10='9-10'
5-10='Total 5-10'
11='11'
low-high='Total All'
;
Run;
PROC TABULATE DATA=ttbl OUT=output;
class group / mlf preloadfmt order=data ;
VAR ID Y ;
TABLE group='' ,
ID=''*N='No_Customers'
ID=''*PCTN='PCT_Customers'
Y=''*sum='Sum_Y'
Y=''*PCTsum='PCT_Sum_Y'
;
format group multiFmt.;
RUN;
2-Show also missing groups (in this example: for group '9-10' need to show it too with zero values)
See option classdata=
3-columns 'PCT_Customers' and ' PCT_Sum_Y' will be displayed with % symbol
Something like
Y=' '*PCTsum='PCT_Sum_Y'*f=percent9.2
You likely want the PRINTMISS table option to get the Preloadfmt to work as desired:
PROC TABULATE DATA=work.ttbl OUT=output;
class group / mlf preloadfmt order=data ;
VAR ID Y ;
TABLE group='' ,
ID=''*N='No_Customers'
ID=''*PCTN='PCT_Customers'
Y=''*sum='Sum_Y'
Y=''*PCTsum='PCT_Sum_Y'
/printmiss
;
format group multiFmt.;
RUN;
If you want 0 to appear for the missing statistics associated with the that 9-10 row such as 0% you will need to create a custom format that displays missing that way. Since you are using the PCTSUM statistic you will also need to create custom format because a display with the PERCENTw.d format really expects a source value in decimal form such as .153 to display as 15.3% but PCTSum and the PCT statistics will have values already multiplied by 100 to look nice and would already be 15.3. Which with a Percent format applied would then be multiplied by 100 again.
proc format library=work;
value multiFmt (multilabel notsorted)
2,3='2-3'
4='4'
2-4='Total 2-4'
5-8='5-8'
9-10='9-10'
5-10='Total 5-10'
11='11'
low-high='Total All'
;
value custcomma
.='0'
low-high = [Comma9.]
;
picture custperc
. = '0%' (noedit)
low-high = '009.99%';
Run;
PROC TABULATE DATA=work.ttbl OUT=output;
class group / mlf preloadfmt order=data ;
VAR ID Y ;
TABLE group='' ,
ID=''*N='No_Customers'
ID=''*PCTN='PCT_Customers'
Y=''*sum='Sum_Y' *f=custcomma.
Y=''*PCTsum='PCT_Sum_Y'*f=custperc.
/printmiss
;
format group multiFmt.;
RUN;
There is a table option MISSTEXT=' ' to show a specific given text for all missing values. With a table only having counts then MISSTEXT='0' could be appropriate. With other statistics though that might not work, especially if you want a mix of 0 and 0%.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.