Hello
I want to calculate summary statistics using proc tabulate.
I have 3 requests:
1-Using overlapping formats values (overlapping ranges)
2-Show also missing groups (in this example: for group '9-10' need to show it too with zero values)
3-columns 'PCT_Customers' and ' PCT_Sum_Y' will be displayed with % symbol
columns 'Sum_Y' and 'No_Customers' will be displayed with format comma18.
From 3 requests I know how to do the 1st one.
Please find the code.
Data ttbl;
input ID group Y;
cards;
1 3 10
2 4 15
3 3 20
4 4 25
5 5 30
6 8 35
7 11 40
8 7 45
9 11 50
10 3 55
;
run;
proc format;
value multiFmt (multilabel notsorted)
2,3='2-3'
4='4'
2-4='Total 2-4'
5-8='5-8'
9-10='9-10'
5-10='Total 5-10'
11='11'
low-high='Total All'
;
Run;
PROC TABULATE DATA=ttbl OUT=output;
class group / mlf preloadfmt order=data ;
VAR ID Y ;
TABLE group='' ,
ID=''*N='No_Customers'
ID=''*PCTN='PCT_Customers'
Y=''*sum='Sum_Y'
Y=''*PCTsum='PCT_Sum_Y'
;
format group multiFmt.;
RUN;
2-Show also missing groups (in this example: for group '9-10' need to show it too with zero values)
See option classdata=
3-columns 'PCT_Customers' and ' PCT_Sum_Y' will be displayed with % symbol
Something like
Y=' '*PCTsum='PCT_Sum_Y'*f=percent9.2
You likely want the PRINTMISS table option to get the Preloadfmt to work as desired:
PROC TABULATE DATA=work.ttbl OUT=output; class group / mlf preloadfmt order=data ; VAR ID Y ; TABLE group='' , ID=''*N='No_Customers' ID=''*PCTN='PCT_Customers' Y=''*sum='Sum_Y' Y=''*PCTsum='PCT_Sum_Y' /printmiss ; format group multiFmt.; RUN;
If you want 0 to appear for the missing statistics associated with the that 9-10 row such as 0% you will need to create a custom format that displays missing that way. Since you are using the PCTSUM statistic you will also need to create custom format because a display with the PERCENTw.d format really expects a source value in decimal form such as .153 to display as 15.3% but PCTSum and the PCT statistics will have values already multiplied by 100 to look nice and would already be 15.3. Which with a Percent format applied would then be multiplied by 100 again.
proc format library=work; value multiFmt (multilabel notsorted) 2,3='2-3' 4='4' 2-4='Total 2-4' 5-8='5-8' 9-10='9-10' 5-10='Total 5-10' 11='11' low-high='Total All' ; value custcomma .='0' low-high = [Comma9.] ; picture custperc . = '0%' (noedit) low-high = '009.99%'; Run; PROC TABULATE DATA=work.ttbl OUT=output; class group / mlf preloadfmt order=data ; VAR ID Y ; TABLE group='' , ID=''*N='No_Customers' ID=''*PCTN='PCT_Customers' Y=''*sum='Sum_Y' *f=custcomma. Y=''*PCTsum='PCT_Sum_Y'*f=custperc. /printmiss ; format group multiFmt.; RUN;
There is a table option MISSTEXT=' ' to show a specific given text for all missing values. With a table only having counts then MISSTEXT='0' could be appropriate. With other statistics though that might not work, especially if you want a mix of 0 and 0%.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: