I need some help, I am trying to create a report from transactional data that looks like this:
have | |||||||
transactional data: | |||||||
division | area | mgr_id | usr_id | tranamt | reason | reason_type | status |
north | MA | zz89744 | aa49876 | $34.00 | gw | system | Y |
north | MA | zz89744 | aa49876 | $12.00 | be | system | Y |
north | MA | zz89744 | aa49876 | $34.00 | fd | escalation | N |
north | MA | zz89744 | aa49876 | $25.00 | og | system | Y |
north | MA | zz89744 | aa49876 | $30.00 | gw | system | Y |
north | MA | zz89744 | aa49876 | $24.00 | gw | escalation | Y |
north | MA | zz89744 | ab49877 | $25.00 | og | system | Y |
north | MA | zz89744 | ab49877 | $35.00 | gw | escalation | Y |
north | MA | zz89744 | ab49877 | $24.00 | gw | system | N |
west | CA | zj89754 | ab39776 | $34.00 | og | escalation | Y |
west | CA | zj89754 | ab39776 | $12.00 | be | system | Y |
west | CA | zj89754 | ab39776 | $34.00 | fd | system | N |
west | CA | zj89754 | ab39776 | $25.00 | og | escalation | Y |
west | CA | zj89754 | ab39776 | $30.00 | be | system | Y |
west | CA | zj89754 | ac39776 | $24.00 | be | escalation | N |
east | MD | bz99743 | ca49879 | $25.00 | og | system | Y |
east | MD | bz99743 | ca49879 | $30.00 | gw | system | Y |
east | MD | bz99743 | ca49879 | $24.00 | gw | system | Y |
south | FL | cj89754 | ad49876 | $34.00 | og | escalation | Y |
south | FL | cj89754 | ad59876 | $12.00 | be | system | Y |
south | FL | cj89754 | ad59876 | $34.00 | fd | system | N |
south | FL | cj89754 | ad59876 | $25.00 | og | escalation | Y |
hj79754 | ab49876 | $34.00 | og | system | Y | ||
hj79754 | ab49876 | $30.00 | be | system | Y | ||
hj79754 | ab49876 | $34.00 | fd | system | Y | ||
hj79754 | ab49876 | $25.00 | gw | escalation | Y |
Reporting from above data set which I already have to look like this example below, I am using SAS 9.4 on unix server, developing in SAS EG but will ultimately be running in a scheduler with a functional ID. I use ODS for the output of the report.
any assistance will be greatly appreciated.
I have researched all day and am at a loss as to how to take the above data and make it look like this in an excel output.
Thank you
group be | group fd | group og | group gw | ||||||||||||||||||
escalation | system | escalation | system | escalation | system | escalation | system | ||||||||||||||
division | area | mgr_id | usr_id | # of trans | $ total trans | # of trans | $ total trans | # of trans | $ total trans | # of trans | $ total trans | # of trans | $ total trans | # of trans | $ total trans | # of trans | $ total trans | # of trans | $ total trans | # of trans | $ total trans |
north | MA | zz89744 | aa49876 | 6 | $159 | 1 | $12 | 1 | $34 | 1 | $25 | 1 | $24 | 2 | $64 | ||||||
ab49877 | 3 | $84 | 1 | $25 | 1 | $35 | 1 | $24 | |||||||||||||
west | CA | zj89754 | ab39776 | 5 | $135 | 2 | $42 | 1 | $34 | 2 | $59 | ||||||||||
ac39776 | 1 | $24 | 1 | $24 | |||||||||||||||||
east | MD | bz99743 | ca49879 | 3 | $79 | 1 | $25 | 2 | $54 | ||||||||||||
south | FL | cj89754 | ad49876 | 4 | $105 | 1 | $12 | 1 | $34 | 2 | $59 | ||||||||||
<missing> | <missing> | hj79754 | ab49876 | 4 | $123 | 1 | $30 | 1 | $34 | 1 | $34 | 1 | $25 | ||||||||
I am using |
|
It helps to provide example data as a data step as below.
data have; input division $ area $ mgr_id $ usr_id $ tranamt :comma10. reason $ reason_type:$10. status $ ; format tranamt dollar8.2; datalines; north MA zz89744 aa49876 $34.00 gw system Y north MA zz89744 aa49876 $12.00 be system Y north MA zz89744 aa49876 $34.00 fd escalation N north MA zz89744 aa49876 $25.00 og system Y north MA zz89744 aa49876 $30.00 gw system Y north MA zz89744 aa49876 $24.00 gw escalation Y north MA zz89744 ab49877 $25.00 og system Y north MA zz89744 ab49877 $35.00 gw escalation Y north MA zz89744 ab49877 $24.00 gw system N west CA zj89754 ab39776 $34.00 og escalation Y west CA zj89754 ab39776 $12.00 be system Y west CA zj89754 ab39776 $34.00 fd system N west CA zj89754 ab39776 $25.00 og escalation Y west CA zj89754 ab39776 $30.00 be system Y west CA zj89754 ac39776 $24.00 be escalation N east MD bz99743 ca49879 $25.00 og system Y east MD bz99743 ca49879 $30.00 gw system Y east MD bz99743 ca49879 $24.00 gw system Y south FL cj89754 ad49876 $34.00 og escalation Y south FL cj89754 ad59876 $12.00 be system Y south FL cj89754 ad59876 $34.00 fd system N south FL cj89754 ad59876 $25.00 og escalation Y . . hj79754 ab49876 $34.00 og system Y . . hj79754 ab49876 $30.00 be system Y . . hj79754 ab49876 $34.00 fd system Y . . hj79754 ab49876 $25.00 gw escalation Y ; Proc tabulate data=have; class division area /missing; class mgr_id usr_id reason reason_type; var tranamt; table division*area*mgr_id*usr_id, (all='Total' reason=' '*reason_type=' ')*(tranamt=' '*(n="# of trans" sum="total of Trans"*f=dollar8.)) /misstext=' ' ; run;
Proc tabulate is my preferred reporting tool when variables have to "nest" in multiple directions.
The key parts in tabulate above:
Class statements to tell how which variables will be used for "groups". Since you intended to explicitly include missing values for two variables (thank you for indicating that) you use the Class statement option to indicate that with the /MISSING option. Otherwise tabulate will drop any record where a class variable is missing.
Then the VAR statement indicates which variable(s) will be used in numeric calculations such as sum, mean, stddev etc. Anything can be counted (the N statistic) but to do the sum the variable must be on a VAR statement.
The * between variables says to nest the values. The comma is used to separate dimensions. A two dimension table has one comma between the row and the columns. You can create a "3 dimension" with the third dimension as page as the first table dimension and then 2 commas. (Exercise for the interested reader: replace the first * after district with a comma to see this behavior). Groups of related variables or statistics are placed in ( ).
So (all='Total' reason=' '*reason_type=' ') uses a special "group" keyword ALL that says we want to summarize all records and then the nested pairs of Reason and Reason_type will be have the same. After the * on that line the Tranamt variable is used with two grouped statistics n and sum.
The text in quotes after a variable or statistics provides a label in the table. Blank text as in Reason=' ' suppresses any label and then we provide overrides for N and Sum to show specific text.
The / after the dimensions provide some table options. In this case misstext = ' ' means that nothing is shown when a value is missing. You can provide more text than the default system missing option that allows for one character but if you have missing values in row or column headers the results may look odd.
You can use the "ALL" in different locations of the code to create different summary groups.
Proc Tabulate also allows multiple table statements so you can create lots of output from one procedure call.
With the hints above do you think you could get a ROW with a summary of all the records?
There are enough options with Proc Tabulate that there is an actual book published with examples.
It helps to provide example data as a data step as below.
data have; input division $ area $ mgr_id $ usr_id $ tranamt :comma10. reason $ reason_type:$10. status $ ; format tranamt dollar8.2; datalines; north MA zz89744 aa49876 $34.00 gw system Y north MA zz89744 aa49876 $12.00 be system Y north MA zz89744 aa49876 $34.00 fd escalation N north MA zz89744 aa49876 $25.00 og system Y north MA zz89744 aa49876 $30.00 gw system Y north MA zz89744 aa49876 $24.00 gw escalation Y north MA zz89744 ab49877 $25.00 og system Y north MA zz89744 ab49877 $35.00 gw escalation Y north MA zz89744 ab49877 $24.00 gw system N west CA zj89754 ab39776 $34.00 og escalation Y west CA zj89754 ab39776 $12.00 be system Y west CA zj89754 ab39776 $34.00 fd system N west CA zj89754 ab39776 $25.00 og escalation Y west CA zj89754 ab39776 $30.00 be system Y west CA zj89754 ac39776 $24.00 be escalation N east MD bz99743 ca49879 $25.00 og system Y east MD bz99743 ca49879 $30.00 gw system Y east MD bz99743 ca49879 $24.00 gw system Y south FL cj89754 ad49876 $34.00 og escalation Y south FL cj89754 ad59876 $12.00 be system Y south FL cj89754 ad59876 $34.00 fd system N south FL cj89754 ad59876 $25.00 og escalation Y . . hj79754 ab49876 $34.00 og system Y . . hj79754 ab49876 $30.00 be system Y . . hj79754 ab49876 $34.00 fd system Y . . hj79754 ab49876 $25.00 gw escalation Y ; Proc tabulate data=have; class division area /missing; class mgr_id usr_id reason reason_type; var tranamt; table division*area*mgr_id*usr_id, (all='Total' reason=' '*reason_type=' ')*(tranamt=' '*(n="# of trans" sum="total of Trans"*f=dollar8.)) /misstext=' ' ; run;
Proc tabulate is my preferred reporting tool when variables have to "nest" in multiple directions.
The key parts in tabulate above:
Class statements to tell how which variables will be used for "groups". Since you intended to explicitly include missing values for two variables (thank you for indicating that) you use the Class statement option to indicate that with the /MISSING option. Otherwise tabulate will drop any record where a class variable is missing.
Then the VAR statement indicates which variable(s) will be used in numeric calculations such as sum, mean, stddev etc. Anything can be counted (the N statistic) but to do the sum the variable must be on a VAR statement.
The * between variables says to nest the values. The comma is used to separate dimensions. A two dimension table has one comma between the row and the columns. You can create a "3 dimension" with the third dimension as page as the first table dimension and then 2 commas. (Exercise for the interested reader: replace the first * after district with a comma to see this behavior). Groups of related variables or statistics are placed in ( ).
So (all='Total' reason=' '*reason_type=' ') uses a special "group" keyword ALL that says we want to summarize all records and then the nested pairs of Reason and Reason_type will be have the same. After the * on that line the Tranamt variable is used with two grouped statistics n and sum.
The text in quotes after a variable or statistics provides a label in the table. Blank text as in Reason=' ' suppresses any label and then we provide overrides for N and Sum to show specific text.
The / after the dimensions provide some table options. In this case misstext = ' ' means that nothing is shown when a value is missing. You can provide more text than the default system missing option that allows for one character but if you have missing values in row or column headers the results may look odd.
You can use the "ALL" in different locations of the code to create different summary groups.
Proc Tabulate also allows multiple table statements so you can create lots of output from one procedure call.
With the hints above do you think you could get a ROW with a summary of all the records?
There are enough options with Proc Tabulate that there is an actual book published with examples.
ballardw Thank you very much for taking the time to help me with this and give me the explanation of how Proc Tabulate works.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.