I need some help, I am trying to create a report from transactional data that looks like this:
| have | |||||||
| transactional data: | |||||||
| division | area | mgr_id | usr_id | tranamt | reason | reason_type | status | 
| north | MA | zz89744 | aa49876 | $34.00 | gw | system | Y | 
| north | MA | zz89744 | aa49876 | $12.00 | be | system | Y | 
| north | MA | zz89744 | aa49876 | $34.00 | fd | escalation | N | 
| north | MA | zz89744 | aa49876 | $25.00 | og | system | Y | 
| north | MA | zz89744 | aa49876 | $30.00 | gw | system | Y | 
| north | MA | zz89744 | aa49876 | $24.00 | gw | escalation | Y | 
| north | MA | zz89744 | ab49877 | $25.00 | og | system | Y | 
| north | MA | zz89744 | ab49877 | $35.00 | gw | escalation | Y | 
| north | MA | zz89744 | ab49877 | $24.00 | gw | system | N | 
| west | CA | zj89754 | ab39776 | $34.00 | og | escalation | Y | 
| west | CA | zj89754 | ab39776 | $12.00 | be | system | Y | 
| west | CA | zj89754 | ab39776 | $34.00 | fd | system | N | 
| west | CA | zj89754 | ab39776 | $25.00 | og | escalation | Y | 
| west | CA | zj89754 | ab39776 | $30.00 | be | system | Y | 
| west | CA | zj89754 | ac39776 | $24.00 | be | escalation | N | 
| east | MD | bz99743 | ca49879 | $25.00 | og | system | Y | 
| east | MD | bz99743 | ca49879 | $30.00 | gw | system | Y | 
| east | MD | bz99743 | ca49879 | $24.00 | gw | system | Y | 
| south | FL | cj89754 | ad49876 | $34.00 | og | escalation | Y | 
| south | FL | cj89754 | ad59876 | $12.00 | be | system | Y | 
| south | FL | cj89754 | ad59876 | $34.00 | fd | system | N | 
| south | FL | cj89754 | ad59876 | $25.00 | og | escalation | Y | 
| hj79754 | ab49876 | $34.00 | og | system | Y | ||
| hj79754 | ab49876 | $30.00 | be | system | Y | ||
| hj79754 | ab49876 | $34.00 | fd | system | Y | ||
| hj79754 | ab49876 | $25.00 | gw | escalation | Y | 
Reporting from above data set which I already have to look like this example below, I am using SAS 9.4 on unix server, developing in SAS EG but will ultimately be running in a scheduler with a functional ID. I use ODS for the output of the report.
any assistance will be greatly appreciated.
I have researched all day and am at a loss as to how to take the above data and make it look like this in an excel output.
Thank you
| group be | group fd | group og | group gw | ||||||||||||||||||
| escalation | system | escalation | system | escalation | system | escalation | system | ||||||||||||||
| division | area | mgr_id | usr_id | # of trans | $ total trans | # of trans | $ total trans | # of trans | $ total trans | # of trans | $ total trans | # of trans | $ total trans | # of trans | $ total trans | # of trans | $ total trans | # of trans | $ total trans | # of trans | $ total trans | 
| north | MA | zz89744 | aa49876 | 6 | $159 | 1 | $12 | 1 | $34 | 1 | $25 | 1 | $24 | 2 | $64 | ||||||
| ab49877 | 3 | $84 | 1 | $25 | 1 | $35 | 1 | $24 | |||||||||||||
| west | CA | zj89754 | ab39776 | 5 | $135 | 2 | $42 | 1 | $34 | 2 | $59 | ||||||||||
| ac39776 | 1 | $24 | 1 | $24 | |||||||||||||||||
| east | MD | bz99743 | ca49879 | 3 | $79 | 1 | $25 | 2 | $54 | ||||||||||||
| south | FL | cj89754 | ad49876 | 4 | $105 | 1 | $12 | 1 | $34 | 2 | $59 | ||||||||||
| <missing> | <missing> | hj79754 | ab49876 | 4 | $123 | 1 | $30 | 1 | $34 | 1 | $34 | 1 | $25 | ||||||||
| I am using | 
 
 | ||||||||||||||||||||
It helps to provide example data as a data step as below.
data have;
   input division $ area $ mgr_id $	usr_id $	tranamt :comma10.	reason $	reason_type:$10. 	status $ ;
   format tranamt dollar8.2;
datalines;
north 	MA 	zz89744 	aa49876 	$34.00 	gw 	system 	Y
north 	MA 	zz89744 	aa49876 	$12.00 	be 	system 	Y
north 	MA 	zz89744 	aa49876 	$34.00 	fd 	escalation 	N
north 	MA 	zz89744 	aa49876 	$25.00 	og 	system 	Y
north 	MA 	zz89744 	aa49876 	$30.00 	gw 	system 	Y
north 	MA 	zz89744 	aa49876 	$24.00 	gw 	escalation 	Y
north 	MA 	zz89744 	ab49877 	$25.00 	og 	system 	Y
north 	MA 	zz89744 	ab49877 	$35.00 	gw 	escalation 	Y
north 	MA 	zz89744 	ab49877 	$24.00 	gw 	system 	N
west 	CA 	zj89754 	ab39776 	$34.00 	og 	escalation 	Y
west 	CA 	zj89754 	ab39776 	$12.00 	be 	system 	Y
west 	CA 	zj89754 	ab39776 	$34.00 	fd 	system 	N
west 	CA 	zj89754 	ab39776 	$25.00 	og 	escalation 	Y
west 	CA 	zj89754 	ab39776 	$30.00 	be 	system 	Y
west 	CA 	zj89754 	ac39776 	$24.00 	be 	escalation 	N
east 	MD 	bz99743 	ca49879 	$25.00 	og 	system 	Y
east 	MD 	bz99743 	ca49879 	$30.00 	gw 	system 	Y
east 	MD 	bz99743 	ca49879 	$24.00 	gw 	system 	Y
south 	FL 	cj89754 	ad49876 	$34.00 	og 	escalation 	Y
south 	FL 	cj89754 	ad59876 	$12.00 	be 	system 	Y
south 	FL 	cj89754 	ad59876 	$34.00 	fd 	system 	N
south 	FL 	cj89754 	ad59876 	$25.00 	og 	escalation 	Y
  .	 . 	hj79754 	ab49876 	$34.00 	og 	system 	Y
  .	 . 	hj79754 	ab49876 	$30.00 	be 	system 	Y
  .	 . 	hj79754 	ab49876 	$34.00 	fd 	system 	Y
  .	 . 	hj79754 	ab49876 	$25.00 	gw 	escalation 	Y
;
Proc tabulate data=have;
   class division	area /missing;
   class mgr_id  	usr_id    reason reason_type;
   var tranamt;
   table division*area*mgr_id*usr_id,
       (all='Total' reason=' '*reason_type=' ')*(tranamt=' '*(n="# of trans" sum="total of Trans"*f=dollar8.))
       /misstext=' '
   ;
run;
Proc tabulate is my preferred reporting tool when variables have to "nest" in multiple directions.
The key parts in tabulate above:
Class statements to tell how which variables will be used for "groups". Since you intended to explicitly include missing values for two variables (thank you for indicating that) you use the Class statement option to indicate that with the /MISSING option. Otherwise tabulate will drop any record where a class variable is missing.
Then the VAR statement indicates which variable(s) will be used in numeric calculations such as sum, mean, stddev etc. Anything can be counted (the N statistic) but to do the sum the variable must be on a VAR statement.
The * between variables says to nest the values. The comma is used to separate dimensions. A two dimension table has one comma between the row and the columns. You can create a "3 dimension" with the third dimension as page as the first table dimension and then 2 commas. (Exercise for the interested reader: replace the first * after district with a comma to see this behavior). Groups of related variables or statistics are placed in ( ).
So (all='Total' reason=' '*reason_type=' ') uses a special "group" keyword ALL that says we want to summarize all records and then the nested pairs of Reason and Reason_type will be have the same. After the * on that line the Tranamt variable is used with two grouped statistics n and sum.
The text in quotes after a variable or statistics provides a label in the table. Blank text as in Reason=' ' suppresses any label and then we provide overrides for N and Sum to show specific text.
The / after the dimensions provide some table options. In this case misstext = ' ' means that nothing is shown when a value is missing. You can provide more text than the default system missing option that allows for one character but if you have missing values in row or column headers the results may look odd.
You can use the "ALL" in different locations of the code to create different summary groups.
Proc Tabulate also allows multiple table statements so you can create lots of output from one procedure call.
With the hints above do you think you could get a ROW with a summary of all the records?
There are enough options with Proc Tabulate that there is an actual book published with examples.
It helps to provide example data as a data step as below.
data have;
   input division $ area $ mgr_id $	usr_id $	tranamt :comma10.	reason $	reason_type:$10. 	status $ ;
   format tranamt dollar8.2;
datalines;
north 	MA 	zz89744 	aa49876 	$34.00 	gw 	system 	Y
north 	MA 	zz89744 	aa49876 	$12.00 	be 	system 	Y
north 	MA 	zz89744 	aa49876 	$34.00 	fd 	escalation 	N
north 	MA 	zz89744 	aa49876 	$25.00 	og 	system 	Y
north 	MA 	zz89744 	aa49876 	$30.00 	gw 	system 	Y
north 	MA 	zz89744 	aa49876 	$24.00 	gw 	escalation 	Y
north 	MA 	zz89744 	ab49877 	$25.00 	og 	system 	Y
north 	MA 	zz89744 	ab49877 	$35.00 	gw 	escalation 	Y
north 	MA 	zz89744 	ab49877 	$24.00 	gw 	system 	N
west 	CA 	zj89754 	ab39776 	$34.00 	og 	escalation 	Y
west 	CA 	zj89754 	ab39776 	$12.00 	be 	system 	Y
west 	CA 	zj89754 	ab39776 	$34.00 	fd 	system 	N
west 	CA 	zj89754 	ab39776 	$25.00 	og 	escalation 	Y
west 	CA 	zj89754 	ab39776 	$30.00 	be 	system 	Y
west 	CA 	zj89754 	ac39776 	$24.00 	be 	escalation 	N
east 	MD 	bz99743 	ca49879 	$25.00 	og 	system 	Y
east 	MD 	bz99743 	ca49879 	$30.00 	gw 	system 	Y
east 	MD 	bz99743 	ca49879 	$24.00 	gw 	system 	Y
south 	FL 	cj89754 	ad49876 	$34.00 	og 	escalation 	Y
south 	FL 	cj89754 	ad59876 	$12.00 	be 	system 	Y
south 	FL 	cj89754 	ad59876 	$34.00 	fd 	system 	N
south 	FL 	cj89754 	ad59876 	$25.00 	og 	escalation 	Y
  .	 . 	hj79754 	ab49876 	$34.00 	og 	system 	Y
  .	 . 	hj79754 	ab49876 	$30.00 	be 	system 	Y
  .	 . 	hj79754 	ab49876 	$34.00 	fd 	system 	Y
  .	 . 	hj79754 	ab49876 	$25.00 	gw 	escalation 	Y
;
Proc tabulate data=have;
   class division	area /missing;
   class mgr_id  	usr_id    reason reason_type;
   var tranamt;
   table division*area*mgr_id*usr_id,
       (all='Total' reason=' '*reason_type=' ')*(tranamt=' '*(n="# of trans" sum="total of Trans"*f=dollar8.))
       /misstext=' '
   ;
run;
Proc tabulate is my preferred reporting tool when variables have to "nest" in multiple directions.
The key parts in tabulate above:
Class statements to tell how which variables will be used for "groups". Since you intended to explicitly include missing values for two variables (thank you for indicating that) you use the Class statement option to indicate that with the /MISSING option. Otherwise tabulate will drop any record where a class variable is missing.
Then the VAR statement indicates which variable(s) will be used in numeric calculations such as sum, mean, stddev etc. Anything can be counted (the N statistic) but to do the sum the variable must be on a VAR statement.
The * between variables says to nest the values. The comma is used to separate dimensions. A two dimension table has one comma between the row and the columns. You can create a "3 dimension" with the third dimension as page as the first table dimension and then 2 commas. (Exercise for the interested reader: replace the first * after district with a comma to see this behavior). Groups of related variables or statistics are placed in ( ).
So (all='Total' reason=' '*reason_type=' ') uses a special "group" keyword ALL that says we want to summarize all records and then the nested pairs of Reason and Reason_type will be have the same. After the * on that line the Tranamt variable is used with two grouped statistics n and sum.
The text in quotes after a variable or statistics provides a label in the table. Blank text as in Reason=' ' suppresses any label and then we provide overrides for N and Sum to show specific text.
The / after the dimensions provide some table options. In this case misstext = ' ' means that nothing is shown when a value is missing. You can provide more text than the default system missing option that allows for one character but if you have missing values in row or column headers the results may look odd.
You can use the "ALL" in different locations of the code to create different summary groups.
Proc Tabulate also allows multiple table statements so you can create lots of output from one procedure call.
With the hints above do you think you could get a ROW with a summary of all the records?
There are enough options with Proc Tabulate that there is an actual book published with examples.
ballardw Thank you very much for taking the time to help me with this and give me the explanation of how Proc Tabulate works.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
