Solved: how to summarize transactional data for report

Elliott · Posted 05-17-2021 06:48 PM

I need some help, I am trying to create a report from transactional data that looks like this:

have
transactional data:
division	area	mgr_id	usr_id	tranamt	reason	reason_type	status
north	MA	zz89744	aa49876	$34.00	gw	system	Y
north	MA	zz89744	aa49876	$12.00	be	system	Y
north	MA	zz89744	aa49876	$34.00	fd	escalation	N
north	MA	zz89744	aa49876	$25.00	og	system	Y
north	MA	zz89744	aa49876	$30.00	gw	system	Y
north	MA	zz89744	aa49876	$24.00	gw	escalation	Y
north	MA	zz89744	ab49877	$25.00	og	system	Y
north	MA	zz89744	ab49877	$35.00	gw	escalation	Y
north	MA	zz89744	ab49877	$24.00	gw	system	N
west	CA	zj89754	ab39776	$34.00	og	escalation	Y
west	CA	zj89754	ab39776	$12.00	be	system	Y
west	CA	zj89754	ab39776	$34.00	fd	system	N
west	CA	zj89754	ab39776	$25.00	og	escalation	Y
west	CA	zj89754	ab39776	$30.00	be	system	Y
west	CA	zj89754	ac39776	$24.00	be	escalation	N
east	MD	bz99743	ca49879	$25.00	og	system	Y
east	MD	bz99743	ca49879	$30.00	gw	system	Y
east	MD	bz99743	ca49879	$24.00	gw	system	Y
south	FL	cj89754	ad49876	$34.00	og	escalation	Y
south	FL	cj89754	ad59876	$12.00	be	system	Y
south	FL	cj89754	ad59876	$34.00	fd	system	N
south	FL	cj89754	ad59876	$25.00	og	escalation	Y
		hj79754	ab49876	$34.00	og	system	Y
		hj79754	ab49876	$30.00	be	system	Y
		hj79754	ab49876	$34.00	fd	system	Y
		hj79754	ab49876	$25.00	gw	escalation	Y

Reporting from above data set which I already have to look like this example below, I am using SAS 9.4 on unix server, developing in SAS EG but will ultimately be running in a scheduler with a functional ID. I use ODS for the output of the report.

any assistance will be greatly appreciated.

I have researched all day and am at a loss as to how to take the above data and make it look like this in an excel output.

Thank you

						group be				group fd				group og				group gw
						escalation		system		escalation		system		escalation		system		escalation		system
division	area	mgr_id	usr_id	# of trans	$ total trans	# of trans	$ total trans	# of trans	$ total trans	# of trans	$ total trans	# of trans	$ total trans	# of trans	$ total trans	# of trans	$ total trans	# of trans	$ total trans	# of trans	$ total trans
north	MA	zz89744	aa49876	6	$159			1	$12	1	$34					1	$25	1	$24	2	$64
			ab49877	3	$84											1	$25	1	$35	1	$24
west	CA	zj89754	ab39776	5	$135			2	$42			1	$34	2	$59
			ac39776	1	$24	1	$24
east	MD	bz99743	ca49879	3	$79											1	$25			2	$54
south	FL	cj89754	ad49876	4	$105			1	$12			1	$34	2	$59
<missing>	<missing>	hj79754	ab49876	4	$123			1	$30			1	$34			1	$34	1	$25

I am using

ballardw · Posted 05-17-2021 07:55 PM

It helps to provide example data as a data step as below.

data have;
   input division $ area $ mgr_id $	usr_id $	tranamt :comma10.	reason $	reason_type:$10. 	status $ ;
   format tranamt dollar8.2;
datalines;
north 	MA 	zz89744 	aa49876 	$34.00 	gw 	system 	Y
north 	MA 	zz89744 	aa49876 	$12.00 	be 	system 	Y
north 	MA 	zz89744 	aa49876 	$34.00 	fd 	escalation 	N
north 	MA 	zz89744 	aa49876 	$25.00 	og 	system 	Y
north 	MA 	zz89744 	aa49876 	$30.00 	gw 	system 	Y
north 	MA 	zz89744 	aa49876 	$24.00 	gw 	escalation 	Y
north 	MA 	zz89744 	ab49877 	$25.00 	og 	system 	Y
north 	MA 	zz89744 	ab49877 	$35.00 	gw 	escalation 	Y
north 	MA 	zz89744 	ab49877 	$24.00 	gw 	system 	N
west 	CA 	zj89754 	ab39776 	$34.00 	og 	escalation 	Y
west 	CA 	zj89754 	ab39776 	$12.00 	be 	system 	Y
west 	CA 	zj89754 	ab39776 	$34.00 	fd 	system 	N
west 	CA 	zj89754 	ab39776 	$25.00 	og 	escalation 	Y
west 	CA 	zj89754 	ab39776 	$30.00 	be 	system 	Y
west 	CA 	zj89754 	ac39776 	$24.00 	be 	escalation 	N
east 	MD 	bz99743 	ca49879 	$25.00 	og 	system 	Y
east 	MD 	bz99743 	ca49879 	$30.00 	gw 	system 	Y
east 	MD 	bz99743 	ca49879 	$24.00 	gw 	system 	Y
south 	FL 	cj89754 	ad49876 	$34.00 	og 	escalation 	Y
south 	FL 	cj89754 	ad59876 	$12.00 	be 	system 	Y
south 	FL 	cj89754 	ad59876 	$34.00 	fd 	system 	N
south 	FL 	cj89754 	ad59876 	$25.00 	og 	escalation 	Y
  .	 . 	hj79754 	ab49876 	$34.00 	og 	system 	Y
  .	 . 	hj79754 	ab49876 	$30.00 	be 	system 	Y
  .	 . 	hj79754 	ab49876 	$34.00 	fd 	system 	Y
  .	 . 	hj79754 	ab49876 	$25.00 	gw 	escalation 	Y
;

Proc tabulate data=have;
   class division	area /missing;
   class mgr_id  	usr_id    reason reason_type;
   var tranamt;
   table division*area*mgr_id*usr_id,
       (all='Total' reason=' '*reason_type=' ')*(tranamt=' '*(n="# of trans" sum="total of Trans"*f=dollar8.))
       /misstext=' '
   ;
run;

Proc tabulate is my preferred reporting tool when variables have to "nest" in multiple directions.

The key parts in tabulate above:

Class statements to tell how which variables will be used for "groups". Since you intended to explicitly include missing values for two variables (thank you for indicating that) you use the Class statement option to indicate that with the /MISSING option. Otherwise tabulate will drop any record where a class variable is missing.

Then the VAR statement indicates which variable(s) will be used in numeric calculations such as sum, mean, stddev etc. Anything can be counted (the N statistic) but to do the sum the variable must be on a VAR statement.

The * between variables says to nest the values. The comma is used to separate dimensions. A two dimension table has one comma between the row and the columns. You can create a "3 dimension" with the third dimension as page as the first table dimension and then 2 commas. (Exercise for the interested reader: replace the first * after district with a comma to see this behavior). Groups of related variables or statistics are placed in ( ).

So (all='Total' reason=' '*reason_type=' ') uses a special "group" keyword ALL that says we want to summarize all records and then the nested pairs of Reason and Reason_type will be have the same. After the * on that line the Tranamt variable is used with two grouped statistics n and sum.

The text in quotes after a variable or statistics provides a label in the table. Blank text as in Reason=' ' suppresses any label and then we provide overrides for N and Sum to show specific text.

The / after the dimensions provide some table options. In this case misstext = ' ' means that nothing is shown when a value is missing. You can provide more text than the default system missing option that allows for one character but if you have missing values in row or column headers the results may look odd.

You can use the "ALL" in different locations of the code to create different summary groups.

Proc Tabulate also allows multiple table statements so you can create lots of output from one procedure call.

With the hints above do you think you could get a ROW with a summary of all the records?

There are enough options with Proc Tabulate that there is an actual book published with examples.

View solution in original post

ballardw · Posted 05-17-2021 07:55 PM

It helps to provide example data as a data step as below.

data have;
   input division $ area $ mgr_id $	usr_id $	tranamt :comma10.	reason $	reason_type:$10. 	status $ ;
   format tranamt dollar8.2;
datalines;
north 	MA 	zz89744 	aa49876 	$34.00 	gw 	system 	Y
north 	MA 	zz89744 	aa49876 	$12.00 	be 	system 	Y
north 	MA 	zz89744 	aa49876 	$34.00 	fd 	escalation 	N
north 	MA 	zz89744 	aa49876 	$25.00 	og 	system 	Y
north 	MA 	zz89744 	aa49876 	$30.00 	gw 	system 	Y
north 	MA 	zz89744 	aa49876 	$24.00 	gw 	escalation 	Y
north 	MA 	zz89744 	ab49877 	$25.00 	og 	system 	Y
north 	MA 	zz89744 	ab49877 	$35.00 	gw 	escalation 	Y
north 	MA 	zz89744 	ab49877 	$24.00 	gw 	system 	N
west 	CA 	zj89754 	ab39776 	$34.00 	og 	escalation 	Y
west 	CA 	zj89754 	ab39776 	$12.00 	be 	system 	Y
west 	CA 	zj89754 	ab39776 	$34.00 	fd 	system 	N
west 	CA 	zj89754 	ab39776 	$25.00 	og 	escalation 	Y
west 	CA 	zj89754 	ab39776 	$30.00 	be 	system 	Y
west 	CA 	zj89754 	ac39776 	$24.00 	be 	escalation 	N
east 	MD 	bz99743 	ca49879 	$25.00 	og 	system 	Y
east 	MD 	bz99743 	ca49879 	$30.00 	gw 	system 	Y
east 	MD 	bz99743 	ca49879 	$24.00 	gw 	system 	Y
south 	FL 	cj89754 	ad49876 	$34.00 	og 	escalation 	Y
south 	FL 	cj89754 	ad59876 	$12.00 	be 	system 	Y
south 	FL 	cj89754 	ad59876 	$34.00 	fd 	system 	N
south 	FL 	cj89754 	ad59876 	$25.00 	og 	escalation 	Y
  .	 . 	hj79754 	ab49876 	$34.00 	og 	system 	Y
  .	 . 	hj79754 	ab49876 	$30.00 	be 	system 	Y
  .	 . 	hj79754 	ab49876 	$34.00 	fd 	system 	Y
  .	 . 	hj79754 	ab49876 	$25.00 	gw 	escalation 	Y
;

Proc tabulate data=have;
   class division	area /missing;
   class mgr_id  	usr_id    reason reason_type;
   var tranamt;
   table division*area*mgr_id*usr_id,
       (all='Total' reason=' '*reason_type=' ')*(tranamt=' '*(n="# of trans" sum="total of Trans"*f=dollar8.))
       /misstext=' '
   ;
run;