07-07-2016 02:27 PM
I've got a dataset that looks like the one above, and I need an output report that looks like so:
"Number of workdays" is Lag_Workdays used as a classification variable.
"Delayed/Non-delayed" is Priority used as a classification variable
"Total Consultations" is a set of total columns (i.e., Delayed + Non-Delayed)
"2010/2011" is Intake_Year used as a classification variable (the original dataset has 6 years of data, but I've truncated the output report to show only these two years for this example. In the final report I want all 6 years.)
"N" is N (i.e., number of records within those classifications)
"%" is percent N
"cumtot" and "cumpct" are cumulative total and cumulative percent respectively
Obviously PROC Freq creates these types of frequency tables naturally, but due to the number of nested classifications it seemed like I needed something with a little more power. (I know that I can run Proc Freq and create output tables, then splice those tables back together to create the final report, but I was hoping for something a little more sophisticated.)
Then I turned to PROC Tabulate, but proc tab can't do cumulative freqs and percents.
Then I found a paper on how to do cumulative freqs and percents with PROC Report, but that procedure is not my particular forte, so I quickly got stumped. I'd be happy to share the code that I've developed so far, but it's probably a gross bastardization of PROC Report that would embarass any proper programmer.
07-07-2016 03:32 PM - edited 07-07-2016 05:51 PM
Please post the code anyway. Sometimes it turns out to br pretty close and you're only missing one or two bits. As a minimum it gives folks variable names and such.
It would also be ideal to provide a small data set that shows the basic behavior of your data providing basic challenges (2 nests at any level). If you can provide that as a datastep then tested code may be provided.
has an example of code to turn your dataset into a data step.
07-07-2016 04:20 PM
Ok, here's my bastardized code...
proc report data=consults nowd; format priority $del.; where status not in ('07' '08' '09') and intake_date le '31dec2015'd; column lag_workdays intake_year=all,(tk=acount tk=apct ytd cumpct) priority,intake_year,(tk=count tk=pct) ; define all / across 'Total Consultations'; define acount / analysis n format=comma8. 'N'; define apct / analysis pctn format=percent8.2 '%'; define priority / across ' '; define intake_year / across ' '; define lag_workdays / group 'Number of workdays' missing; define count / analysis n format=comma8. 'N'; define pct / analysis pctn format=percent8.2 '%'; define ytd / computed; define cumpct / computed; compute before; total+acount; cumtot=0; endcomp; compute ytd; cumtot+acount; ytd=cumtot; endcomp; compute cumpct; cumpct=ytd/total; endcomp; run;
So far I can get the basic structure of the report to work. The piece that isn't working is the compute blocks. It's all coming out as missing.
I'm guessing that a significant part of my probably is how I'm using the TK field to calculate the N and PCTN columns. It looks like I'm doing that wrong, but I'm not sure what the correct way is to create those columns.
Relatedly, the compute block portion says that "acount" is uninitialized. When I try to change the reference to something like "tk.N" it gives me an error that says that using tk.N is inappropriate.
And here's the paper that I found that gave me the basic structure for calculating the cumulative columns:
As shown in the example above, I'd like the cumulative totals to show up for all sections of the report, not just within the "Total Consultations" portion, but I didn't get that far...