Hi all--
I've been having a heck of a time trying to understand by group processing – I find it tricky to understand. Any help will be greatly appreciated. Again, I’m trying to understand how this works any additional explanation would be so great.
I am using this equation to calculate scores:
Utilization=Active_Cases+Pending_Cases+(Housing_Subsidy_cases / 2) / Program_Capacity;
I have to generate a score for each month by agency and then by program. Below is the code I have, but I know its not correct. I've been having trouble visualizing what actually happens during these steps
data want;
set have;
by Agency program month;
if first.agency and first.month then do;
Utilization=Active_Cases+Priority_Pending_Cases+(Housing_Subsidy_Only_cases / 2) / Program_Capacity;
end;
do until last.Agency;
end;
run;
Below is a data set. Again, any help is greatly appreciated.
Agency | Program | Month | Active_Cases | Pending_Cases | Housing_Subsidy_cases | Program_Capacity |
Agency1 | ProgramA | January | 26 | 6 | 3 | 40 |
Agency1 | ProgramA | Febuary | 27 | 6 | 4 | 40 |
Agency1 | ProgramA | March | 28 | 8 | 4 | 40 |
Agency1 | ProgramA | April | 26 | 7 | 4 | 40 |
Agency1 | ProgramA | May | 30 | 6 | 4 | 40 |
Agency1 | ProgramA | June | 32 | 10 | 4 | 40 |
Agency1 | ProgramA | July | 38 | 5 | 4 | 50 |
Agency1 | ProgramA | August | 37 | 9 | 4 | 50 |
Agency1 | ProgramA | September | 38 | 7 | 3 | 50 |
Agency1 | ProgramA | October | 37 | 5 | 1 | 50 |
Agency1 | ProgramA | November | 42 | 8 | 3 | 60 |
Agency1 | ProgramA | December | 44 | 10 | 4 | 60 |
Agency2 | ProgramA | January | 26 | 6 | 3 | 40 |
Agency2 | ProgramA | Febuary | 27 | 6 | 4 | 40 |
Agency2 | ProgramA | March | 28 | 8 | 4 | 40 |
Agency2 | ProgramA | April | 26 | 7 | 4 | 40 |
Agency2 | ProgramA | May | 30 | 6 | 4 | 40 |
Agency2 | ProgramA | June | 32 | 10 | 4 | 40 |
Agency2 | ProgramA | July | 38 | 5 | 4 | 50 |
Agency2 | ProgramA | August | 37 | 9 | 4 | 50 |
Agency2 | ProgramA | September | 38 | 7 | 3 | 50 |
Agency2 | ProgramA | October | 37 | 5 | 1 | 50 |
Agency2 | ProgramA | November | 42 | 8 | 3 | 60 |
Agency2 | ProgramA | December | 44 | 10 | 4 | 60 |
Agency2 | ProgramB | January | 26 | 6 | 3 | 40 |
Agency2 | ProgramB | Febuary | 27 | 6 | 4 | 40 |
Agency2 | ProgramB | March | 28 | 8 | 4 | 40 |
Agency2 | ProgramB | April | 26 | 7 | 4 | 40 |
Agency2 | ProgramB | May | 30 | 6 | 4 | 40 |
Agency2 | ProgramB | June | 32 | 10 | 4 | 40 |
Agency2 | ProgramB | July | 38 | 5 | 4 | 50 |
Agency2 | ProgramB | August | 37 | 9 | 4 | 50 |
Agency2 | ProgramB | September | 38 | 7 | 3 | 50 |
Agency2 | ProgramB | October | 37 | 5 | 1 | 50 |
Agency2 | ProgramB | November | 42 | 8 | 3 | 60 |
Agency2 | ProgramB | December | 44 | 10 | 4 | 60 |
mGarret,
You're right to try to tackle this. In the long run, BY group processing is an essential skill. Here is a test program that may give you a feel for these tools. It won't get you all the answers, but it should move you forward if you run it and inspect the results.
data test;
set have;
by agency program month;
first_agency = first.agency;
last_agency = last.agency;
first_program = first.program;
last_program = last.program;
first_month = first.month;
last_month = last.month;
if first.agency then agency_total=0;
if first.program then program_total=0;
retain agency_total program_total;
agency_total = agency_total + active_cases;
program_total = program_total + active_cases;
run;
proc print;
run;
Take a look at the outcomes, and see what questions come to mind.
Good luck.
Hi Astounding---
Thank you for your help. If you could, would you mind explaining whats going on in the steps below? Thanks!
if first.agency then agency_total=0;
if first.program then program_total=0;
retain agency_total program_total;
agency_total = agency_total + active_cases;
program_total = program_total + active_cases;
run;
I think Astounding was thinking that if you ran the code, and looked at the output, you would understand.
I'd suggest taking a quick read of: http://www.pauldickman.com/teaching/sas/set_by.php
and then seeing if you can answer your own question. If not, just post any followup questions you might have.
Regarding the first two statements ... if you inspect the results, you will notice that first.agency and first.program are always 1 or 0. More specifically, first.agency is 1 when hitting the first observation for an AGENCY, otherwise it is 0. An IF condition interprets 1 as true, and 0 as false. So the first statement re-sets agency_total to 0 whenever hitting the first observation for an agency.
RETAIN has a standard meaning in SAS: as you move from one observation to the next, let this variable hold onto its value. Without it, AGENCY_TOTAL would be missing at the beginning of processing each observation.
The last two assignment statements increase the value of AGENCY_TOTAL and PROGRAM_TOTAL, by adding ACTIVE_CASES to the current value.
Work calls, but I will get around to further questions if you have them!
Hi,
Do you want to have an output table with the same columns and number of rows as your input just with an additional column "Utilization" with a new value calculated for every row? If yes, then you should be able to achieve this without by processing:
data want;
set have;
Utilization=Active_Cases+Priority_Pending_Cases+(Housing_Subsidy_Only_cases / 2) / Program_Capacity;
run;
If not, then please provide some sample output.
Regards,
Amir.
I may be misunderstanding but it looks like this could be approached with two steps:
1: Generate the cummulative totals of the variable used to create the score
2: Generate the score.
One approach might be:
instead of trying to do a lot of data step code to get the sums of the variables use a proc designed for that task.
Proc summary data=have nway; /* the NWAY option restricts output so you don't get summaries at each level of combination of the class variables */
class agency program month; /* you could use a by statement instead but this avoids a need to sort */
var Active_Cases Priority_Pending_Cases Housing_Subsidy_Only_cases Program_Capacity;
output out=temp sum=;
run;
data want;
set temp;
Utilization=Active_Cases+Priority_Pending_Cases+(Housing_Subsidy_Only_cases / 2) / Program_Capacity;
run;
Hi ... I had a similar idea ... why not just use SUMMARY then a data step. One issue that arose is the MONTH is a literal rather than a number and a CLASS statement puts the months in alpha order rather than chronological order (you'd have to muck around with the data add month as 1, 2, 3 etc) ....
proc summary data=x;
class agency program month;
types agency*month program;
var active_cases pending_cases housing_subsidy_cases program_capacity;
output out=stats (drop=_:) sum=;
run;
data y;
set stats;
utilization=active_cases+pending_cases+(housing_subsidy_cases / 2) / program_capacity;
run;
agency program month cases cases cases capacity utilization
ProgramA 810 174 84 1120 984.038
ProgramB 405 87 42 560 492.038
Agency1 April 26 7 4 40 33.050
Agency1 August 37 9 4 50 46.040
Agency1 December 44 10 4 60 54.033
Agency1 Febuary 27 6 4 40 33.050
Agency1 January 26 6 3 40 32.038
Agency1 July 38 5 4 50 43.040
Agency1 June 32 10 4 40 42.050
Agency1 March 28 8 4 40 36.050
Agency1 May 30 6 4 40 36.050
Agency1 November 42 8 3 60 50.025
Agency1 October 37 5 1 50 42.010
Agency1 September 38 7 3 50 45.030
Agency2 April 52 14 8 80 66.050
Agency2 August 74 18 8 100 92.040
Agency2 December 88 20 8 120 108.033
Agency2 Febuary 54 12 8 80 66.050
Agency2 January 52 12 6 80 64.038
Agency2 July 76 10 8 100 86.040
Agency2 June 64 20 8 80 84.050
Agency2 March 56 16 8 80 72.050
Agency2 May 60 12 8 80 72.050
Agency2 November 84 16 6 120 100.025
Agency2 October 74 10 2 100 84.010
Agency2 September 76 14 6 100 90.030
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.