Re: By Group Processing Help

Mgarret · Posted 09-06-2012 01:34 PM

Hi all--

I've been having a heck of a time trying to understand by group processing – I find it tricky to understand. Any help will be greatly appreciated. Again, I’m trying to understand how this works any additional explanation would be so great.

I am using this equation to calculate scores:

Utilization=Active_Cases+Pending_Cases+(Housing_Subsidy_cases / 2) / Program_Capacity;

I have to generate a score for each month by agency and then by program. Below is the code I have, but I know its not correct. I've been having trouble visualizing what actually happens during these steps

data want;

set have;

by Agency program month;

if first.agency and first.month then do;

Utilization=Active_Cases+Priority_Pending_Cases+(Housing_Subsidy_Only_cases / 2) / Program_Capacity;

end;

do until last.Agency;

end;

run;

Below is a data set. Again, any help is greatly appreciated.

Agency	Program	Month	Active_Cases	Pending_Cases	Housing_Subsidy_cases	Program_Capacity
Agency1	ProgramA	January	26	6	3	40
Agency1	ProgramA	Febuary	27	6	4	40
Agency1	ProgramA	March	28	8	4	40
Agency1	ProgramA	April	26	7	4	40
Agency1	ProgramA	May	30	6	4	40
Agency1	ProgramA	June	32	10	4	40
Agency1	ProgramA	July	38	5	4	50
Agency1	ProgramA	August	37	9	4	50
Agency1	ProgramA	September	38	7	3	50
Agency1	ProgramA	October	37	5	1	50
Agency1	ProgramA	November	42	8	3	60
Agency1	ProgramA	December	44	10	4	60
Agency2	ProgramA	January	26	6	3	40
Agency2	ProgramA	Febuary	27	6	4	40
Agency2	ProgramA	March	28	8	4	40
Agency2	ProgramA	April	26	7	4	40
Agency2	ProgramA	May	30	6	4	40
Agency2	ProgramA	June	32	10	4	40
Agency2	ProgramA	July	38	5	4	50
Agency2	ProgramA	August	37	9	4	50
Agency2	ProgramA	September	38	7	3	50
Agency2	ProgramA	October	37	5	1	50
Agency2	ProgramA	November	42	8	3	60
Agency2	ProgramA	December	44	10	4	60
Agency2	ProgramB	January	26	6	3	40
Agency2	ProgramB	Febuary	27	6	4	40
Agency2	ProgramB	March	28	8	4	40
Agency2	ProgramB	April	26	7	4	40
Agency2	ProgramB	May	30	6	4	40
Agency2	ProgramB	June	32	10	4	40
Agency2	ProgramB	July	38	5	4	50
Agency2	ProgramB	August	37	9	4	50
Agency2	ProgramB	September	38	7	3	50
Agency2	ProgramB	October	37	5	1	50
Agency2	ProgramB	November	42	8	3	60
Agency2	ProgramB	December	44	10	4	60

Astounding · Posted 09-06-2012 01:52 PM

mGarret,

You're right to try to tackle this. In the long run, BY group processing is an essential skill. Here is a test program that may give you a feel for these tools. It won't get you all the answers, but it should move you forward if you run it and inspect the results.

data test;

set have;
by agency program month;

first_agency = first.agency;

last_agency = last.agency;

first_program = first.program;

last_program = last.program;

first_month = first.month;

last_month = last.month;

if first.agency then agency_total=0;

if first.program then program_total=0;

retain agency_total program_total;

agency_total = agency_total + active_cases;

program_total = program_total + active_cases;

run;

proc print;

run;

Take a look at the outcomes, and see what questions come to mind.

Good luck.

Mgarret · Posted 09-06-2012 02:29 PM

Hi Astounding---

Thank you for your help. If you could, would you mind explaining whats going on in the steps below? Thanks!

if first.agency then agency_total=0;

if first.program then program_total=0;

retain agency_total program_total;

agency_total = agency_total + active_cases;

program_total = program_total + active_cases;

run;

art297 · Posted 09-06-2012 02:40 PM

I think Astounding was thinking that if you ran the code, and looked at the output, you would understand.

I'd suggest taking a quick read of: http://www.pauldickman.com/teaching/sas/set_by.php

and then seeing if you can answer your own question. If not, just post any followup questions you might have.

Astounding · Posted 09-06-2012 02:40 PM

Regarding the first two statements ... if you inspect the results, you will notice that first.agency and first.program are always 1 or 0. More specifically, first.agency is 1 when hitting the first observation for an AGENCY, otherwise it is 0. An IF condition interprets 1 as true, and 0 as false. So the first statement re-sets agency_total to 0 whenever hitting the first observation for an agency.

RETAIN has a standard meaning in SAS: as you move from one observation to the next, let this variable hold onto its value. Without it, AGENCY_TOTAL would be missing at the beginning of processing each observation.

The last two assignment statements increase the value of AGENCY_TOTAL and PROGRAM_TOTAL, by adding ACTIVE_CASES to the current value.

Work calls, but I will get around to further questions if you have them!

Amir · Posted 09-06-2012 01:58 PM

Hi,

Do you want to have an output table with the same columns and number of rows as your input just with an additional column "Utilization" with a new value calculated for every row? If yes, then you should be able to achieve this without by processing:

data want;

set have;

Utilization=Active_Cases+Priority_Pending_Cases+(Housing_Subsidy_Only_cases / 2) / Program_Capacity;

run;

If not, then please provide some sample output.

Regards,

Amir.

ballardw · Posted 09-06-2012 03:58 PM

I may be misunderstanding but it looks like this could be approached with two steps:

1: Generate the cummulative totals of the variable used to create the score

2: Generate the score.

One approach might be:

instead of trying to do a lot of data step code to get the sums of the variables use a proc designed for that task.

Proc summary data=have nway; /* the NWAY option restricts output so you don't get summaries at each level of combination of the class variables */

class agency program month; /* you could use a by statement instead but this avoids a need to sort */

var Active_Cases Priority_Pending_Cases Housing_Subsidy_Only_cases Program_Capacity;

output out=temp sum=;

run;

data want;

set temp;

Utilization=Active_Cases+Priority_Pending_Cases+(Housing_Subsidy_Only_cases / 2) / Program_Capacity;

run;

MikeZdeb · Posted 09-06-2012 04:49 PM

Hi ... I had a similar idea ... why not just use SUMMARY then a data step. One issue that arose is the MONTH is a literal rather than a number and a CLASS statement puts the months in alpha order rather than chronological order (you'd have to muck around with the data add month as 1, 2, 3 etc) ....

proc summary data=x;

class agency program month;

types agency*month program;

var active_cases pending_cases housing_subsidy_cases program_capacity;

output out=stats (drop=_:) sum=;

run;

data y;

set stats;

utilization=active_cases+pending_cases+(housing_subsidy_cases / 2) / program_capacity;

run;

agency program month cases cases cases capacity utilization

ProgramA 810 174 84 1120 984.038

ProgramB 405 87 42 560 492.038

Agency1 April 26 7 4 40 33.050

Agency1 August 37 9 4 50 46.040

Agency1 December 44 10 4 60 54.033

Agency1 Febuary 27 6 4 40 33.050

Agency1 January 26 6 3 40 32.038

Agency1 July 38 5 4 50 43.040

Agency1 June 32 10 4 40 42.050

Agency1 March 28 8 4 40 36.050

Agency1 May 30 6 4 40 36.050

Agency1 November 42 8 3 60 50.025

Agency1 October 37 5 1 50 42.010

Agency1 September 38 7 3 50 45.030

Agency2 April 52 14 8 80 66.050

Agency2 August 74 18 8 100 92.040

Agency2 December 88 20 8 120 108.033

Agency2 Febuary 54 12 8 80 66.050

Agency2 January 52 12 6 80 64.038

Agency2 July 76 10 8 100 86.040

Agency2 June 64 20 8 80 84.050

Agency2 March 56 16 8 80 72.050

Agency2 May 60 12 8 80 72.050

Agency2 November 84 16 6 120 100.025

Agency2 October 74 10 2 100 84.010

Agency2 September 76 14 6 100 90.030

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away