Help using Base SAS procedures

By Group Processing Help

Reply
Frequent Contributor
Posts: 142

By Group Processing Help

Hi all--

I've been having a heck of a time trying to understand by group processing – I find it tricky to understand. Any help will be greatly appreciated. Again, I’m trying to understand how this works any additional explanation would be so great.

I am using this equation to calculate scores:

Utilization=Active_Cases+Pending_Cases+(Housing_Subsidy_cases / 2) / Program_Capacity;

I have to generate a score for each month by agency and then by program. Below is the code I have, but I know its not correct. I've been having trouble visualizing what actually happens during these steps    

data want;

set have;

by Agency program  month;

if first.agency and first.month   then do;

Utilization=Active_Cases+Priority_Pending_Cases+(Housing_Subsidy_Only_cases / 2) / Program_Capacity;

end;

do until last.Agency;

end;

run;

Below is a data set. Again, any help is greatly appreciated.

Agency Program Month Active_CasesPending_CasesHousing_Subsidy_casesProgram_Capacity
Agency1ProgramAJanuary 266340
Agency1ProgramAFebuary 276440
Agency1ProgramAMarch 288440
Agency1ProgramAApril267440
Agency1ProgramAMay 306440
Agency1ProgramAJune3210440
Agency1ProgramAJuly 385450
Agency1ProgramAAugust 379450
Agency1ProgramASeptember 387350
Agency1ProgramAOctober 375150
Agency1ProgramANovember 428360
Agency1ProgramADecember 4410460
Agency2ProgramAJanuary 266340
Agency2ProgramAFebuary 276440
Agency2ProgramAMarch 288440
Agency2ProgramAApril267440
Agency2ProgramAMay 306440
Agency2ProgramAJune3210440
Agency2ProgramAJuly 385450
Agency2ProgramAAugust 379450
Agency2ProgramASeptember 387350
Agency2ProgramAOctober 375150
Agency2ProgramANovember 428360
Agency2ProgramADecember 4410460
Agency2ProgramBJanuary 266340
Agency2ProgramBFebuary 276440
Agency2ProgramBMarch 288440
Agency2ProgramBApril267440
Agency2ProgramBMay 306440
Agency2ProgramBJune3210440
Agency2ProgramBJuly 385450
Agency2ProgramBAugust 379450
Agency2ProgramBSeptember 387350
Agency2ProgramBOctober 375150
Agency2ProgramBNovember 428360
Agency2ProgramBDecember 4410460
Super User
Posts: 5,081

Re: By Group Processing Help

mGarret,

You're right to try to tackle this.  In the long run, BY group processing is an essential skill.  Here is a test program that may give you a feel for these tools.  It won't get you all the answers, but it should move you forward if you run it and inspect the results.

data test;

   set have;
   by agency program month;

   first_agency = first.agency;

   last_agency = last.agency;

   first_program = first.program;

   last_program = last.program;

   first_month = first.month;

   last_month = last.month;

   if first.agency then agency_total=0;

   if first.program then program_total=0;

   retain agency_total program_total;

   agency_total = agency_total + active_cases;

   program_total = program_total + active_cases;

run;

proc print;

run;

Take a look at the outcomes, and see what questions come to mind.

Good luck.

Frequent Contributor
Posts: 142

Re: By Group Processing Help

Hi Astounding---

Thank you for your help. If you could, would you mind explaining whats going on in the steps below? Thanks! 

   if first.agency then agency_total=0;

   if first.program then program_total=0;

   retain agency_total program_total;

   agency_total = agency_total + active_cases;

   program_total = program_total + active_cases;

run;

PROC Star
Posts: 7,363

Re: By Group Processing Help

I think Astounding was thinking that if you ran the code, and looked at the output, you would understand.

I'd suggest taking a quick read of: http://www.pauldickman.com/teaching/sas/set_by.php

and then seeing if you can answer your own question.  If not, just post any followup questions you might have.

Super User
Posts: 5,081

Re: By Group Processing Help

Regarding the first two statements ... if you inspect the results, you will notice that first.agency and first.program are always 1 or 0.  More specifically, first.agency is 1 when hitting the first observation for an AGENCY, otherwise it is 0.  An IF condition interprets 1 as true, and 0 as false.  So the first statement re-sets agency_total to 0 whenever hitting the first observation for an agency.


RETAIN has a standard meaning in SAS:  as you move from one observation to the next, let this variable hold onto its value.  Without it, AGENCY_TOTAL would be missing at the beginning of processing each observation.

The last two assignment statements increase the value of AGENCY_TOTAL and PROGRAM_TOTAL, by adding ACTIVE_CASES to the current value.

Work calls, but I will get around to further questions if you have them!

Super Contributor
Posts: 282

Re: By Group Processing Help

Hi,

Do you want to have an output table with the same columns and number of rows as your input just with an additional column "Utilization" with a new value calculated for every row? If yes, then you should be able to achieve this without by processing:

data want;

  set have;

  Utilization=Active_Cases+Priority_Pending_Cases+(Housing_Subsidy_Only_cases / 2) / Program_Capacity;

run;

If not, then please provide some sample output.

Regards,

Amir.

Super User
Posts: 10,500

Re: By Group Processing Help

I may be misunderstanding but it looks like this could be approached with two steps:

1: Generate the cummulative totals of the variable used to create the score

2: Generate the score.

One approach might be:

instead of trying to do a lot of data step code to get the sums of the variables use a proc designed for that task.

Proc summary data=have nway; /* the NWAY option restricts output so you don't get summaries at each level of combination of the class variables */

     class agency program month; /* you could use a by statement instead but this avoids a need to sort */

     var  Active_Cases Priority_Pending_Cases Housing_Subsidy_Only_cases Program_Capacity;

  output out=temp sum=;

run;

data want;

     set temp;

     Utilization=Active_Cases+Priority_Pending_Cases+(Housing_Subsidy_Only_cases / 2) / Program_Capacity;

run;


Valued Guide
Posts: 765

Re: By Group Processing Help

Hi ... I had a similar idea ... why not just use SUMMARY then a data step.  One issue that arose is the MONTH is a literal rather than a number and a CLASS statement puts the months in alpha order rather than chronological order (you'd have to muck around with the data add month as 1, 2, 3 etc) ....

proc summary data=x;

class agency program month;

types agency*month program;

var active_cases pending_cases housing_subsidy_cases program_capacity;

output out=stats (drop=_Smiley Happy sum=;

run;

data y;

set stats;

utilization=active_cases+pending_cases+(housing_subsidy_cases / 2) / program_capacity;

run;

agency     program     month         cases       cases       cases     capacity    utilization

           ProgramA                   810         174         84         1120        984.038

           ProgramB                   405          87         42          560        492.038

Agency1                April           26           7          4           40         33.050

Agency1                August          37           9          4           50         46.040

Agency1                December        44          10          4           60         54.033

Agency1                Febuary         27           6          4           40         33.050

Agency1                January         26           6          3           40         32.038

Agency1                July            38           5          4           50         43.040

Agency1                June            32          10          4           40         42.050

Agency1                March           28           8          4           40         36.050

Agency1                May             30           6          4           40         36.050

Agency1                November        42           8          3           60         50.025

Agency1                October         37           5          1           50         42.010

Agency1                September       38           7          3           50         45.030

Agency2                April           52          14          8           80         66.050

Agency2                August          74          18          8          100         92.040

Agency2                December        88          20          8          120        108.033

Agency2                Febuary         54          12          8           80         66.050

Agency2                January         52          12          6           80         64.038

Agency2                July            76          10          8          100         86.040

Agency2                June            64          20          8           80         84.050

Agency2                March           56          16          8           80         72.050

Agency2                May             60          12          8           80         72.050

Agency2                November        84          16          6          120        100.025

Agency2                October         74          10          2          100         84.010

Agency2                September       76          14          6          100         90.030

Ask a Question
Discussion stats
  • 7 replies
  • 219 views
  • 0 likes
  • 6 in conversation