BookmarkSubscribeRSS Feed
Mgarret
Obsidian | Level 7

Hi all--

I've been having a heck of a time trying to understand by group processing – I find it tricky to understand. Any help will be greatly appreciated. Again, I’m trying to understand how this works any additional explanation would be so great.

I am using this equation to calculate scores:

Utilization=Active_Cases+Pending_Cases+(Housing_Subsidy_cases / 2) / Program_Capacity;

I have to generate a score for each month by agency and then by program. Below is the code I have, but I know its not correct. I've been having trouble visualizing what actually happens during these steps    

data want;

set have;

by Agency program  month;

if first.agency and first.month   then do;

Utilization=Active_Cases+Priority_Pending_Cases+(Housing_Subsidy_Only_cases / 2) / Program_Capacity;

end;

do until last.Agency;

end;

run;

Below is a data set. Again, any help is greatly appreciated.

Agency Program Month Active_CasesPending_CasesHousing_Subsidy_casesProgram_Capacity
Agency1ProgramAJanuary 266340
Agency1ProgramAFebuary 276440
Agency1ProgramAMarch 288440
Agency1ProgramAApril267440
Agency1ProgramAMay 306440
Agency1ProgramAJune3210440
Agency1ProgramAJuly 385450
Agency1ProgramAAugust 379450
Agency1ProgramASeptember 387350
Agency1ProgramAOctober 375150
Agency1ProgramANovember 428360
Agency1ProgramADecember 4410460
Agency2ProgramAJanuary 266340
Agency2ProgramAFebuary 276440
Agency2ProgramAMarch 288440
Agency2ProgramAApril267440
Agency2ProgramAMay 306440
Agency2ProgramAJune3210440
Agency2ProgramAJuly 385450
Agency2ProgramAAugust 379450
Agency2ProgramASeptember 387350
Agency2ProgramAOctober 375150
Agency2ProgramANovember 428360
Agency2ProgramADecember 4410460
Agency2ProgramBJanuary 266340
Agency2ProgramBFebuary 276440
Agency2ProgramBMarch 288440
Agency2ProgramBApril267440
Agency2ProgramBMay 306440
Agency2ProgramBJune3210440
Agency2ProgramBJuly 385450
Agency2ProgramBAugust 379450
Agency2ProgramBSeptember 387350
Agency2ProgramBOctober 375150
Agency2ProgramBNovember 428360
Agency2ProgramBDecember 4410460
7 REPLIES 7
Astounding
PROC Star

mGarret,

You're right to try to tackle this.  In the long run, BY group processing is an essential skill.  Here is a test program that may give you a feel for these tools.  It won't get you all the answers, but it should move you forward if you run it and inspect the results.

data test;

   set have;
   by agency program month;

   first_agency = first.agency;

   last_agency = last.agency;

   first_program = first.program;

   last_program = last.program;

   first_month = first.month;

   last_month = last.month;

   if first.agency then agency_total=0;

   if first.program then program_total=0;

   retain agency_total program_total;

   agency_total = agency_total + active_cases;

   program_total = program_total + active_cases;

run;

proc print;

run;

Take a look at the outcomes, and see what questions come to mind.

Good luck.

Mgarret
Obsidian | Level 7

Hi Astounding---

Thank you for your help. If you could, would you mind explaining whats going on in the steps below? Thanks! 

   if first.agency then agency_total=0;

   if first.program then program_total=0;

   retain agency_total program_total;

   agency_total = agency_total + active_cases;

   program_total = program_total + active_cases;

run;

art297
Opal | Level 21

I think Astounding was thinking that if you ran the code, and looked at the output, you would understand.

I'd suggest taking a quick read of: http://www.pauldickman.com/teaching/sas/set_by.php

and then seeing if you can answer your own question.  If not, just post any followup questions you might have.

Astounding
PROC Star

Regarding the first two statements ... if you inspect the results, you will notice that first.agency and first.program are always 1 or 0.  More specifically, first.agency is 1 when hitting the first observation for an AGENCY, otherwise it is 0.  An IF condition interprets 1 as true, and 0 as false.  So the first statement re-sets agency_total to 0 whenever hitting the first observation for an agency.


RETAIN has a standard meaning in SAS:  as you move from one observation to the next, let this variable hold onto its value.  Without it, AGENCY_TOTAL would be missing at the beginning of processing each observation.

The last two assignment statements increase the value of AGENCY_TOTAL and PROGRAM_TOTAL, by adding ACTIVE_CASES to the current value.

Work calls, but I will get around to further questions if you have them!

Amir
PROC Star

Hi,

Do you want to have an output table with the same columns and number of rows as your input just with an additional column "Utilization" with a new value calculated for every row? If yes, then you should be able to achieve this without by processing:

data want;

  set have;

  Utilization=Active_Cases+Priority_Pending_Cases+(Housing_Subsidy_Only_cases / 2) / Program_Capacity;

run;

If not, then please provide some sample output.

Regards,

Amir.

ballardw
Super User

I may be misunderstanding but it looks like this could be approached with two steps:

1: Generate the cummulative totals of the variable used to create the score

2: Generate the score.

One approach might be:

instead of trying to do a lot of data step code to get the sums of the variables use a proc designed for that task.

Proc summary data=have nway; /* the NWAY option restricts output so you don't get summaries at each level of combination of the class variables */

     class agency program month; /* you could use a by statement instead but this avoids a need to sort */

     var  Active_Cases Priority_Pending_Cases Housing_Subsidy_Only_cases Program_Capacity;

  output out=temp sum=;

run;

data want;

     set temp;

     Utilization=Active_Cases+Priority_Pending_Cases+(Housing_Subsidy_Only_cases / 2) / Program_Capacity;

run;


MikeZdeb
Rhodochrosite | Level 12

Hi ... I had a similar idea ... why not just use SUMMARY then a data step.  One issue that arose is the MONTH is a literal rather than a number and a CLASS statement puts the months in alpha order rather than chronological order (you'd have to muck around with the data add month as 1, 2, 3 etc) ....

proc summary data=x;

class agency program month;

types agency*month program;

var active_cases pending_cases housing_subsidy_cases program_capacity;

output out=stats (drop=_:) sum=;

run;

data y;

set stats;

utilization=active_cases+pending_cases+(housing_subsidy_cases / 2) / program_capacity;

run;

agency     program     month         cases       cases       cases     capacity    utilization

           ProgramA                   810         174         84         1120        984.038

           ProgramB                   405          87         42          560        492.038

Agency1                April           26           7          4           40         33.050

Agency1                August          37           9          4           50         46.040

Agency1                December        44          10          4           60         54.033

Agency1                Febuary         27           6          4           40         33.050

Agency1                January         26           6          3           40         32.038

Agency1                July            38           5          4           50         43.040

Agency1                June            32          10          4           40         42.050

Agency1                March           28           8          4           40         36.050

Agency1                May             30           6          4           40         36.050

Agency1                November        42           8          3           60         50.025

Agency1                October         37           5          1           50         42.010

Agency1                September       38           7          3           50         45.030

Agency2                April           52          14          8           80         66.050

Agency2                August          74          18          8          100         92.040

Agency2                December        88          20          8          120        108.033

Agency2                Febuary         54          12          8           80         66.050

Agency2                January         52          12          6           80         64.038

Agency2                July            76          10          8          100         86.040

Agency2                June            64          20          8           80         84.050

Agency2                March           56          16          8           80         72.050

Agency2                May             60          12          8           80         72.050

Agency2                November        84          16          6          120        100.025

Agency2                October         74          10          2          100         84.010

Agency2                September       76          14          6          100         90.030

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 820 views
  • 0 likes
  • 6 in conversation