Re: Complex logic with Cumulative Sum using Hash Object

kashun · Posted 12-01-2020 07:55 PM

This might be a little complex to explain but would like to ask. I do have a dataset that uses conditions to generate extra fields using accumulated sum of those fields. I was hoping to use lags of the accumulated sum of the fields to create the result but not sure if there is a better approach.

HAVE:

data HAVE;
input @1 ID @5 Flag $2. @8 Date DATE9.  @18 Amount @24 C1 @27 C2 @30 C3 @33 PAY1_LIMIT @38 PAY2_DEC @42 PAY2_LIMIT @47 MAX_PAY ;
FORMAT DATE DATE9.;
datalines;
173 C1 01JAN2020 100   20 40 15 1000 0.1 1200 2000
173 M  03APR2020 300   20 40 15 1000 0.1 1200 2000
173 M  24APR2020 1300  20 40 15 1000 0.1 1200 2000
173 M  07JUN2020 500   20 40 15 1000 0.1 1200 2000
173 M  13JUN2020 -1300 20 40 15 1000 0.1 1200 2000
173 C1 28JUN2020 40    20 40 15 1000 0.1 1200 2000
173 M  07JUL2020 300   20 40 15 1000 0.1 1200 2000
173 C2 05AUG2020 50    20 40 15 1000 0.1 1200 2000
173 M  20AUG2020 1500  20 40 15 1000 0.1 1200 2000
173 C3 15SEP2020 30    20 40 15 1000 0.1 1200 2000
173 M  21OCT2020 -500  20 40 15 1000 0.1 1200 2000
173 C3 15NOV2020 30    20 40 15 1000 0.1 1200 2000
;
RUN;

Want

ID	Flag	Date	Amount	C1	C2	C3	PAY1_LIMIT	PAY2_DEC	PAY2_LIMIT	MAX_PAY	Pay_C	PAY1	PAY2	TOTAL_PAY	Accum_PAY_C	Accum_PAY1	Accum_PAY2	Accum_TOTAL_PAY
173	C1	01Jan2020	100	20	40	15	1000	0.1	1200	2000	20	0	0	20	20	0	0	20
173	M	03Apr2020	300	20	40	15	1000	0.1	1200	2000	0	300	0	300	20	300	0	320
173	M	24Apr2020	1300	20	40	15	1000	0.1	1200	2000	0	700	60	760	20	1000	60	1080
173	M	07Jun2020	500	20	40	15	1000	0.1	1200	2000	0	0	0	0	20	1000	60	1080
173	M	13Jun2020	-1300	20	40	15	1000	0.1	1200	2000	0	-700	-60	-760	20	300	0	320
173	C1	28Jun2020	40	20	40	15	1000	0.1	1200	2000	20	0	0	20	40	0	0	340
173	M	07Jul2020	300	20	40	15	1000	0.1	1200	2000	0	300	0	300	40	600	0	640
173	C2	05Aug2020	50	20	40	15	1000	0.1	1200	2000	40	0	0	400	80	600	0	640
173	M	20Aug2020	1500	20	40	15	1000	0.1	1200	2000	0	400	110	510	80	1000	110	1150
173	C3	15Sep2020	30	20	40	15	1000	0.1	1200	2000	15	0	0	15	95	1000	110	1165
173	C2	21Oct2020	-500	20	40	15	1000	0.1	1200	2000	-40	0	0	-40	50	1000	110	1150
173	C3	15Nov2020	30	20	40	15	1000	0.1	1200	2000	15	0	0	15	65	1000	100	1165

SASJedi · Posted 12-03-2020 10:17 AM

If you could explain the desired logic for the calculated fields, I might be able to make some suggestions. .

Check out my Jedi SAS Tricks for SAS Users

kashun · Posted 12-03-2020 10:33 AM

@SASJedi. Hope this will help.
1.The data depends on flags and limits to generate new fields as well as accumulated values.
2. Amount is displayed in Pay_C if flag is C1 - C3. Amount is displayed in Pay_1 if flag is M.
3. The data has the limit for C1 - C3.
Example, C1 - C3 are limits id 173 . i.e. if the row has Flag is C1 and Amount is 40, then id 173 will have Pay_C to be 20 since the maximum limit is 20 for C1.
If Amount was to be 15 then Pay_C would have been 15 Since it amount doesn't exceed the limit of C1 =20.
4. The data has the limit for PAY1 as well.
Example if PAY1 limit is 1000 and and amount is 300, then PAY1 will be 300. If the amount is 1200, then PAY1 will be 1000 with a remainder of 200 which will be used to calculate PAY2.
If cumulative sum of PAY1 exceeds the limit, the PAY1 will be zero. For instance row 2 from the data will have 300 under PAY1 since the flag is M. Since row 3 has amount 1300 and flag M, the remaining amount needed to meet the PAY1 limit will be 700 and hence PAY1 will be 700. The remaining 600 will be used to calculate PAY2 (600 * 0.1 =60. Note: use PAY2_DEC). Since PAY1's LIMIT is met, any row after that with flag M will have PAY1 to be zero, unless there is a negative (credit). Also Pay2 has a limit as well.
If there is a negative amount then the code should check all previous amount to see if there is an exact positive amount and negate its pay_c, pay1 and pay2.
Example, row 3 and 5 are just negative difference. If cumulative sum of total amount meets 2000 (max_pay) then any pay after should be zero unless a negative amount has been introduced to change the limit met.
The total_pay will be sum of pay_c , pay1 and pay2. Since calculations depend on accumulated

kashun · Posted 12-07-2020 11:01 AM

Hi @SASJedi, Any luck on this please?

kashun · Posted 12-11-2020 06:44 PM

I know we can create cumulative sum using hash objects. Is there store them and iterate them over per each line?

mkeintz · Posted 12-12-2020 02:35 PM

Since you have decided to keep cumulative sums is discrete variables (one for each set of flags), there is no advantage to using a hash object, versus retaining sums for each group.

But if you were to use the same variable for each of them, then you would have a better reason for hash. In such a case the hash object would be keyed on the flag values.

As a general concept using some of your data organization, here's an example of what I mean. Below pay_grp replaces payc and pay1, and pay_cum replaces accum_payc and accum_pay1. But there will be no ambiguity in pay_grp and pay_cum in the output records because they will be accompanied by a flag value. This works because the hash object if keyed on a new variable flg_grp.

I know that I didn't program in code to honor the limits. They idea here is to propose a revision of the output data layout that would be a more likely candidate for using a hash object:

data HAVE;
input @1 ID @5 Flag $2. @8 Date DATE9.  @18 Amount @24 C1 @27 C2 @30 C3 @33 PAY1_LIMIT @38 PAY2_DEC @42 PAY2_LIMIT @47 MAX_PAY ;
FORMAT DATE DATE9.;
datalines;
173 C1 01JAN2020 100   20 40 15 1000 0.1 1200 2000
173 M  03APR2020 300   20 40 15 1000 0.1 1200 2000
173 M  24APR2020 1300  20 40 15 1000 0.1 1200 2000
173 M  07JUN2020 500   20 40 15 1000 0.1 1200 2000
173 M  13JUN2020 -1300 20 40 15 1000 0.1 1200 2000
173 C1 28JUN2020 40    20 40 15 1000 0.1 1200 2000
173 M  07JUL2020 300   20 40 15 1000 0.1 1200 2000
173 C2 05AUG2020 50    20 40 15 1000 0.1 1200 2000
173 M  20AUG2020 1500  20 40 15 1000 0.1 1200 2000
173 C3 15SEP2020 30    20 40 15 1000 0.1 1200 2000
173 M  21OCT2020 -500  20 40 15 1000 0.1 1200 2000
173 C3 15NOV2020 30    20 40 15 1000 0.1 1200 2000
;
RUN;

data want;
  set have;

  length flg_group $1;
  flg_group=flag;  /* "C1", "C2", "C3" all become "C",  "M"==>"M"*/

  if _n_=1 then do;
    declare hash h ();
	  h.definekey('flg_group');
	  h.definedata('flg_group','flag','pay_grp','pay_cum');
	  h.definedone();
  end;

  if h.find()^=0 then pay_cum=0;  /*If not ready in hash h */

  select(flag);
    when ('c1') pay_grp=c1;
    when ('c2') pay_grp=c2;
    when ('c3') pay_grp=c3;
	when ('m')  pay_grp=pay1;
	otherwise;
  end;

  pay_cum=pay_cum+pay_grp;
  h.replace();
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

kashun · Posted 12-14-2020 10:17 AM

Thanks very much @mkeintz for the input. This gave me a different idea on applying the flags. Was thinking if there was another way to look at the cumulative sums with the limits.

Complex logic with Cumulative Sum using Hash Objects