Solved: Simultaneous Cumulative Sum of two variables with respect to date usi...

kashun · Posted 10-19-2020 03:01 PM

I have been struggling a little with how I can create a cumulative sum of two variable s using do until (last.ID). The data shown below is what I have.

The logic I would like to use is:

If (amount + Accum_test_ID2)  >= Check2 then amount = (Check2 - Accum_test_ID2)
else if (amount + Accum_test_ID1) >= Check1 then amount = (Check1 -Accum_test_ID2)
else amount = amount

Since the cumulative sum is dependent on ID1, ID2 and Date, I am finding it difficult to use sort by statement to create Accum_test_ID2 which for some reason messes up the cumulative sum for specific dates.

I tried using DO UNTIL(LAST.ID1); And in the do loop I grouped the data by ID1 and DATE and then retained Accum_test_D1 Accum_test_D2 0;

And also created cumulative sum: if first.ID then Accum_test_ID1 = test

else Accum_test_ID1+test.

Similarly if first.ID then Accum_test_ID2 = test else Accum_test_ID2+test.

With the above approach, I do get Accum_test_ID2 the same as Accum_test_ID1 since I grouped it by only D1 and Date.

I do want to know how to group it by both D1 and D2 without changing the date arrangement to correctly calculate the cumulative sums.

Note: test = sum(amount, small)

Accum_test_ID1 is the accumulative sum of test by ID1

Accum_TEST_ID2 is the accumulative sum of test by ID2

Data want;
test = 0;
do until (last.ID1);
set have;
by ID1 Date;
retain accum_test_ID1 accum_test_ID2  0;
if (amount +accum_test_ID2) > = Check2 then amount = (Check2 - accum_test_ID2);
else if (amount + accum_test_ID1) > = Check then amount = (Check1 - accum_test_ID2);
else amount =amount;
test = Sum(amount,Small);
if first.ID1 then accum_test_ID1 = test;
else accum_test_ID1+test;
if first.ID2 then accum_test_ID2 = test;
else accum_test_ID2 = test;
output;
end;
run;

Have

ID1	ID2	DATE	Amount	Check1	Check2	Small
1456	0	01JAN2020	1500	4000	2000	30
1456	1	03MAR2020	200	4000	2000	50
1456	1	04MAR2020	100	4000	2000	20
1456	0	09JUN2020	100	4000	2000	20
3314	0	04FEB2020	50	1200	600	40
3314	0	08FEB2020	250	1200	600	25
3314	1	10MAR2020	20	1200	600	48
3314	1	12APR2020	800	1200	600	70

Want

ID1	ID2	DATE	Amount	Check1	Check2	test	Accum_test_D1	Accum_test_D2
1456	0	01JAN2020	470	4000	2000	1530	1530	1530
1456	1	03MAR2020	200	4000	2000	250	1780	250
1456	1	04MAR2020	100	4000	2000	120	1900	370
1456	0	09JUM2020	100	4000	2000	120	2020	1650
3314	0	04FEB2020	50	1200	600	90	90	90
3314	0	08FEB2020	235	1200	600	275	365	365
3314	1	10MAR2020	20	1200	600	68	433	68
3314	1	12APR2020	262	1200	600	870	1303	938

mkeintz · Posted 10-21-2020 12:08 AM

OK, I get it now. If an ID2 value recurs within an ID1 group, you want the accumulator to bridge any gap between occurences of that ID2 value.

Thank you for setting up a real SAS data step. I've tested the program this time. It keeps a separate accumulator for ID2=0 and for ID2=1.

data have;

input @1 ID1 @6 ID2 @8 DATE DATE9. @18 Amount @24 small @27 limit1 @32 limit2;
FORMAT DATE date9.;
datalines;
1456 0 01JAN2020 1500  30 4000 2000 
1456 1 03MAR2020 200   50 4000 2000 
1456 1 04MAR2020 100   20 4000 2000
1456 0 09JUN2020 100   20 4000 2000
1456 0 12JUL2020 400   70 4000 2000
3314 0 04FEB2020 50    40 1200 600 
3314 0 08FEB2020 250   25 1200 600
3314 1 10MAR2020 20    48 1200 600
3314 1 12APR2020 800   70 1200 600
;

RUN;

data want (drop=_:);
  set have;
  by id1;
  test=sum(small,amount);
  accum_test_id1+test;
  if first.id1 then do;
    accum_test_id1=test;
    _id2_accum0=0;
    _id2_accum1=0;
  end;

  if id2=0 then _id2_accum0+test; else
  if id2=1 then _id2_accum1+test;

  accum_test_id2=ifn(id2=0,_id2_accum0,_id2_accum1);
run;

This program assumes 2 explicit values for ID2 - 0 or 1. Of course if you have more values, and if you know in advance what they might be, you can just generate an accumulator variable for each value. Of course, that gets annoying if you have more than a very few ID2 values.

But if you don't know the ID2's in advance, or you have a lot of distinct ID2's, then a hash object to store the accumulator keyed on each encountered ID2 value will be the right solution, as below. I didn't bother here to create the intermediate variable test, but that's not because I'm using a hash object:

data want ;
  set have;
  by id1;

  if _n_=1 then do;
    declare hash h ();
	 h.definekey('id2');
	 h.definedata('accum_test_id2');
	 h.definedone();
  end;

  accum_test_id1+amount+small;
  if first.id1 then accum_test_id1=sum(amount,small);

  if h.find()=0 then accum_test_id2+amount+small;
  else accum_test_id2=sum(amount,small);
  h.replace();

  if last.id1 then h.clear();
run;

Think of the hash object h as a lookup table, keyed on ID2 values. The lookup table keeps an updated value for accum_test_id2 for each of those ID2's. So when the lookup of the current value of ID2 is successful (i.e. h.find()=0), then that value is retrieved by the find method, and immediately added to amount+small. Otherwise the ID2 value is new, and the accumulator is directly calculated from amount and small. Then the updated accumulator value is put (back) into the hash lookup awaiting the next instance for the current ID2.

At the end of each ID1, the hash object is cleared to avoid corrupting the id2 accumulators for the next ID1 group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

ballardw · Posted 10-19-2020 03:11 PM

You can only use First. with variables that are on your BY statement. Since Id2 is NOT on the by statement you cannot use First.Id2.

If your data is sorted in a grouped order you could use: By Id1 id2 date notsorted; to use the groupings and get the first and last.

With a data step to provide data I'm not going to test anything at this time as I don't have time to write a data step.

kashun · Posted 10-20-2020 08:21 AM

Thanks very much @ballardw. I tried using by id1 id2 date notsorted but the accum_test_id2 doesn't continue with the accumulation. It begins with the test value on every first.
Eg. I get ID2 Accum_test_ID2
0 1530
1 250
1 370
0 120
0 210

mkeintz · Posted 10-20-2020 12:48 PM

In your sample WANT dataset, every value of TEST is simply the sum of amount+small, where amount has its original value.

Yet your program attempts to conditionally modify amount prior to the "test=sum(amount,small);" statement. Given the desired resulting value of test that you show, I would ordinarily conclude that you never expect amount to be changed, but your desired value of amount is changed.

I do not understand the rules you are trying to implement.

And, like @ballardw , I would appreciate it if you could provide a working data step generating the dataset HAVE from your sample data. I'd rather spend my time helping you to solve the problem than making a working sample dataset.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mkeintz · Posted 10-20-2020 01:05 PM

Since TEST is effectively the sum of amount and small (until you provide data showing otherwise), I believe this program will generate WANT from HAVE in your sample. Whether it fits your rules as stated, I can't say. Test it on an actual sample data set:

data want;
  set have;
  by id1 id2 notsorted;
  test=sum(small,amount);
  accum_test_id1+test;
  if first.id1 then accum_test_id1=test;

  accum_test_id2+test;
  if first.id2 then accum_test_id2=test;
run;

And you probably don't need the intermediate variable test:

data want;
  set have;
  by id1 id2 notsorted;

  accum_test_id1+small+amount;
  if first.id1 then accum_test_id1=sum(small,amount);

  accum_test_id2+small+amount;
  if first.id2 then accum_test_id2=sum(small,amount);
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

kashun · Posted 10-20-2020 04:31 PM

Thanks @mkeintz. I think your code was close to what I wanted.. I used the data below:

data have;

input @1 ID1 @6 ID2 @8 DATE DATE9. @18 Amount @24 small @27 limit1 @32 limit2;
FORMAT DATE date9.;
datalines;
1456 0 01JAN2020 1500  30 4000 2000 
1456 1 03MAR2020 200   50 4000 2000 
1456 1 04MAR2020 100   20 4000 2000
1456 0 09JUN2020 100   20 4000 2000
1456 0 12JUL2020 400   70 4000 2000
3314 0 04FEB2020 50    40 1200 600 
3314 0 08FEB2020 250   25 1200 600
3314 1 10MAR2020 20    48 1200 600
3314 1 12APR2020 800   70 1200 600
;

RUN;

But get the result below when I use your program:

Get:

Want

mkeintz · Posted 10-21-2020 12:08 AM

OK, I get it now. If an ID2 value recurs within an ID1 group, you want the accumulator to bridge any gap between occurences of that ID2 value.

Thank you for setting up a real SAS data step. I've tested the program this time. It keeps a separate accumulator for ID2=0 and for ID2=1.

data have;

input @1 ID1 @6 ID2 @8 DATE DATE9. @18 Amount @24 small @27 limit1 @32 limit2;
FORMAT DATE date9.;
datalines;
1456 0 01JAN2020 1500  30 4000 2000 
1456 1 03MAR2020 200   50 4000 2000 
1456 1 04MAR2020 100   20 4000 2000
1456 0 09JUN2020 100   20 4000 2000
1456 0 12JUL2020 400   70 4000 2000
3314 0 04FEB2020 50    40 1200 600 
3314 0 08FEB2020 250   25 1200 600
3314 1 10MAR2020 20    48 1200 600
3314 1 12APR2020 800   70 1200 600
;

RUN;

data want (drop=_:);
  set have;
  by id1;
  test=sum(small,amount);
  accum_test_id1+test;
  if first.id1 then do;
    accum_test_id1=test;
    _id2_accum0=0;
    _id2_accum1=0;
  end;

  if id2=0 then _id2_accum0+test; else
  if id2=1 then _id2_accum1+test;

  accum_test_id2=ifn(id2=0,_id2_accum0,_id2_accum1);
run;

This program assumes 2 explicit values for ID2 - 0 or 1. Of course if you have more values, and if you know in advance what they might be, you can just generate an accumulator variable for each value. Of course, that gets annoying if you have more than a very few ID2 values.

But if you don't know the ID2's in advance, or you have a lot of distinct ID2's, then a hash object to store the accumulator keyed on each encountered ID2 value will be the right solution, as below. I didn't bother here to create the intermediate variable test, but that's not because I'm using a hash object:

data want ;
  set have;
  by id1;

  if _n_=1 then do;
    declare hash h ();
	 h.definekey('id2');
	 h.definedata('accum_test_id2');
	 h.definedone();
  end;

  accum_test_id1+amount+small;
  if first.id1 then accum_test_id1=sum(amount,small);

  if h.find()=0 then accum_test_id2+amount+small;
  else accum_test_id2=sum(amount,small);
  h.replace();

  if last.id1 then h.clear();
run;

Think of the hash object h as a lookup table, keyed on ID2 values. The lookup table keeps an updated value for accum_test_id2 for each of those ID2's. So when the lookup of the current value of ID2 is successful (i.e. h.find()=0), then that value is retrieved by the find method, and immediately added to amount+small. Otherwise the ID2 value is new, and the accumulator is directly calculated from amount and small. Then the updated accumulator value is put (back) into the hash lookup awaiting the next instance for the current ID2.

At the end of each ID1, the hash object is cleared to avoid corrupting the id2 accumulators for the next ID1 group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

kashun · Posted 10-21-2020 12:30 PM

Thanks very much @mkeintz. This is very helpful. Really appreciate the help

Simultaneous Cumulative Sum of two variables with respect to date using Do Until loop

Re: Simultaneous Cumulative Sum of two variables with respect to date using Do Until loop

Re: Simultaneous Cumulative Sum of two variables with respect to date using Do Until loop

Re: Simultaneous Cumulative Sum of two variables with respect to date using Do Until loop

Re: Simultaneous Cumulative Sum of two variables with respect to date using Do Until loop

Re: Simultaneous Cumulative Sum of two variables with respect to date using Do Until loop

Re: Simultaneous Cumulative Sum of two variables with respect to date using Do Until loop

Re: Simultaneous Cumulative Sum of two variables with respect to date using Do Until loop

Re: Simultaneous Cumulative Sum of two variables with respect to date using Do Until loop

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away