I have been struggling a little with how I can create a cumulative sum of two variable s using do until (last.ID). The data shown below is what I have.
The logic I would like to use is:
If (amount + Accum_test_ID2) >= Check2 then amount = (Check2 - Accum_test_ID2)
else if (amount + Accum_test_ID1) >= Check1 then amount = (Check1 -Accum_test_ID2)
else amount = amount
Since the cumulative sum is dependent on ID1, ID2 and Date, I am finding it difficult to use sort by statement to create Accum_test_ID2 which for some reason messes up the cumulative sum for specific dates.
I tried using DO UNTIL(LAST.ID1); And in the do loop I grouped the data by ID1 and DATE and then retained Accum_test_D1 Accum_test_D2 0;
And also created cumulative sum: if first.ID then Accum_test_ID1 = test
else Accum_test_ID1+test.
Similarly if first.ID then Accum_test_ID2 = test else Accum_test_ID2+test.
With the above approach, I do get Accum_test_ID2 the same as Accum_test_ID1 since I grouped it by only D1 and Date.
I do want to know how to group it by both D1 and D2 without changing the date arrangement to correctly calculate the cumulative sums.
Note: test = sum(amount, small)
Accum_test_ID1 is the accumulative sum of test by ID1
Accum_TEST_ID2 is the accumulative sum of test by ID2
Data want;
test = 0;
do until (last.ID1);
set have;
by ID1 Date;
retain accum_test_ID1 accum_test_ID2 0;
if (amount +accum_test_ID2) > = Check2 then amount = (Check2 - accum_test_ID2);
else if (amount + accum_test_ID1) > = Check then amount = (Check1 - accum_test_ID2);
else amount =amount;
test = Sum(amount,Small);
if first.ID1 then accum_test_ID1 = test;
else accum_test_ID1+test;
if first.ID2 then accum_test_ID2 = test;
else accum_test_ID2 = test;
output;
end;
run;
Have
ID1 | ID2 | DATE | Amount | Check1 | Check2 | Small |
1456 | 0 | 01JAN2020 | 1500 | 4000 | 2000 | 30 |
1456 | 1 | 03MAR2020 | 200 | 4000 | 2000 | 50 |
1456 | 1 | 04MAR2020 | 100 | 4000 | 2000 | 20 |
1456 | 0 | 09JUN2020 | 100 | 4000 | 2000 | 20 |
3314 | 0 | 04FEB2020 | 50 | 1200 | 600 | 40 |
3314 | 0 | 08FEB2020 | 250 | 1200 | 600 | 25 |
3314 | 1 | 10MAR2020 | 20 | 1200 | 600 | 48 |
3314 | 1 | 12APR2020 | 800 | 1200 | 600 | 70 |
Want
ID1 | ID2 | DATE | Amount | Check1 | Check2 | test | Accum_test_D1 | Accum_test_D2 |
1456 | 0 | 01JAN2020 | 470 | 4000 | 2000 | 1530 | 1530 | 1530 |
1456 | 1 | 03MAR2020 | 200 | 4000 | 2000 | 250 | 1780 | 250 |
1456 | 1 | 04MAR2020 | 100 | 4000 | 2000 | 120 | 1900 | 370 |
1456 | 0 | 09JUM2020 | 100 | 4000 | 2000 | 120 | 2020 | 1650 |
3314 | 0 | 04FEB2020 | 50 | 1200 | 600 | 90 | 90 | 90 |
3314 | 0 | 08FEB2020 | 235 | 1200 | 600 | 275 | 365 | 365 |
3314 | 1 | 10MAR2020 | 20 | 1200 | 600 | 68 | 433 | 68 |
3314 | 1 | 12APR2020 | 262 | 1200 | 600 | 870 | 1303 | 938 |
OK, I get it now. If an ID2 value recurs within an ID1 group, you want the accumulator to bridge any gap between occurences of that ID2 value.
Thank you for setting up a real SAS data step. I've tested the program this time. It keeps a separate accumulator for ID2=0 and for ID2=1.
data have;
input @1 ID1 @6 ID2 @8 DATE DATE9. @18 Amount @24 small @27 limit1 @32 limit2;
FORMAT DATE date9.;
datalines;
1456 0 01JAN2020 1500 30 4000 2000
1456 1 03MAR2020 200 50 4000 2000
1456 1 04MAR2020 100 20 4000 2000
1456 0 09JUN2020 100 20 4000 2000
1456 0 12JUL2020 400 70 4000 2000
3314 0 04FEB2020 50 40 1200 600
3314 0 08FEB2020 250 25 1200 600
3314 1 10MAR2020 20 48 1200 600
3314 1 12APR2020 800 70 1200 600
;
RUN;
data want (drop=_:);
set have;
by id1;
test=sum(small,amount);
accum_test_id1+test;
if first.id1 then do;
accum_test_id1=test;
_id2_accum0=0;
_id2_accum1=0;
end;
if id2=0 then _id2_accum0+test; else
if id2=1 then _id2_accum1+test;
accum_test_id2=ifn(id2=0,_id2_accum0,_id2_accum1);
run;
This program assumes 2 explicit values for ID2 - 0 or 1. Of course if you have more values, and if you know in advance what they might be, you can just generate an accumulator variable for each value. Of course, that gets annoying if you have more than a very few ID2 values.
But if you don't know the ID2's in advance, or you have a lot of distinct ID2's, then a hash object to store the accumulator keyed on each encountered ID2 value will be the right solution, as below. I didn't bother here to create the intermediate variable test, but that's not because I'm using a hash object:
data want ;
set have;
by id1;
if _n_=1 then do;
declare hash h ();
h.definekey('id2');
h.definedata('accum_test_id2');
h.definedone();
end;
accum_test_id1+amount+small;
if first.id1 then accum_test_id1=sum(amount,small);
if h.find()=0 then accum_test_id2+amount+small;
else accum_test_id2=sum(amount,small);
h.replace();
if last.id1 then h.clear();
run;
Think of the hash object h as a lookup table, keyed on ID2 values. The lookup table keeps an updated value for accum_test_id2 for each of those ID2's. So when the lookup of the current value of ID2 is successful (i.e. h.find()=0), then that value is retrieved by the find method, and immediately added to amount+small. Otherwise the ID2 value is new, and the accumulator is directly calculated from amount and small. Then the updated accumulator value is put (back) into the hash lookup awaiting the next instance for the current ID2.
At the end of each ID1, the hash object is cleared to avoid corrupting the id2 accumulators for the next ID1 group.
You can only use First. with variables that are on your BY statement. Since Id2 is NOT on the by statement you cannot use First.Id2.
If your data is sorted in a grouped order you could use: By Id1 id2 date notsorted; to use the groupings and get the first and last.
With a data step to provide data I'm not going to test anything at this time as I don't have time to write a data step.
In your sample WANT dataset, every value of TEST is simply the sum of amount+small, where amount has its original value.
Yet your program attempts to conditionally modify amount prior to the "test=sum(amount,small);" statement. Given the desired resulting value of test that you show, I would ordinarily conclude that you never expect amount to be changed, but your desired value of amount is changed.
I do not understand the rules you are trying to implement.
And, like @ballardw , I would appreciate it if you could provide a working data step generating the dataset HAVE from your sample data. I'd rather spend my time helping you to solve the problem than making a working sample dataset.
Since TEST is effectively the sum of amount and small (until you provide data showing otherwise), I believe this program will generate WANT from HAVE in your sample. Whether it fits your rules as stated, I can't say. Test it on an actual sample data set:
data want;
set have;
by id1 id2 notsorted;
test=sum(small,amount);
accum_test_id1+test;
if first.id1 then accum_test_id1=test;
accum_test_id2+test;
if first.id2 then accum_test_id2=test;
run;
And you probably don't need the intermediate variable test:
data want;
set have;
by id1 id2 notsorted;
accum_test_id1+small+amount;
if first.id1 then accum_test_id1=sum(small,amount);
accum_test_id2+small+amount;
if first.id2 then accum_test_id2=sum(small,amount);
run;
Thanks @mkeintz. I think your code was close to what I wanted.. I used the data below:
data have;
input @1 ID1 @6 ID2 @8 DATE DATE9. @18 Amount @24 small @27 limit1 @32 limit2;
FORMAT DATE date9.;
datalines;
1456 0 01JAN2020 1500 30 4000 2000
1456 1 03MAR2020 200 50 4000 2000
1456 1 04MAR2020 100 20 4000 2000
1456 0 09JUN2020 100 20 4000 2000
1456 0 12JUL2020 400 70 4000 2000
3314 0 04FEB2020 50 40 1200 600
3314 0 08FEB2020 250 25 1200 600
3314 1 10MAR2020 20 48 1200 600
3314 1 12APR2020 800 70 1200 600
;
RUN;
But get the result below when I use your program:
Get:
Want
OK, I get it now. If an ID2 value recurs within an ID1 group, you want the accumulator to bridge any gap between occurences of that ID2 value.
Thank you for setting up a real SAS data step. I've tested the program this time. It keeps a separate accumulator for ID2=0 and for ID2=1.
data have;
input @1 ID1 @6 ID2 @8 DATE DATE9. @18 Amount @24 small @27 limit1 @32 limit2;
FORMAT DATE date9.;
datalines;
1456 0 01JAN2020 1500 30 4000 2000
1456 1 03MAR2020 200 50 4000 2000
1456 1 04MAR2020 100 20 4000 2000
1456 0 09JUN2020 100 20 4000 2000
1456 0 12JUL2020 400 70 4000 2000
3314 0 04FEB2020 50 40 1200 600
3314 0 08FEB2020 250 25 1200 600
3314 1 10MAR2020 20 48 1200 600
3314 1 12APR2020 800 70 1200 600
;
RUN;
data want (drop=_:);
set have;
by id1;
test=sum(small,amount);
accum_test_id1+test;
if first.id1 then do;
accum_test_id1=test;
_id2_accum0=0;
_id2_accum1=0;
end;
if id2=0 then _id2_accum0+test; else
if id2=1 then _id2_accum1+test;
accum_test_id2=ifn(id2=0,_id2_accum0,_id2_accum1);
run;
This program assumes 2 explicit values for ID2 - 0 or 1. Of course if you have more values, and if you know in advance what they might be, you can just generate an accumulator variable for each value. Of course, that gets annoying if you have more than a very few ID2 values.
But if you don't know the ID2's in advance, or you have a lot of distinct ID2's, then a hash object to store the accumulator keyed on each encountered ID2 value will be the right solution, as below. I didn't bother here to create the intermediate variable test, but that's not because I'm using a hash object:
data want ;
set have;
by id1;
if _n_=1 then do;
declare hash h ();
h.definekey('id2');
h.definedata('accum_test_id2');
h.definedone();
end;
accum_test_id1+amount+small;
if first.id1 then accum_test_id1=sum(amount,small);
if h.find()=0 then accum_test_id2+amount+small;
else accum_test_id2=sum(amount,small);
h.replace();
if last.id1 then h.clear();
run;
Think of the hash object h as a lookup table, keyed on ID2 values. The lookup table keeps an updated value for accum_test_id2 for each of those ID2's. So when the lookup of the current value of ID2 is successful (i.e. h.find()=0), then that value is retrieved by the find method, and immediately added to amount+small. Otherwise the ID2 value is new, and the accumulator is directly calculated from amount and small. Then the updated accumulator value is put (back) into the hash lookup awaiting the next instance for the current ID2.
At the end of each ID1, the hash object is cleared to avoid corrupting the id2 accumulators for the next ID1 group.
Thanks very much @mkeintz. This is very helpful. Really appreciate the help
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.