BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kashun
Obsidian | Level 7

I have been struggling a little with how I can create a cumulative sum of two variable s using do until (last.ID). The data shown below is what I have.

The logic I would like to use is:

 

If (amount + Accum_test_ID2)  >= Check2 then amount = (Check2 - Accum_test_ID2)
else if (amount + Accum_test_ID1) >= Check1 then amount = (Check1 -Accum_test_ID2)
else amount = amount

 

Since the cumulative sum is dependent on ID1, ID2 and Date, I am finding it difficult to use sort by statement to create Accum_test_ID2 which for some reason messes up the cumulative sum for specific dates.

I tried using DO UNTIL(LAST.ID1); And in the do loop I grouped the data by ID1 and DATE and then retained Accum_test_D1 Accum_test_D2 0;

And also created cumulative sum: if first.ID then Accum_test_ID1 = test

else Accum_test_ID1+test.

Similarly if first.ID then Accum_test_ID2 = test else Accum_test_ID2+test.

With the above approach, I do get Accum_test_ID2 the same as Accum_test_ID1 since I grouped it by only D1 and Date.

I do want to know how to group it by both D1 and D2 without changing the date arrangement to correctly calculate the cumulative sums.

Note: test = sum(amount, small)

Accum_test_ID1 is the accumulative sum of test by ID1

Accum_TEST_ID2 is the accumulative sum of test by ID2

 

Data want;
test = 0;
do until (last.ID1);
set have;
by ID1 Date;
retain accum_test_ID1 accum_test_ID2  0;
if (amount +accum_test_ID2) > = Check2 then amount = (Check2 - accum_test_ID2);
else if (amount + accum_test_ID1) > = Check then amount = (Check1 - accum_test_ID2);
else amount =amount;
test = Sum(amount,Small);
if first.ID1 then accum_test_ID1 = test;
else accum_test_ID1+test;
if first.ID2 then accum_test_ID2 = test;
else accum_test_ID2 = test;
output;
end;
run;

 

Have

ID1

ID2

DATE

Amount

Check1

Check2

Small

1456

0

01JAN2020

1500

4000

2000

30

1456

1

03MAR2020

200

4000

2000

50

1456

1

04MAR2020

100

4000

2000

20

1456

0

09JUN2020

100

4000

2000

20

3314

0

04FEB2020

50

1200

600

40

3314

0

08FEB2020

250

1200

600

25

3314

1

10MAR2020

20

1200

600

48

3314

1

12APR2020

800

1200

600

70

 

Want

ID1

ID2

DATE

Amount

Check1

Check2

test

Accum_test_D1

Accum_test_D2

1456

0

01JAN2020

470

4000

2000

1530

1530

1530

1456

1

03MAR2020

200

4000

2000

250

1780

250

1456

1

04MAR2020

100

4000

2000

120

1900

370

1456

0

09JUM2020

100

4000

2000

120

2020

1650

3314

0

04FEB2020

50

1200

600

90

90

90

3314

0

08FEB2020

235

1200

600

275

365

365

3314

1

10MAR2020

20

1200

600

68

433

68

3314

1

12APR2020

262

1200

600

870

1303

938

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

OK, I get it now.  If an ID2 value recurs within an ID1 group, you want the accumulator to bridge any gap between occurences of that ID2 value.

 

Thank you for setting up a real SAS data step.  I've tested the program this time.  It keeps a separate accumulator for ID2=0 and for ID2=1.

 

data have;

input @1 ID1 @6 ID2 @8 DATE DATE9. @18 Amount @24 small @27 limit1 @32 limit2;
FORMAT DATE date9.;
datalines;
1456 0 01JAN2020 1500  30 4000 2000 
1456 1 03MAR2020 200   50 4000 2000 
1456 1 04MAR2020 100   20 4000 2000
1456 0 09JUN2020 100   20 4000 2000
1456 0 12JUL2020 400   70 4000 2000
3314 0 04FEB2020 50    40 1200 600 
3314 0 08FEB2020 250   25 1200 600
3314 1 10MAR2020 20    48 1200 600
3314 1 12APR2020 800   70 1200 600
;

RUN;

data want (drop=_:);
  set have;
  by id1;
  test=sum(small,amount);
  accum_test_id1+test;
  if first.id1 then do;
    accum_test_id1=test;
    _id2_accum0=0;
    _id2_accum1=0;
  end;

  if id2=0 then _id2_accum0+test; else
  if id2=1 then _id2_accum1+test;

  accum_test_id2=ifn(id2=0,_id2_accum0,_id2_accum1);
run;

This program assumes 2 explicit values for ID2 - 0 or 1.  Of course if you have more values, and if you know in advance what they might be, you can just generate an accumulator variable for each value.  Of course, that gets annoying if you have more than a very few ID2 values.

But if you don't know the ID2's in advance, or you have a lot of distinct ID2's, then a hash object to store the accumulator keyed on each encountered ID2 value will be the right solution, as below.  I didn't bother here to create the intermediate variable test, but that's not because I'm using a hash object:

data want ;
  set have;
  by id1;

  if _n_=1 then do;
    declare hash h ();
	 h.definekey('id2');
	 h.definedata('accum_test_id2');
	 h.definedone();
  end;

  accum_test_id1+amount+small;
  if first.id1 then accum_test_id1=sum(amount,small);

  if h.find()=0 then accum_test_id2+amount+small;
  else accum_test_id2=sum(amount,small);
  h.replace();

  if last.id1 then h.clear();
run;

Think of the hash object h as a lookup table, keyed on ID2 values.  The lookup table keeps an updated value for accum_test_id2 for each of those ID2's.  So when the lookup of the current value of ID2 is successful (i.e.  h.find()=0), then that value is retrieved by the find method, and immediately added to amount+small.  Otherwise the ID2 value is new, and the accumulator is directly calculated from amount and small.  Then the updated accumulator value is put (back) into the hash lookup awaiting the next instance for the current ID2.

 

At the end of each ID1, the hash object is cleared to avoid corrupting the id2 accumulators for the next ID1 group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
ballardw
Super User

You can only use First. with variables that are on your BY statement. Since Id2 is NOT on the by statement you cannot use First.Id2.

 

If your data is sorted in a grouped order you could use: By Id1 id2 date notsorted; to use the groupings and get the first and last.

 

With a data step to provide data I'm not going to test anything at this time as I don't have time to write a data step.

 

 

kashun
Obsidian | Level 7
Thanks very much @ballardw. I tried using by id1 id2 date notsorted but the accum_test_id2 doesn't continue with the accumulation. It begins with the test value on every first.
Eg. I get ID2 Accum_test_ID2
0 1530
1 250
1 370
0 120
0 210



mkeintz
PROC Star

In your sample WANT dataset, every value of TEST is simply the sum of amount+small, where amount has its original value.

 

Yet your program attempts to conditionally modify amount prior to the "test=sum(amount,small);" statement.  Given the desired resulting value of test that you show, I would ordinarily conclude that you never expect amount to be changed, but your desired value of amount is changed.

 

I do not understand the rules you are trying to implement.

 

And, like @ballardw , I would appreciate it if you could provide a working data step generating the dataset HAVE from your sample data.  I'd rather spend my time helping you to solve the problem than making a working sample dataset.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

Since TEST is effectively the sum of amount and small (until you provide data showing otherwise), I believe this program will generate WANT from HAVE in your sample.  Whether it fits your rules as stated, I can't say.  Test it on an actual sample data set:

 

data want;
  set have;
  by id1 id2 notsorted;
  test=sum(small,amount);
  accum_test_id1+test;
  if first.id1 then accum_test_id1=test;

  accum_test_id2+test;
  if first.id2 then accum_test_id2=test;
run;

And you probably don't need the intermediate variable test:

 

data want;
  set have;
  by id1 id2 notsorted;

  accum_test_id1+small+amount;
  if first.id1 then accum_test_id1=sum(small,amount);

  accum_test_id2+small+amount;
  if first.id2 then accum_test_id2=sum(small,amount);
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
kashun
Obsidian | Level 7

Thanks @mkeintz. I think your code was close to what I wanted.. I used the data below:

data have;

input @1 ID1 @6 ID2 @8 DATE DATE9. @18 Amount @24 small @27 limit1 @32 limit2;
FORMAT DATE date9.;
datalines;
1456 0 01JAN2020 1500  30 4000 2000 
1456 1 03MAR2020 200   50 4000 2000 
1456 1 04MAR2020 100   20 4000 2000
1456 0 09JUN2020 100   20 4000 2000
1456 0 12JUL2020 400   70 4000 2000
3314 0 04FEB2020 50    40 1200 600 
3314 0 08FEB2020 250   25 1200 600
3314 1 10MAR2020 20    48 1200 600
3314 1 12APR2020 800   70 1200 600
;

RUN;

But get the result below when I use your program:

 

Get:

Get.png

 
 

Want

want.png

 

 

 

mkeintz
PROC Star

OK, I get it now.  If an ID2 value recurs within an ID1 group, you want the accumulator to bridge any gap between occurences of that ID2 value.

 

Thank you for setting up a real SAS data step.  I've tested the program this time.  It keeps a separate accumulator for ID2=0 and for ID2=1.

 

data have;

input @1 ID1 @6 ID2 @8 DATE DATE9. @18 Amount @24 small @27 limit1 @32 limit2;
FORMAT DATE date9.;
datalines;
1456 0 01JAN2020 1500  30 4000 2000 
1456 1 03MAR2020 200   50 4000 2000 
1456 1 04MAR2020 100   20 4000 2000
1456 0 09JUN2020 100   20 4000 2000
1456 0 12JUL2020 400   70 4000 2000
3314 0 04FEB2020 50    40 1200 600 
3314 0 08FEB2020 250   25 1200 600
3314 1 10MAR2020 20    48 1200 600
3314 1 12APR2020 800   70 1200 600
;

RUN;

data want (drop=_:);
  set have;
  by id1;
  test=sum(small,amount);
  accum_test_id1+test;
  if first.id1 then do;
    accum_test_id1=test;
    _id2_accum0=0;
    _id2_accum1=0;
  end;

  if id2=0 then _id2_accum0+test; else
  if id2=1 then _id2_accum1+test;

  accum_test_id2=ifn(id2=0,_id2_accum0,_id2_accum1);
run;

This program assumes 2 explicit values for ID2 - 0 or 1.  Of course if you have more values, and if you know in advance what they might be, you can just generate an accumulator variable for each value.  Of course, that gets annoying if you have more than a very few ID2 values.

But if you don't know the ID2's in advance, or you have a lot of distinct ID2's, then a hash object to store the accumulator keyed on each encountered ID2 value will be the right solution, as below.  I didn't bother here to create the intermediate variable test, but that's not because I'm using a hash object:

data want ;
  set have;
  by id1;

  if _n_=1 then do;
    declare hash h ();
	 h.definekey('id2');
	 h.definedata('accum_test_id2');
	 h.definedone();
  end;

  accum_test_id1+amount+small;
  if first.id1 then accum_test_id1=sum(amount,small);

  if h.find()=0 then accum_test_id2+amount+small;
  else accum_test_id2=sum(amount,small);
  h.replace();

  if last.id1 then h.clear();
run;

Think of the hash object h as a lookup table, keyed on ID2 values.  The lookup table keeps an updated value for accum_test_id2 for each of those ID2's.  So when the lookup of the current value of ID2 is successful (i.e.  h.find()=0), then that value is retrieved by the find method, and immediately added to amount+small.  Otherwise the ID2 value is new, and the accumulator is directly calculated from amount and small.  Then the updated accumulator value is put (back) into the hash lookup awaiting the next instance for the current ID2.

 

At the end of each ID1, the hash object is cleared to avoid corrupting the id2 accumulators for the next ID1 group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
kashun
Obsidian | Level 7

Thanks very much @mkeintz. This is very helpful. Really appreciate the help

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1519 views
  • 3 likes
  • 3 in conversation