BookmarkSubscribeRSS Feed
kashun
Obsidian | Level 7

This might be a little complex to explain but would like to ask. I do have a dataset that uses conditions to generate extra fields using accumulated sum of those fields. I was hoping to use lags of the accumulated sum  of the fields to create the result but not sure if there is a better approach.

 

HAVE:

data HAVE;
input @1 ID @5 Flag $2. @8 Date DATE9.  @18 Amount @24 C1 @27 C2 @30 C3 @33 PAY1_LIMIT @38 PAY2_DEC @42 PAY2_LIMIT @47 MAX_PAY ;
FORMAT DATE DATE9.;
datalines;
173 C1 01JAN2020 100   20 40 15 1000 0.1 1200 2000
173 M  03APR2020 300   20 40 15 1000 0.1 1200 2000
173 M  24APR2020 1300  20 40 15 1000 0.1 1200 2000
173 M  07JUN2020 500   20 40 15 1000 0.1 1200 2000
173 M  13JUN2020 -1300 20 40 15 1000 0.1 1200 2000
173 C1 28JUN2020 40    20 40 15 1000 0.1 1200 2000
173 M  07JUL2020 300   20 40 15 1000 0.1 1200 2000
173 C2 05AUG2020 50    20 40 15 1000 0.1 1200 2000
173 M  20AUG2020 1500  20 40 15 1000 0.1 1200 2000
173 C3 15SEP2020 30    20 40 15 1000 0.1 1200 2000
173 M  21OCT2020 -500  20 40 15 1000 0.1 1200 2000
173 C3 15NOV2020 30    20 40 15 1000 0.1 1200 2000
;
RUN;

Want

ID Flag Date Amount C1 C2 C3 PAY1_LIMIT PAY2_DEC PAY2_LIMIT MAX_PAY Pay_C PAY1 PAY2 TOTAL_PAY Accum_PAY_C Accum_PAY1 Accum_PAY2 Accum_TOTAL_PAY
173 C1 01Jan2020 100 20 40 15 1000 0.1 1200 2000 20 0 0 20 20 0 0 20
173 M 03Apr2020 300 20 40 15 1000 0.1 1200 2000 0 300 0 300 20 300 0 320
173 M 24Apr2020 1300 20 40 15 1000 0.1 1200 2000 0 700 60 760 20 1000 60 1080
173 M 07Jun2020 500 20 40 15 1000 0.1 1200 2000 0 0 0 0 20 1000 60 1080
173 M 13Jun2020 -1300 20 40 15 1000 0.1 1200 2000 0 -700 -60 -760 20 300 0 320
173 C1 28Jun2020 40 20 40 15 1000 0.1 1200 2000 20 0 0 20 40 0 0 340
173 M 07Jul2020 300 20 40 15 1000 0.1 1200 2000 0 300 0 300 40 600 0 640
173 C2 05Aug2020 50 20 40 15 1000 0.1 1200 2000 40 0 0 400 80 600 0 640
173 M 20Aug2020 1500 20 40 15 1000 0.1 1200 2000 0 400 110 510 80 1000 110 1150
173 C3 15Sep2020 30 20 40 15 1000 0.1 1200 2000 15 0 0 15 95 1000 110 1165
173 C2 21Oct2020 -500 20 40 15 1000 0.1 1200 2000 -40 0 0 -40 50 1000 110 1150
173 C3 15Nov2020 30 20 40 15 1000 0.1 1200 2000 15 0 0 15 65 1000 100 1165
 

 

 

6 REPLIES 6
SASJedi
Ammonite | Level 13

If you could explain the desired logic for the calculated fields, I might be able to make some suggestions. . 

Check out my Jedi SAS Tricks for SAS Users
kashun
Obsidian | Level 7
@SASJedi. Hope this will help.
1.The data depends on flags and limits to generate new fields as well as accumulated values.
2. Amount is displayed in Pay_C if flag is C1 - C3. Amount is displayed in Pay_1 if flag is M.
3. The data has the limit for C1 - C3.
Example, C1 - C3 are limits id 173 . i.e. if the row has Flag is C1 and Amount is 40, then id 173 will have Pay_C to be 20 since the maximum limit is 20 for C1.
If Amount was to be 15 then Pay_C would have been 15 Since it amount doesn't exceed the limit of C1 =20.
4. The data has the limit for PAY1 as well.
Example if PAY1 limit is 1000 and and amount is 300, then PAY1 will be 300. If the amount is 1200, then PAY1 will be 1000 with a remainder of 200 which will be used to calculate PAY2.
If cumulative sum of PAY1 exceeds the limit, the PAY1 will be zero. For instance row 2 from the data will have 300 under PAY1 since the flag is M. Since row 3 has amount 1300 and flag M, the remaining amount needed to meet the PAY1 limit will be 700 and hence PAY1 will be 700. The remaining 600 will be used to calculate PAY2 (600 * 0.1 =60. Note: use PAY2_DEC). Since PAY1's LIMIT is met, any row after that with flag M will have PAY1 to be zero, unless there is a negative (credit). Also Pay2 has a limit as well.
If there is a negative amount then the code should check all previous amount to see if there is an exact positive amount and negate its pay_c, pay1 and pay2.
Example, row 3 and 5 are just negative difference. If cumulative sum of total amount meets 2000 (max_pay) then any pay after should be zero unless a negative amount has been introduced to change the limit met.
The total_pay will be sum of pay_c , pay1 and pay2. Since calculations depend on accumulated
kashun
Obsidian | Level 7
Hi @SASJedi, Any luck on this please?
kashun
Obsidian | Level 7

I know we can create cumulative sum using hash objects. Is there store them and iterate them over per each line?

mkeintz
PROC Star

Since you have decided to keep cumulative sums is discrete variables (one for each set of flags), there is no advantage to using a hash object, versus retaining sums for each group.

 

But if you were to use the same variable for each of them, then you would have a better reason for hash.  In such a case the hash object would be keyed on the flag values.  

 

As a general concept using some of your data organization, here's an example of what I mean.  Below pay_grp replaces payc and pay1, and pay_cum replaces accum_payc and accum_pay1.  But there will be no ambiguity in pay_grp and pay_cum in the output records because they will be accompanied by a flag value.  This works because the hash object if keyed on a new variable flg_grp.

 

I know that I didn't program in code to honor the limits.  They idea here is to propose a revision of the output data layout that would be a more likely candidate for using a hash object:

 

data HAVE;
input @1 ID @5 Flag $2. @8 Date DATE9.  @18 Amount @24 C1 @27 C2 @30 C3 @33 PAY1_LIMIT @38 PAY2_DEC @42 PAY2_LIMIT @47 MAX_PAY ;
FORMAT DATE DATE9.;
datalines;
173 C1 01JAN2020 100   20 40 15 1000 0.1 1200 2000
173 M  03APR2020 300   20 40 15 1000 0.1 1200 2000
173 M  24APR2020 1300  20 40 15 1000 0.1 1200 2000
173 M  07JUN2020 500   20 40 15 1000 0.1 1200 2000
173 M  13JUN2020 -1300 20 40 15 1000 0.1 1200 2000
173 C1 28JUN2020 40    20 40 15 1000 0.1 1200 2000
173 M  07JUL2020 300   20 40 15 1000 0.1 1200 2000
173 C2 05AUG2020 50    20 40 15 1000 0.1 1200 2000
173 M  20AUG2020 1500  20 40 15 1000 0.1 1200 2000
173 C3 15SEP2020 30    20 40 15 1000 0.1 1200 2000
173 M  21OCT2020 -500  20 40 15 1000 0.1 1200 2000
173 C3 15NOV2020 30    20 40 15 1000 0.1 1200 2000
;
RUN;

data want;
  set have;

  length flg_group $1;
  flg_group=flag;  /* "C1", "C2", "C3" all become "C",  "M"==>"M"*/

  if _n_=1 then do;
    declare hash h ();
	  h.definekey('flg_group');
	  h.definedata('flg_group','flag','pay_grp','pay_cum');
	  h.definedone();
  end;

  if h.find()^=0 then pay_cum=0;  /*If not ready in hash h */

  select(flag);
    when ('c1') pay_grp=c1;
    when ('c2') pay_grp=c2;
    when ('c3') pay_grp=c3;
	when ('m')  pay_grp=pay1;
	otherwise;
  end;

  pay_cum=pay_cum+pay_grp;
  h.replace();
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
kashun
Obsidian | Level 7
Thanks very much @mkeintz for the input. This gave me a different idea on applying the flags. Was thinking if there was another way to look at the cumulative sums with the limits.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1967 views
  • 1 like
  • 3 in conversation