This might be a little complex to explain but would like to ask. I do have a dataset that uses conditions to generate extra fields using accumulated sum of those fields. I was hoping to use lags of the accumulated sum of the fields to create the result but not sure if there is a better approach.
HAVE:
data HAVE;
input @1 ID @5 Flag $2. @8 Date DATE9. @18 Amount @24 C1 @27 C2 @30 C3 @33 PAY1_LIMIT @38 PAY2_DEC @42 PAY2_LIMIT @47 MAX_PAY ;
FORMAT DATE DATE9.;
datalines;
173 C1 01JAN2020 100 20 40 15 1000 0.1 1200 2000
173 M 03APR2020 300 20 40 15 1000 0.1 1200 2000
173 M 24APR2020 1300 20 40 15 1000 0.1 1200 2000
173 M 07JUN2020 500 20 40 15 1000 0.1 1200 2000
173 M 13JUN2020 -1300 20 40 15 1000 0.1 1200 2000
173 C1 28JUN2020 40 20 40 15 1000 0.1 1200 2000
173 M 07JUL2020 300 20 40 15 1000 0.1 1200 2000
173 C2 05AUG2020 50 20 40 15 1000 0.1 1200 2000
173 M 20AUG2020 1500 20 40 15 1000 0.1 1200 2000
173 C3 15SEP2020 30 20 40 15 1000 0.1 1200 2000
173 M 21OCT2020 -500 20 40 15 1000 0.1 1200 2000
173 C3 15NOV2020 30 20 40 15 1000 0.1 1200 2000
;
RUN;
Want
ID | Flag | Date | Amount | C1 | C2 | C3 | PAY1_LIMIT | PAY2_DEC | PAY2_LIMIT | MAX_PAY | Pay_C | PAY1 | PAY2 | TOTAL_PAY | Accum_PAY_C | Accum_PAY1 | Accum_PAY2 | Accum_TOTAL_PAY |
173 | C1 | 01Jan2020 | 100 | 20 | 40 | 15 | 1000 | 0.1 | 1200 | 2000 | 20 | 0 | 0 | 20 | 20 | 0 | 0 | 20 |
173 | M | 03Apr2020 | 300 | 20 | 40 | 15 | 1000 | 0.1 | 1200 | 2000 | 0 | 300 | 0 | 300 | 20 | 300 | 0 | 320 |
173 | M | 24Apr2020 | 1300 | 20 | 40 | 15 | 1000 | 0.1 | 1200 | 2000 | 0 | 700 | 60 | 760 | 20 | 1000 | 60 | 1080 |
173 | M | 07Jun2020 | 500 | 20 | 40 | 15 | 1000 | 0.1 | 1200 | 2000 | 0 | 0 | 0 | 0 | 20 | 1000 | 60 | 1080 |
173 | M | 13Jun2020 | -1300 | 20 | 40 | 15 | 1000 | 0.1 | 1200 | 2000 | 0 | -700 | -60 | -760 | 20 | 300 | 0 | 320 |
173 | C1 | 28Jun2020 | 40 | 20 | 40 | 15 | 1000 | 0.1 | 1200 | 2000 | 20 | 0 | 0 | 20 | 40 | 0 | 0 | 340 |
173 | M | 07Jul2020 | 300 | 20 | 40 | 15 | 1000 | 0.1 | 1200 | 2000 | 0 | 300 | 0 | 300 | 40 | 600 | 0 | 640 |
173 | C2 | 05Aug2020 | 50 | 20 | 40 | 15 | 1000 | 0.1 | 1200 | 2000 | 40 | 0 | 0 | 400 | 80 | 600 | 0 | 640 |
173 | M | 20Aug2020 | 1500 | 20 | 40 | 15 | 1000 | 0.1 | 1200 | 2000 | 0 | 400 | 110 | 510 | 80 | 1000 | 110 | 1150 |
173 | C3 | 15Sep2020 | 30 | 20 | 40 | 15 | 1000 | 0.1 | 1200 | 2000 | 15 | 0 | 0 | 15 | 95 | 1000 | 110 | 1165 |
173 | C2 | 21Oct2020 | -500 | 20 | 40 | 15 | 1000 | 0.1 | 1200 | 2000 | -40 | 0 | 0 | -40 | 50 | 1000 | 110 | 1150 |
173 | C3 | 15Nov2020 | 30 | 20 | 40 | 15 | 1000 | 0.1 | 1200 | 2000 | 15 | 0 | 0 | 15 | 65 | 1000 | 100 | 1165 |
If you could explain the desired logic for the calculated fields, I might be able to make some suggestions. .
I know we can create cumulative sum using hash objects. Is there store them and iterate them over per each line?
Since you have decided to keep cumulative sums is discrete variables (one for each set of flags), there is no advantage to using a hash object, versus retaining sums for each group.
But if you were to use the same variable for each of them, then you would have a better reason for hash. In such a case the hash object would be keyed on the flag values.
As a general concept using some of your data organization, here's an example of what I mean. Below pay_grp replaces payc and pay1, and pay_cum replaces accum_payc and accum_pay1. But there will be no ambiguity in pay_grp and pay_cum in the output records because they will be accompanied by a flag value. This works because the hash object if keyed on a new variable flg_grp.
I know that I didn't program in code to honor the limits. They idea here is to propose a revision of the output data layout that would be a more likely candidate for using a hash object:
data HAVE;
input @1 ID @5 Flag $2. @8 Date DATE9. @18 Amount @24 C1 @27 C2 @30 C3 @33 PAY1_LIMIT @38 PAY2_DEC @42 PAY2_LIMIT @47 MAX_PAY ;
FORMAT DATE DATE9.;
datalines;
173 C1 01JAN2020 100 20 40 15 1000 0.1 1200 2000
173 M 03APR2020 300 20 40 15 1000 0.1 1200 2000
173 M 24APR2020 1300 20 40 15 1000 0.1 1200 2000
173 M 07JUN2020 500 20 40 15 1000 0.1 1200 2000
173 M 13JUN2020 -1300 20 40 15 1000 0.1 1200 2000
173 C1 28JUN2020 40 20 40 15 1000 0.1 1200 2000
173 M 07JUL2020 300 20 40 15 1000 0.1 1200 2000
173 C2 05AUG2020 50 20 40 15 1000 0.1 1200 2000
173 M 20AUG2020 1500 20 40 15 1000 0.1 1200 2000
173 C3 15SEP2020 30 20 40 15 1000 0.1 1200 2000
173 M 21OCT2020 -500 20 40 15 1000 0.1 1200 2000
173 C3 15NOV2020 30 20 40 15 1000 0.1 1200 2000
;
RUN;
data want;
set have;
length flg_group $1;
flg_group=flag; /* "C1", "C2", "C3" all become "C", "M"==>"M"*/
if _n_=1 then do;
declare hash h ();
h.definekey('flg_group');
h.definedata('flg_group','flag','pay_grp','pay_cum');
h.definedone();
end;
if h.find()^=0 then pay_cum=0; /*If not ready in hash h */
select(flag);
when ('c1') pay_grp=c1;
when ('c2') pay_grp=c2;
when ('c3') pay_grp=c3;
when ('m') pay_grp=pay1;
otherwise;
end;
pay_cum=pay_cum+pay_grp;
h.replace();
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.