Greetings,
I have a complex financial dataset that has three, or even more, robust examples. These examples are shown below in the Have and Want datasets. Please note that for the last example, ID 32145 which shifts to 32145DF once in default, the amount owed is reduced from $300 to $200 because the case was forwarded to default, where it reopens with a) a new ID that ends with DF and b) reduced the principal to the previous principal minus any collections to date. Thus, there should be data for only 32145CN and not both 32145 and 32145DF. The rationale is that it is the same ID/Customer and should not have the ID twice to double count the principal amounts.
Have:
ID | Date | Status | Collection date | Amount owed | ever collected |
1234 | 29-Feb-16 | Open | . | 82.44 | 0 |
1234 | 31-Mar-16 | Open | . | 82.44 | 0 |
1234 | 30-Apr-16 | Open | . | 82.44 | 0 |
1234 | 31-May-16 | Open | . | 82.44 | 0 |
1234 | 30-Jun-16 | Open | . | 82.44 | 0 |
1234 | 31-Jul-16 | Closed | 20-Jul-16 | 82.44 | 82.44 |
1234 | 31-May-18 | reopened | 20-Jul-16 | 82.44 | 82.44 |
564789 | 29-Feb-16 | Open | 02-Feb-16 | 100 | 10 |
564789 | 31-Mar-16 | Open | 02-Feb-16 | 100 | 10 |
564789 | 30-Apr-16 | Open | 04-Apr-16 | 100 | 30 |
564789 | 31-May-16 | Open | 15-May-16 | 100 | 40 |
564789 | 30-Jun-16 | Open | 12-Jun-16 | 100 | 70 |
564789 | 31-Jul-16 | Open | 12-Jun-16 | 100 | 70 |
564789 | 31-May-18 | Open | 22-May-18 | 100 | 100 |
32145 | 29-Feb-16 | Open | . | 300 | 0 |
32145 | 31-Mar-16 | Open | . | 300 | 55 |
32145 | 30-Apr-16 | Open | 15-Apr-16 | 300 | 100 |
32145 | 31-May-16 | Open | 15-Apr-16 | 300 | 100 |
32145DF | 31-May-16 | default | 12-May-16 | 200 | 120 |
32145DF | 30-Jun-16 | default | 06-Jun-16 | 200 | 130 |
Want:
ID | Date | Status | Collection date | Amount owed | ever collected | this month amount |
1234 | 29-Feb-16 | Open | . | 82.44 | 0 | 0 |
1234 | 31-Mar-16 | Open | . | 82.44 | 0 | 0 |
1234 | 30-Apr-16 | Open | . | 82.44 | 0 | 0 |
1234 | 31-May-16 | Open | . | 82.44 | 0 | 0 |
1234 | 30-Jun-16 | Open | . | 82.44 | 0 | 0 |
1234 | 31-Jul-16 | Closed | 20-Jul-16 | 82.44 | 82.44 | 82.44 |
1234 | 31-May-18 | reopened | 20-Jul-16 | 82.44 | 82.44 | 0 |
564789 | 29-Feb-16 | Open | 02-Feb-16 | 100 | 10 | 10 |
564789 | 31-Mar-16 | Open | 02-Feb-16 | 100 | 10 | 0 |
564789 | 30-Apr-16 | Open | 04-Apr-16 | 100 | 30 | 20 |
564789 | 31-May-16 | Open | 15-May-16 | 100 | 40 | 10 |
564789 | 30-Jun-16 | Open | 12-Jun-16 | 100 | 70 | 30 |
564789 | 31-Jul-16 | Open | 12-Jun-16 | 100 | 70 | 0 |
564789 | 31-May-18 | Open | 22-May-18 | 100 | 100 | 30 |
32145CN | 31-May-16 | default | 12-May-16 | 300 | 120 | 20 |
32145CN | 30-Jun-16 | default | 06-Jun-16 | 300 | 130 | 10 |
Thank you
Try this:
data have;
input classification ID :$10. Date :date7. Status :$20. Collectiondate :date7. Amountowed evercollected;
format Date Collectiondate date7.;
cards;
1 1234 29-Feb-16 Open . 82.44 0
1 1234 31-Mar-16 Open . 82.44 0
1 1234 30-Apr-16 Open . 82.44 0
1 1234 31-May-16 Open . 82.44 0
1 1234 30-Jun-16 Open . 82.44 0
1 1234 31-Jul-16 Closed 20-Jul-16 82.44 82.44
1 1234 31-May-18 reopened 20-Jul-16 82.44 82.44
2 564789 29-Feb-16 Open 02-Feb-16 100 10
2 564789 31-Mar-16 Open 02-Feb-16 100 10
2 564789 30-Apr-16 Open 04-Apr-16 100 30
2 564789 31-May-16 Open 15-May-16 100 40
2 564789 30-Jun-16 Open 12-Jun-16 100 70
2 564789 31-Jul-16 Open 12-Jun-16 100 70
2 564789 31-May-18 Open 22-May-18 100 100
3 32145 29-Feb-16 Open . 300 0
3 32145 31-Mar-16 Open . 300 55
3 32145 30-Apr-16 Open 15-Apr-16 300 100
3 32145 31-May-16 Open 15-Apr-16 300 100
3 32145DF 31-May-16 default 12-May-16 200 120
3 32145DF 30-Jun-16 default 06-Jun-16 200 130
4 91254DF 31-Mar-16 default 02-Mar-16 520 0
4 91254DF 30-Apr-16 default 06-Apr-16 520 50
4 91254DF 31-May-16 default 17-May-16 520 120
5 4105 31-May-16 Open . 1250 0
;
run;
data _have;
set have;
_id=compress(id,'DF');
run;
data want;
call missing(_f1,_f2);
do until(last._id);
set _have;
by classification _ID;
if anyalpha(id)>0 then _f1=1;
if anyalpha(id)=0 then _f2=1;
end;
call missing(_k1);
do until(last._id);
set _have;
by classification _ID;
thismonthamount=dif(evercollected);
if first._id then thismonthamount=evercollected;
if _f1 and _f2 then do;
if status ne 'default' then _k1=Amountowed;
else if status='default' then do; Amountowed=_k1;output;end;
end;
else output;
end;
drop _:;
run;
Not sure I understood you req well, but anyway:
data have;
input ID $ Date :date7. Status :$10. Collectiondate :date7. Amountowed evercollected;
format date Collectiondate date9.;
cards;
1234 29-Feb-16 Open . 82.44 0
1234 31-Mar-16 Open . 82.44 0
1234 30-Apr-16 Open . 82.44 0
1234 31-May-16 Open . 82.44 0
1234 30-Jun-16 Open . 82.44 0
1234 31-Jul-16 Closed 20-Jul-16 82.44 82.44
1234 31-May-18 reopened 20-Jul-16 82.44 82.44
564789 29-Feb-16 Open 2-Feb-16 100 10
564789 31-Mar-16 Open 2-Feb-16 100 10
564789 30-Apr-16 Open 4-Apr-16 100 30
564789 31-May-16 Open 15-May-16 100 40
564789 30-Jun-16 Open 12-Jun-16 100 70
564789 31-Jul-16 Open 12-Jun-16 100 70
564789 31-May-18 Open 22-May-18 100 100
32145 29-Feb-16 Open . 300 0
32145 31-Mar-16 Open . 300 55
32145 30-Apr-16 Open 15-Apr-16 300 100
32145 31-May-16 Open 15-Apr-16 300 100
32145DF 31-May-16 default 12-May-16 200 120
32145DF 30-Jun-16 default 6-Jun-16 200 130
;
data _have;
set have;
_id=compress(id,'DF');
run;
data want;
call missing(_f);
do until(last._id);
set _have;
by _id notsorted;
if status='default' then _f=1;
end;
call missing(k1);
do until(last._id);
set _have;
by _id notsorted;
thismonthamount=dif(evercollected);
if first._id then thismonthamount=evercollected;
if _f then do;
if status='Open' then k1=Amountowed;
else if status='default' then do; Amountowed=k1;output;end;
end;
else output;
end;
drop _: k1;
run;
Thanks novinosrin. I tried to apply your code to the data and it does not give the outcome I am looking for.
Simply, I want to have the following outputs for each of the three examples:
Example 1234: “this month amount” should show only the amount collected this month, which is $0. The 82.44 was collected back in July/2016, but the ever collect show the cumulative of all collections.
Example 564789: Same is true for this one too. Collections this month should show what was collected each month and not the cumulative.
Example 32145 and 32145DF: amount owed to be the same as the amounts before sending the ID to default (DF). So for ID 32145DF, I need it to show the original amount owed ($300) instead of the current $200. Same with the collection. It should show the total amounts collected each month including the amounts collected at the non-DF periods. In addition, I don’t want to have the ID 32145 in the data at all, because it will double-count the amount.
Hope this clarifies
Hi @altijani I think I have understood much better this time. I am willing to work interactively and make any changes required regardless.
Please test and let me know for further edit/changes. Thanks
data have;
input ID $ Date :date7. Status :$10. Collectiondate :date7. Amountowed evercollected;
format date Collectiondate date9.;
cards;
1234 29-Feb-16 Open . 82.44 0
1234 31-Mar-16 Open . 82.44 0
1234 30-Apr-16 Open . 82.44 0
1234 31-May-16 Open . 82.44 0
1234 30-Jun-16 Open . 82.44 0
1234 31-Jul-16 Closed 20-Jul-16 82.44 82.44
1234 31-May-18 reopened 20-Jul-16 82.44 82.44
564789 29-Feb-16 Open 2-Feb-16 100 10
564789 31-Mar-16 Open 2-Feb-16 100 10
564789 30-Apr-16 Open 4-Apr-16 100 30
564789 31-May-16 Open 15-May-16 100 40
564789 30-Jun-16 Open 12-Jun-16 100 70
564789 31-Jul-16 Open 12-Jun-16 100 70
564789 31-May-18 Open 22-May-18 100 100
32145 29-Feb-16 Open . 300 0
32145 31-Mar-16 Open . 300 55
32145 30-Apr-16 Open 15-Apr-16 300 100
32145 31-May-16 Open 15-Apr-16 300 100
32145DF 31-May-16 default 12-May-16 200 120
32145DF 30-Jun-16 default 6-Jun-16 200 130
;
data _have;
set have;
_id=compress(id,'DF');
run;
data want;
call missing(_f);
do until(last._id);
set _have;
by _id notsorted;
if status='default' then _f=1;
end;
call missing(_k1);
do until(last._id);
set _have;
by _id notsorted;
_temp=dif(evercollected);
if missing(Collectiondate) then thismonthamount=0;
else if month(Collectiondate)=month(date) and year(date)=year(Collectiondate) then thismonthamount=_temp;
else thismonthamount=0;
if first._id then thismonthamount=evercollected;
if _f then do;
if status='Open' then _k1=Amountowed;
else if status='default' then do; Amountowed=_k1;output;end;
end;
else output;
end;
drop _: ;
run;
Thanks again @novinosrin
Still not getting the results I want. Let us use a different example. I have the following dataset:
id | owed | paid |
1234 | $100 | $50 |
1234 | $100 | $100 |
65478 | $88 | $28 |
65478DF | $60 | $- |
And I want the following dataset:
id | owed | paid |
1234 | $100 | $50 |
1234 | $100 | $100 |
65478 | $88 | $28 |
65478DF | $88 | $28 |
It is important to have the amounts owed and paid for the last ID with the DF equals the amounts owed and paid from the previous ID without the DF.
Hope this clarifies
I think we are missing something that either your sample is perhaps not clear representative of your real?
data have;
input id $ owed : comma10. paid : comma10.;
cards;
1234 $100 $50
1234 $100 $100
65478 $88 $28
65478DF $60 .
;
data _have;
set have;
_id=compress(id,'DF');
run;
data want;
set _have;
by _id notsorted;
retain _owed _paid;
if first._id then call missing(_owed,_paid);
if index(id,'DF')=0 then do;
_owed=owed;
_paid=paid;
end;
else if index(id,'DF')>0 then do;
owed=_owed;
paid=_paid;
end;
drop _:;
run;
thanks for the code @novinosrin
However, I still have a problem with the actual data, which is much more obs than this mock data. The problem is that there is a significant number of observations that have both owed and paid equaling missing.
Any idea of what might be the reason for that?
Thanks,
Altijani
@altijani Thank you for the message. I am afraid, we can only help each other if you could make up a great mock p data that represents your real. Change the names and values with an alternative equivalent so that it is easy for us to test. I would request you to spend some time in getting the same right that hopefully is very "alike" your real.
Sure @novinosrin. Let us try to give it another try with all possible scenarios.
Here is what I have:
Have | ||||||
calssification | ID | Date | Status | Collection date | Amount owed | ever collected |
1 | 1234 | 29-Feb-16 | Open | . | 82.44 | 0 |
1 | 1234 | 31-Mar-16 | Open | . | 82.44 | 0 |
1 | 1234 | 30-Apr-16 | Open | . | 82.44 | 0 |
1 | 1234 | 31-May-16 | Open | . | 82.44 | 0 |
1 | 1234 | 30-Jun-16 | Open | . | 82.44 | 0 |
1 | 1234 | 31-Jul-16 | Closed | 20-Jul-16 | 82.44 | 82.44 |
1 | 1234 | 31-May-18 | reopened | 20-Jul-16 | 82.44 | 82.44 |
2 | 564789 | 29-Feb-16 | Open | 02-Feb-16 | 100 | 10 |
2 | 564789 | 31-Mar-16 | Open | 02-Feb-16 | 100 | 10 |
2 | 564789 | 30-Apr-16 | Open | 04-Apr-16 | 100 | 30 |
2 | 564789 | 31-May-16 | Open | 15-May-16 | 100 | 40 |
2 | 564789 | 30-Jun-16 | Open | 12-Jun-16 | 100 | 70 |
2 | 564789 | 31-Jul-16 | Open | 12-Jun-16 | 100 | 70 |
2 | 564789 | 31-May-18 | Open | 22-May-18 | 100 | 100 |
3 | 32145 | 29-Feb-16 | Open | . | 300 | 0 |
3 | 32145 | 31-Mar-16 | Open | . | 300 | 55 |
3 | 32145 | 30-Apr-16 | Open | 15-Apr-16 | 300 | 100 |
3 | 32145 | 31-May-16 | Open | 15-Apr-16 | 300 | 100 |
3 | 32145DF | 31-May-16 | default | 12-May-16 | 200 | 120 |
3 | 32145DF | 30-Jun-16 | default | 06-Jun-16 | 200 | 130 |
4 | 91254DF | 31-Mar-16 | default | 02-Mar-16 | 520 | 0 |
4 | 91254DF | 30-Apr-16 | default | 06-Apr-16 | 520 | 50 |
4 | 91254DF | 31-May-16 | default | 17-May-16 | 520 | 120 |
5 | 4105 | 31-May-16 | Open | . | 1250 | 0 |
Please note that
Classification 4 starts with a DF ID, and does not have a non-DF ID
Classification 5: is a new ID with no collection
Her is what I want:
calssification | ID | Date | Status | Collection date | Amount owed | ever collected | this month amount |
1 | 1234 | 29-Feb-16 | Open | . | 82.44 | 0 | 0 |
1 | 1234 | 31-Mar-16 | Open | . | 82.44 | 0 | 0 |
1 | 1234 | 30-Apr-16 | Open | . | 82.44 | 0 | 0 |
1 | 1234 | 31-May-16 | Open | . | 82.44 | 0 | 0 |
1 | 1234 | 30-Jun-16 | Open | . | 82.44 | 0 | 0 |
1 | 1234 | 31-Jul-16 | Closed | 20-Jul-16 | 82.44 | 82.44 | 82.44 |
1 | 1234 | 31-May-18 | reopened | 20-Jul-16 | 82.44 | 82.44 | 0 |
2 | 564789 | 29-Feb-16 | Open | 02-Feb-16 | 100 | 10 | 10 |
2 | 564789 | 31-Mar-16 | Open | 02-Feb-16 | 100 | 10 | 0 |
2 | 564789 | 30-Apr-16 | Open | 04-Apr-16 | 100 | 30 | 20 |
2 | 564789 | 31-May-16 | Open | 15-May-16 | 100 | 40 | 10 |
2 | 564789 | 30-Jun-16 | Open | 12-Jun-16 | 100 | 70 | 30 |
2 | 564789 | 31-Jul-16 | Open | 12-Jun-16 | 100 | 70 | 0 |
2 | 564789 | 31-May-18 | Open | 22-May-18 | 100 | 100 | 30 |
3 | 32145DF | 31-May-16 | default | 12-May-16 | 300 | 120 | 20 |
3 | 32145DF | 30-Jun-16 | default | 06-Jun-16 | 300 | 130 | 10 |
4 | 91254DF | 31-Mar-16 | default | . | 520 | 0 | 0 |
4 | 91254DF | 30-Apr-16 | default | 06-Apr-16 | 520 | 50 | 50 |
4 | 91254DF | 31-May-16 | default | 17-May-16 | 520 | 120 | 70 |
5 | 4105 | 31-May-16 | Open | 20-May-16 | 1250 | 0 | 0 |
Hope this clarifies
Try this:
data have;
input classification ID :$10. Date :date7. Status :$20. Collectiondate :date7. Amountowed evercollected;
format Date Collectiondate date7.;
cards;
1 1234 29-Feb-16 Open . 82.44 0
1 1234 31-Mar-16 Open . 82.44 0
1 1234 30-Apr-16 Open . 82.44 0
1 1234 31-May-16 Open . 82.44 0
1 1234 30-Jun-16 Open . 82.44 0
1 1234 31-Jul-16 Closed 20-Jul-16 82.44 82.44
1 1234 31-May-18 reopened 20-Jul-16 82.44 82.44
2 564789 29-Feb-16 Open 02-Feb-16 100 10
2 564789 31-Mar-16 Open 02-Feb-16 100 10
2 564789 30-Apr-16 Open 04-Apr-16 100 30
2 564789 31-May-16 Open 15-May-16 100 40
2 564789 30-Jun-16 Open 12-Jun-16 100 70
2 564789 31-Jul-16 Open 12-Jun-16 100 70
2 564789 31-May-18 Open 22-May-18 100 100
3 32145 29-Feb-16 Open . 300 0
3 32145 31-Mar-16 Open . 300 55
3 32145 30-Apr-16 Open 15-Apr-16 300 100
3 32145 31-May-16 Open 15-Apr-16 300 100
3 32145DF 31-May-16 default 12-May-16 200 120
3 32145DF 30-Jun-16 default 06-Jun-16 200 130
4 91254DF 31-Mar-16 default 02-Mar-16 520 0
4 91254DF 30-Apr-16 default 06-Apr-16 520 50
4 91254DF 31-May-16 default 17-May-16 520 120
5 4105 31-May-16 Open . 1250 0
;
run;
data _have;
set have;
_id=compress(id,'DF');
run;
data want;
call missing(_f1,_f2);
do until(last._id);
set _have;
by classification _ID;
if anyalpha(id)>0 then _f1=1;
if anyalpha(id)=0 then _f2=1;
end;
call missing(_k1);
do until(last._id);
set _have;
by classification _ID;
thismonthamount=dif(evercollected);
if first._id then thismonthamount=evercollected;
if _f1 and _f2 then do;
if status ne 'default' then _k1=Amountowed;
else if status='default' then do; Amountowed=_k1;output;end;
end;
else output;
end;
drop _:;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.