Hello everyone,
This problem bothered me a lot, could you have a look and advice on how I can solve this problem? thanks!
I have a dataset which has the structure as follows:
ID is the customer ID that has default;
open_month means = -60 means the bank account is opened before 60 months, the value of this column is from -60 to 0;
MOB is the month on book, the value is from 0-60;
I'm trying to do a vintage analysis, so I need to calculate the accumulated number of default customers by open month and MOB, the structure of the output dataset should be open_month, MOB, count of accumulated distinct customer ID.
For example, for the combination of open_month = -60 and MOB = 10 should count all distinct ID where open_month = -60 and MOB <=10, hope I describe it clearly.
ID | open_month | MOB |
5069216 | -60 | 5 |
5069216 | -60 | 6 |
5023640 | -60 | 7 |
5023644 | -60 | 7 |
5069216 | -60 | 7 |
5069216 | -60 | 9 |
5116761 | -60 | 9 |
5069216 | -60 | 10 |
5059854 | -60 | 11 |
5069216 | -60 | 11 |
5116761 | -60 | 11 |
5002283 | -60 | 12 |
5059854 | -60 | 12 |
5069216 | -60 | 12 |
5116761 | -60 | 12 |
5059854 | -60 | 13 |
5085886 | -60 | 13 |
5116761 | -60 | 13 |
5085886 | -60 | 14 |
5116761 | -60 | 14 |
5085886 | -60 | 15 |
5085886 | -60 | 16 |
5085886 | -60 | 17 |
5085886 | -60 | 18 |
5085886 | -60 | 19 |
5085886 | -60 | 20 |
5065314 | -60 | 21 |
5085886 | -60 | 21 |
5065314 | -60 | 22 |
5085886 | -60 | 22 |
5065314 | -60 | 23 |
5085886 | -60 | 23 |
5065314 | -60 | 24 |
5085886 | -60 | 24 |
5060452 | -60 | 25 |
5065314 | -60 | 25 |
5085886 | -60 | 25 |
Your kindly help is highly appreciated!
Best Regards
Judy
A quick and dirty version would be something like:
data expand;
set have;
do mobx = 0 to 60;
if mob <= mobx then output;
end;
run;
proc sort data=expand out=unique_combos (keep=id open_month mobx) nodupkey;
by open_month mobx id;
data want;
set unique_combos;
by open_month mobx;
retain count;
if first.mobx then count = 0;
count + 1;
if last.mobx then output;
run;
Basically this outputs all qualifying records by id/open_month/mobx, then restricts to at most one per ID, then counts the number of qualifying IDs. There are of course many ways to do it but this is a fairly simple and understandable method.
A quick and dirty version would be something like:
data expand;
set have;
do mobx = 0 to 60;
if mob <= mobx then output;
end;
run;
proc sort data=expand out=unique_combos (keep=id open_month mobx) nodupkey;
by open_month mobx id;
data want;
set unique_combos;
by open_month mobx;
retain count;
if first.mobx then count = 0;
count + 1;
if last.mobx then output;
run;
Basically this outputs all qualifying records by id/open_month/mobx, then restricts to at most one per ID, then counts the number of qualifying IDs. There are of course many ways to do it but this is a fairly simple and understandable method.
Thank you so much! You are brilliant!
I just add one step to remove records which mobx > -(open_month).
Thank you again! You made my day!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.