Hello everyone,
This problem bothered me a lot, could you have a look and advice on how I can solve this problem? thanks!
I have a dataset which has the structure as follows:
ID is the customer ID that has default;
open_month means = -60 means the bank account is opened before 60 months, the value of this column is from -60 to 0;
MOB is the month on book, the value is from 0-60;
I'm trying to do a vintage analysis, so I need to calculate the accumulated number of default customers by open month and MOB, the structure of the output dataset should be open_month, MOB, count of accumulated distinct customer ID.
For example, for the combination of open_month = -60 and MOB = 10 should count all distinct ID where open_month = -60 and MOB <=10, hope I describe it clearly.
ID | open_month | MOB |
5069216 | -60 | 5 |
5069216 | -60 | 6 |
5023640 | -60 | 7 |
5023644 | -60 | 7 |
5069216 | -60 | 7 |
5069216 | -60 | 9 |
5116761 | -60 | 9 |
5069216 | -60 | 10 |
5059854 | -60 | 11 |
5069216 | -60 | 11 |
5116761 | -60 | 11 |
5002283 | -60 | 12 |
5059854 | -60 | 12 |
5069216 | -60 | 12 |
5116761 | -60 | 12 |
5059854 | -60 | 13 |
5085886 | -60 | 13 |
5116761 | -60 | 13 |
5085886 | -60 | 14 |
5116761 | -60 | 14 |
5085886 | -60 | 15 |
5085886 | -60 | 16 |
5085886 | -60 | 17 |
5085886 | -60 | 18 |
5085886 | -60 | 19 |
5085886 | -60 | 20 |
5065314 | -60 | 21 |
5085886 | -60 | 21 |
5065314 | -60 | 22 |
5085886 | -60 | 22 |
5065314 | -60 | 23 |
5085886 | -60 | 23 |
5065314 | -60 | 24 |
5085886 | -60 | 24 |
5060452 | -60 | 25 |
5065314 | -60 | 25 |
5085886 | -60 | 25 |
Your kindly help is highly appreciated!
Best Regards
Judy
A quick and dirty version would be something like:
data expand;
set have;
do mobx = 0 to 60;
if mob <= mobx then output;
end;
run;
proc sort data=expand out=unique_combos (keep=id open_month mobx) nodupkey;
by open_month mobx id;
data want;
set unique_combos;
by open_month mobx;
retain count;
if first.mobx then count = 0;
count + 1;
if last.mobx then output;
run;
Basically this outputs all qualifying records by id/open_month/mobx, then restricts to at most one per ID, then counts the number of qualifying IDs. There are of course many ways to do it but this is a fairly simple and understandable method.
A quick and dirty version would be something like:
data expand;
set have;
do mobx = 0 to 60;
if mob <= mobx then output;
end;
run;
proc sort data=expand out=unique_combos (keep=id open_month mobx) nodupkey;
by open_month mobx id;
data want;
set unique_combos;
by open_month mobx;
retain count;
if first.mobx then count = 0;
count + 1;
if last.mobx then output;
run;
Basically this outputs all qualifying records by id/open_month/mobx, then restricts to at most one per ID, then counts the number of qualifying IDs. There are of course many ways to do it but this is a fairly simple and understandable method.
Thank you so much! You are brilliant!
I just add one step to remove records which mobx > -(open_month).
Thank you again! You made my day!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.