Hello, I am trying to prepare an extrapolated modeling dataset, which is quite challenging because there are overlapping periods. I have written a query below, and I am close, but could you please help me reach the attached target table?
DATA HAVE;
LENGTH ID 8 APP_DATE 8 RATING $ 10 DELINQUENCY 8 BALANCE 8 ;
INFILE DATALINES MISSOVER DLM=",";
INPUT ID APP_DATE RATING DELINQUENCY BALANCE;
FORMAT APP_DATE DATE9.;
DATALINES;
1,20027,BB+,0,10000
1,20119,BB,30,9000
1,20635,BB-,60,8000
1,20939,Default,91,5000
2,19023,BB-,30,7000
2,19509,BB,0,6000
2,20027,BB-,30,4000
2,20147,Default,91,3000
RUN;
DATA INTERMEDIATE;
SET HAVE;
FORMAT DATE_OF_DEFAULT DATE9.;
IF RATING = 'Default' THEN DATE_OF_DEFAULT=INTNX("month",APP_DATE,0,"E");
RUN;
PROC SORT DATA=INTERMEDIATE;
BY ID DESCENDING DATE_OF_DEFAULT APP_DATE;
RUN;
DATA WANTED;
SET INTERMEDIATE;
BY ID DESCENDING DATE_OF_DEFAULT APP_DATE;
RETAIN DATE_OF_DEFAULT_;
FORMAT DATE_OF_DEFAULT_ DEFAULT_W12MNTH DEFAULT_W24MNTH COHORT_DATE DATE9.;
IF NOT MISSING(DATE_OF_DEFAULT) THEN
DATE_OF_DEFAULT_=DATE_OF_DEFAULT;
ELSE DATE_OF_DEFAULT=DATE_OF_DEFAULT_;
MONTH_DIFFERENCE=intck("MONTH", APP_DATE, DATE_OF_DEFAULT);
DEFAULT_W12MNTH= INTNX("MONTH",DATE_OF_DEFAULT ,-12,'S');
DEFAULT_W24MNTH= INTNX("MONTH",DATE_OF_DEFAULT ,-24,'S');
IF MONTH_DIFFERENCE>0 THEN
DO;
DO I=0 TO MONTH_DIFFERENCE;
COHORT_DATE= INTNX("MONTH",APP_DATE ,I,'E');
IF COHORT_DATE > DEFAULT_W12MNTH THEN
DEFAULT_FLG12MNTH=1;
ELSE DEFAULT_FLG12MNTH=0;
IF COHORT_DATE > DEFAULT_W24MNTH THEN
DEFAULT_FLG24=1;
ELSE DEFAULT_FLG24=0;
OUTPUT;
END;
END;
ELSE
DO;
COHORT_DATE=DATE_OF_DEFAULT;
OUTPUT;
END;
DROP DATE_OF_DEFAULT_ I MONTH_DIFFERENCE DEFAULT_W12MNTH DEFAULT_W24MNTH;
RUN;
PROC SORT DATA=WANTED ;
BY ID APP_DATE DATE_OF_DEFAULT ;
RUN;
Please give us at least a hint of what you need help with. Which variable(s)? for which observations. And the rules needed.
Providing a picture is extremely hard to follow what is wrong without some description of where to look.
You're right, it's a bit complicated. Let me try to explain it in more detail.
The overwhelming superiority should be towards the dates that match (or overlap) with the greater APP_DATE rating in descending order, with the aim of always keeping the most up-to-date information in the data. As you can see in my result view, the larger APP_DATE values are overpowering the records that overlap with them (for example, APP_DATE dated 31JAN2015 should not be displayed for the period from 31MAR2017, to 30JUN2016(Cause overlapping by APP_DATE 30JUN2016). Instead, APP_DATE from 30JUN2016, should be displayed)
Moreover, starting from the default moment, there should be 12 and 24-month Default_Flag variables for past periods
I think if you run my code and compare my desired output, you will understand what I am trying to do(or what is the issue) 👍
It seems like a lot of code, and I’m not exactly sure what it does, or why the table is missing an observation with a ‘Default’ .
Can you maybe take a step back and explain what your table should have (vs not have)? Start from scratch.
Maybe it should look like HAVE, except:
Add in additional requirements, if any.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.