Hello, I am trying to prepare an extrapolated modeling dataset, which is quite challenging because there are overlapping periods. I have written a query below, and I am close, but could you please help me reach the attached target table?
DATA HAVE;
LENGTH ID 8 APP_DATE 8 RATING $ 10 DELINQUENCY 8 BALANCE 8 ;
INFILE DATALINES MISSOVER DLM=",";
INPUT ID APP_DATE RATING DELINQUENCY BALANCE;
FORMAT APP_DATE DATE9.;
DATALINES;
1,20027,BB+,0,10000
1,20119,BB,30,9000
1,20635,BB-,60,8000
1,20939,Default,91,5000
2,19023,BB-,30,7000
2,19509,BB,0,6000
2,20027,BB-,30,4000
2,20147,Default,91,3000
RUN;
DATA INTERMEDIATE;
SET HAVE;
FORMAT DATE_OF_DEFAULT DATE9.;
IF RATING = 'Default' THEN DATE_OF_DEFAULT=INTNX("month",APP_DATE,0,"E");
RUN;
PROC SORT DATA=INTERMEDIATE;
BY ID DESCENDING DATE_OF_DEFAULT APP_DATE;
RUN;
DATA WANTED;
SET INTERMEDIATE;
BY ID DESCENDING DATE_OF_DEFAULT APP_DATE;
RETAIN DATE_OF_DEFAULT_;
FORMAT DATE_OF_DEFAULT_ DEFAULT_W12MNTH DEFAULT_W24MNTH COHORT_DATE DATE9.;
IF NOT MISSING(DATE_OF_DEFAULT) THEN
DATE_OF_DEFAULT_=DATE_OF_DEFAULT;
ELSE DATE_OF_DEFAULT=DATE_OF_DEFAULT_;
MONTH_DIFFERENCE=intck("MONTH", APP_DATE, DATE_OF_DEFAULT);
DEFAULT_W12MNTH= INTNX("MONTH",DATE_OF_DEFAULT ,-12,'S');
DEFAULT_W24MNTH= INTNX("MONTH",DATE_OF_DEFAULT ,-24,'S');
IF MONTH_DIFFERENCE>0 THEN
DO;
DO I=0 TO MONTH_DIFFERENCE;
COHORT_DATE= INTNX("MONTH",APP_DATE ,I,'E');
IF COHORT_DATE > DEFAULT_W12MNTH THEN
DEFAULT_FLG12MNTH=1;
ELSE DEFAULT_FLG12MNTH=0;
IF COHORT_DATE > DEFAULT_W24MNTH THEN
DEFAULT_FLG24=1;
ELSE DEFAULT_FLG24=0;
OUTPUT;
END;
END;
ELSE
DO;
COHORT_DATE=DATE_OF_DEFAULT;
OUTPUT;
END;
DROP DATE_OF_DEFAULT_ I MONTH_DIFFERENCE DEFAULT_W12MNTH DEFAULT_W24MNTH;
RUN;
PROC SORT DATA=WANTED ;
BY ID APP_DATE DATE_OF_DEFAULT ;
RUN;
Please give us at least a hint of what you need help with. Which variable(s)? for which observations. And the rules needed.
Providing a picture is extremely hard to follow what is wrong without some description of where to look.
You're right, it's a bit complicated. Let me try to explain it in more detail.
The overwhelming superiority should be towards the dates that match (or overlap) with the greater APP_DATE rating in descending order, with the aim of always keeping the most up-to-date information in the data. As you can see in my result view, the larger APP_DATE values are overpowering the records that overlap with them (for example, APP_DATE dated 31JAN2015 should not be displayed for the period from 31MAR2017, to 30JUN2016(Cause overlapping by APP_DATE 30JUN2016). Instead, APP_DATE from 30JUN2016, should be displayed)
Moreover, starting from the default moment, there should be 12 and 24-month Default_Flag variables for past periods
I think if you run my code and compare my desired output, you will understand what I am trying to do(or what is the issue) 👍
It seems like a lot of code, and I’m not exactly sure what it does, or why the table is missing an observation with a ‘Default’ .
Can you maybe take a step back and explain what your table should have (vs not have)? Start from scratch.
Maybe it should look like HAVE, except:
Add in additional requirements, if any.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.