Hello, I am trying to prepare an extrapolated modeling dataset, which is quite challenging because there are overlapping periods. I have written a query below, and I am close, but could you please help me reach the attached target table?
DATA HAVE; LENGTH ID 8 APP_DATE 8 RATING $ 10 DELINQUENCY 8 BALANCE 8 ; INFILE DATALINES MISSOVER DLM=","; INPUT ID APP_DATE RATING DELINQUENCY BALANCE; FORMAT APP_DATE DATE9.; DATALINES; 1,20027,BB+,0,10000 1,20119,BB,30,9000 1,20635,BB-,60,8000 1,20939,Default,91,5000 2,19023,BB-,30,7000 2,19509,BB,0,6000 2,20027,BB-,30,4000 2,20147,Default,91,3000 RUN; DATA INTERMEDIATE; SET HAVE; FORMAT DATE_OF_DEFAULT DATE9.; IF RATING = 'Default' THEN DATE_OF_DEFAULT=INTNX("month",APP_DATE,0,"E"); RUN; PROC SORT DATA=INTERMEDIATE; BY ID DESCENDING DATE_OF_DEFAULT APP_DATE; RUN; DATA WANTED; SET INTERMEDIATE; BY ID DESCENDING DATE_OF_DEFAULT APP_DATE; RETAIN DATE_OF_DEFAULT_; FORMAT DATE_OF_DEFAULT_ DEFAULT_W12MNTH DEFAULT_W24MNTH COHORT_DATE DATE9.; IF NOT MISSING(DATE_OF_DEFAULT) THEN DATE_OF_DEFAULT_=DATE_OF_DEFAULT; ELSE DATE_OF_DEFAULT=DATE_OF_DEFAULT_; MONTH_DIFFERENCE=intck("MONTH", APP_DATE, DATE_OF_DEFAULT); DEFAULT_W12MNTH= INTNX("MONTH",DATE_OF_DEFAULT ,-12,'S'); DEFAULT_W24MNTH= INTNX("MONTH",DATE_OF_DEFAULT ,-24,'S'); IF MONTH_DIFFERENCE>0 THEN DO; DO I=0 TO MONTH_DIFFERENCE; COHORT_DATE= INTNX("MONTH",APP_DATE ,I,'E'); IF COHORT_DATE > DEFAULT_W12MNTH THEN DEFAULT_FLG12MNTH=1; ELSE DEFAULT_FLG12MNTH=0; IF COHORT_DATE > DEFAULT_W24MNTH THEN DEFAULT_FLG24=1; ELSE DEFAULT_FLG24=0; OUTPUT; END; END; ELSE DO; COHORT_DATE=DATE_OF_DEFAULT; OUTPUT; END; DROP DATE_OF_DEFAULT_ I MONTH_DIFFERENCE DEFAULT_W12MNTH DEFAULT_W24MNTH; RUN; PROC SORT DATA=WANTED ; BY ID APP_DATE DATE_OF_DEFAULT ; RUN;
You're right, it's a bit complicated. Let me try to explain it in more detail.
The overwhelming superiority should be towards the dates that match (or overlap) with the greater APP_DATE rating in descending order, with the aim of always keeping the most up-to-date information in the data. As you can see in my result view, the larger APP_DATE values are overpowering the records that overlap with them (for example, APP_DATE dated 31JAN2015 should not be displayed for the period from 31MAR2017, to 30JUN2016(Cause overlapping by APP_DATE 30JUN2016). Instead, APP_DATE from 30JUN2016, should be displayed)
Moreover, starting from the default moment, there should be 12 and 24-month Default_Flag variables for past periods
It seems like a lot of code, and I’m not exactly sure what it does, or why the table is missing an observation with a ‘Default’ .
Can you maybe take a step back and explain what your table should have (vs not have)? Start from scratch.
Maybe it should look like HAVE, except:
Add in additional requirements, if any.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.