BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
anubhav85verma
Fluorite | Level 6

Hi Guys,

 

I have a typical banking data and need some help.

 

There are 3 columns: Account ID, Month Key(yyyymm format) and Payment Type. Payment Type can take values IO,IOA,PIF,PI,P,NFD,Null.

I have around 250,000 accounts and objective is to find such accounts that have Payment_Type in ("IO","IOA") for consecutive 60+ months. Discontinuous 60 months in IO is not my objective.

 

Account IDMonth KeyPayment Type
A1201001 
A1201002IO
A1201003PIF
A1201004PI
A1201005P
A1201006 
A1201007IOA
A1201008IO
A1201009IOA
A1201010IOA
A1201011IO
A1201012IO
A1201101IO
A1201102IO
A1201103IO
A1201104IO
A1201105IO
A1201106IO
A1201107IO
A1201108IO
A1201109IO
A1201110IO
A1201111IO
A1201112IO
A1201201IO
A1201202IO
A1201203IO
A1201204IO
A1201205IO
A1201206IO
A1201207IO
A1201208IO
A1201209IO
A1201210IO
A1201211IO
A1201212IO
A1201301IO
A1201302IO
A1201303IO
A1201304IO
A1201305IO
A1201306IO
A1201307IO
A1201308IO
A1201309IO
A1201310IO
A1201311IO
A1201312IO
A1201401IO
A1201402IO
A1201403IO
A1201404IO
A1201405IO
A1201406IO
A1201407IO
A1201408IO
A1201409IO
A1201410IO
A1201411IO
A1201412IO
A1201501IO
A1201502IO
A1201503IO
A1201504IO
A1201505IO
A1201506IO
A1201507IO
A1201508IO
A1201509PIF
A1201510PIF
A1201511PIF
A1201512PIF
A1201601PIF
A1201602PIF
A1201603PIF

 

This account is in IO for a period of 62 consecutive months starting from 201007 ending at 201508.

 

My final output should have Account ID, and an indicator stating whether account is in IO > 60+ Months. Better to create an indicator with value 1 such as below if 60+ months in IO, else 0.

 

Account_ID        IO_GT_60_Mths_Ind

A1                                   1

 

Can someone please help me. Appreciate!!

1 ACCEPTED SOLUTION

Accepted Solutions
mklangley
Lapis Lazuli | Level 10

@Patrick Thank you. Yes, my code may need some refinement. But your reformulation only outputs on the last Account_ID. What if there are 60+ consecutive IO months, then a month gap, then five more consecutive IO months, all for the same Account_ID? I believe your code would only output based on the last five consecutive months (not the 60+ consecutive IO months earlier on).

View solution in original post

7 REPLIES 7
mklangley
Lapis Lazuli | Level 10

Give this a try. I am assuming your data is sorted by Account_ID and Month_Type.

 

Note that I duplicated the A1 data, relabeled the Account_ID to A2, then removed a month in the middle (note the deliberate line break). That shows an ID for which there are not 60+ consecutive months with Payment_Type of IO or IOA.

data have;
    input Account_ID $ Month_Key :yymmn6. Payment_Type $3.;
    format month_key yymmn6.;
    datalines;
    A1 201001   
    A1 201002 IO
    A1 201003 PIF
    A1 201004 PI
    A1 201005 P
    A1 201006   
    A1 201007 IOA
    A1 201008 IO
    A1 201009 IOA
    A1 201010 IOA
    A1 201011 IO
    A1 201012 IO
    A1 201101 IO
    A1 201102 IO
    A1 201103 IO
    A1 201104 IO
    A1 201105 IO
    A1 201106 IO
    A1 201107 IO
    A1 201108 IO
    A1 201109 IO
    A1 201110 IO 
    A1 201111 IO
    A1 201112 IO
    A1 201201 IO
    A1 201202 IO
    A1 201203 IO
    A1 201204 IO
    A1 201205 IO
    A1 201206 IO
    A1 201207 IO
    A1 201208 IO
    A1 201209 IO
    A1 201210 IO
    A1 201211 IO
    A1 201212 IO
    A1 201301 IO
    A1 201302 IO 
    A1 201303 IO
    A1 201304 IO 
    A1 201305 IO
    A1 201306 IO 
    A1 201307 IO
    A1 201308 IO
    A1 201309 IO
    A1 201310 IO
    A1 201311 IO
    A1 201312 IO
    A1 201401 IO
    A1 201402 IO
    A1 201403 IO
    A1 201404 IO
    A1 201405 IO
    A1 201406 IO
    A1 201407 IO
    A1 201408 IO
    A1 201409 IO
    A1 201410 IO
    A1 201411 IO
    A1 201412 IO
    A1 201501 IO
    A1 201502 IO
    A1 201503 IO
    A1 201504 IO
    A1 201505 IO
    A1 201506 IO
    A1 201507 IO
    A1 201508 IO
    A1 201509 PIF
    A1 201510 PIF
    A1 201511 PIF
    A1 201512 PIF
    A1 201601 PIF
    A1 201602 PIF
    A1 201603 PIF
    A2 201007 IOA
    A2 201008 IO
    A2 201009 IOA
    A2 201010 IOA
    A2 201011 IO
    A2 201012 IO
    A2 201101 IO
    A2 201102 IO
    A2 201103 IO
    A2 201104 IO
    A2 201105 IO
    A2 201106 IO
    A2 201107 IO
    A2 201108 IO
    A2 201109 IO
    A2 201110 IO 
    A2 201111 IO
    A2 201112 IO
    A2 201201 IO
    A2 201202 IO
    A2 201203 IO
    A2 201204 IO
    A2 201205 IO
    A2 201206 IO
    A2 201207 IO
    A2 201208 IO
    A2 201209 IO
    A2 201210 IO
    A2 201211 IO
    A2 201212 IO
    A2 201301 IO
    A2 201302 IO 
    A2 201303 IO
    A2 201304 IO 
    A2 201305 IO
    A2 201306 IO 
    A2 201307 IO
    A2 201308 IO
    A2 201309 IO

    A2 201311 IO
    A2 201312 IO
    A2 201401 IO
    A2 201402 IO
    A2 201403 IO
    A2 201404 IO
    A2 201405 IO
    A2 201406 IO
    A2 201407 IO
    A2 201408 IO
    A2 201409 IO
    A2 201410 IO
    A2 201411 IO
    A2 201412 IO
    A2 201501 IO
    A2 201502 IO
    A2 201503 IO
    A2 201504 IO
    A2 201505 IO
    A2 201506 IO
    A2 201507 IO
    A2 201508 IO
    A2 201509 PIF
    A2 201510 PIF
    A2 201511 PIF
    A2 201512 PIF
    A2 201601 PIF
    A2 201602 PIF
    A2 201603 PIF
    ;
run;

data want (keep=Account_ID IO_GT_60_Mths_Ind);
    set have;
    by Account_ID;
    if first.Account_ID then count = 1;
    if intck('month',lag(month_key), month_key) = 1 then do;
        count + 1;
        if last.Account_ID then do;
            if count > 60 then do;
                IO_GT_60_Mths_Ind = 1;
                output;
            end;
            else do;
                IO_GT_60_Mths_Ind = 0;
                output;
            end;
            count = 1;
        end;
    end;
    else do;
        if count > 60 then do;
            IO_GT_60_Mths_Ind = 1;
            output;
        end;
        count = 1;
    end;
    where Payment_Type in ("IO","IOA");
run;

 

anubhav85verma
Fluorite | Level 6

It worked on few samples.

 

However, when I tried running this across whole population around 150000 accounts, I am not getting any account with IO_GT_60_Mths_Ind = 0.

In the output, I get around 4000 accounts with Ind = 0 which means none of these 4000 accounts have IO > 60 consecutive months.

 

From the total 150000 accounts, I am expecting around 20000 accounts with Ind = 1. That based on my data understanding is the right number.

 

Would you be able to suggest something else or perhaps a reason behind why its not working?

Thanks for your help mate.

Patrick
Opal | Level 21

I've reformulate the code a bit that @mklangley posted - but that's just because this suits my way of thinking better.

If none of your accounts matches the criteria and you believe it should then one thing to investigate:

What's the data type of your column month_key? And if it numeric: Does it contain a SAS Date value or just an integer for the month_key value?

 

Below sample code assumes that month_key contains a SAS Date value:


data want (keep=Account_ID IO_GE_60_Mths_Ind);
  set have;
  by Account_ID;
  lag_month_key=lag(month_key);

  if count>=60 then; /* do nothing, we're already happy */
  else if Payment_Type in ("IO","IOA") then
    do;
      if first.account_id then count=1;
      else if intck('month',lag_month_key, month_key) = 1 then count+1;
      else count=0;
    end;
  else count=0;

  if last.Account_ID then
    do;
      IO_GE_60_Mths_Ind= (count>=60);
      output;
      count=0;
    end;
run;

If it's just an integer with digits for yyyymm then below code should work.


data want (keep=Account_ID IO_GE_60_Mths_Ind);
  set have;
  by Account_ID;
  lag_month_key_dt=input(put(lag(month_key),6.),yymmn6.);
  month_key_dt    =input(put(month_key,6.),yymmn6.);

  if count>=60 then; /* do nothing, we're already happy */
  else if Payment_Type in ("IO","IOA") then
    do;
      if first.account_id then count=1;
      else if intck('month',lag_month_key_dt, month_key_dt) = 1 then count+1;
      else count=0;
    end;
  else count=0;

  if last.Account_ID then
    do;
      IO_GE_60_Mths_Ind= (count>=60);
      output;
      count=0;
    end;
run;
mklangley
Lapis Lazuli | Level 10

@Patrick Thank you. Yes, my code may need some refinement. But your reformulation only outputs on the last Account_ID. What if there are 60+ consecutive IO months, then a month gap, then five more consecutive IO months, all for the same Account_ID? I believe your code would only output based on the last five consecutive months (not the 60+ consecutive IO months earlier on).

anubhav85verma
Fluorite | Level 6

Guys, I truly appreciate both of your help in getting me the code for this.

Actually, we just need to check if the Account hits 61 months in IO consecutively and once it does that, we should not be bothered about what happens after that. After 61 consecutive months in IO, same account can have PIF for 5 months, PI for 3 months and then again IO, any pattern really. But I am not bothered by any of this, if the account has already hit 61 consecutive months in IO. Moment this happens, game over and we should flag this to 1. The modified code was working and even your code was also working but in the output, it was showing me all the rows, not just one row per account. I might have done some mistake there.



I then tried to code this on my own from the first principles. What we need is a counter that does the following.

1. Counter set to 1 for the first record for each account.

2. If difference between current month key and previous month key is greater than 1, then counter is again set to 1, OR

3. If current month's payment type is NOT IN ("IO","IOA") the counter is reset to 1, OR

4. If current month's payment type is IN ("IO","IOA") and previous month's payment type is not in ("IO","IOA") then counter is set to 1. This is because if it goes to IO the first time, I need that to be reflected as counter = 1, not counter = 2.

5. If any of the above 4 conditions are not satisfied, then we increment the counter by 1



Immediately output Indicator =1 as soon as counter = 61 and that's it, happy days!!



Below is the code that I used. The number of accounts where Indicator = 1 was matching with the code you guys shared with me, hence I got confidence the code that I wrote below is also doing the same thing.



data want;

set have;

by account_id month_key;





lag_month = lag(month_key);

lag_payment_type = lag(payment_type);



if(first.account_id) then call missing(lag_month);

if( first.account_id

OR intck('month', lag_month, month_key) > 1

OR payment_type NOT IN('IO', 'IOA')

OR (payment_type IN('IO', 'IOA') and lag_payment_type NOT IN ('IO','IOA'))



)



then n = 0;

n+1;

IO_GT_60_Months_Ind = (n GT 60);

format lag_month date9.;

run;
Patrick
Opal | Level 21

@mklangley wrote:

@Patrick Thank you. Yes, my code may need some refinement. But your reformulation only outputs on the last Account_ID. What if there are 60+ consecutive IO months, then a month gap, then five more consecutive IO months, all for the same Account_ID? I believe your code would only output based on the last five consecutive months (not the 60+ consecutive IO months earlier on).


No, it wouldn't. That's what below lines are for:

  if count>=60 then; /* do nothing, we're already happy */
  else
mklangley
Lapis Lazuli | Level 10

@anubhav85verma  You mentioned the code "worked on few samples." But you also mentioned that it didn't work for any of your actual data. Are any of your "samples" (that worked) from your actual data? If so, there must be some difference between your test and actual data. Look at it closely.

 - Is your data sorted by Account_ID and Month_Type?

 - Like @Patrick mentioned, is Month_Type a date variable?

 - Can you post an actual sample of your data?

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1262 views
  • 2 likes
  • 3 in conversation