Hi Guys,
I have a typical banking data and need some help.
There are 3 columns: Account ID, Month Key(yyyymm format) and Payment Type. Payment Type can take values IO,IOA,PIF,PI,P,NFD,Null.
I have around 250,000 accounts and objective is to find such accounts that have Payment_Type in ("IO","IOA") for consecutive 60+ months. Discontinuous 60 months in IO is not my objective.
Account ID | Month Key | Payment Type |
A1 | 201001 | |
A1 | 201002 | IO |
A1 | 201003 | PIF |
A1 | 201004 | PI |
A1 | 201005 | P |
A1 | 201006 | |
A1 | 201007 | IOA |
A1 | 201008 | IO |
A1 | 201009 | IOA |
A1 | 201010 | IOA |
A1 | 201011 | IO |
A1 | 201012 | IO |
A1 | 201101 | IO |
A1 | 201102 | IO |
A1 | 201103 | IO |
A1 | 201104 | IO |
A1 | 201105 | IO |
A1 | 201106 | IO |
A1 | 201107 | IO |
A1 | 201108 | IO |
A1 | 201109 | IO |
A1 | 201110 | IO |
A1 | 201111 | IO |
A1 | 201112 | IO |
A1 | 201201 | IO |
A1 | 201202 | IO |
A1 | 201203 | IO |
A1 | 201204 | IO |
A1 | 201205 | IO |
A1 | 201206 | IO |
A1 | 201207 | IO |
A1 | 201208 | IO |
A1 | 201209 | IO |
A1 | 201210 | IO |
A1 | 201211 | IO |
A1 | 201212 | IO |
A1 | 201301 | IO |
A1 | 201302 | IO |
A1 | 201303 | IO |
A1 | 201304 | IO |
A1 | 201305 | IO |
A1 | 201306 | IO |
A1 | 201307 | IO |
A1 | 201308 | IO |
A1 | 201309 | IO |
A1 | 201310 | IO |
A1 | 201311 | IO |
A1 | 201312 | IO |
A1 | 201401 | IO |
A1 | 201402 | IO |
A1 | 201403 | IO |
A1 | 201404 | IO |
A1 | 201405 | IO |
A1 | 201406 | IO |
A1 | 201407 | IO |
A1 | 201408 | IO |
A1 | 201409 | IO |
A1 | 201410 | IO |
A1 | 201411 | IO |
A1 | 201412 | IO |
A1 | 201501 | IO |
A1 | 201502 | IO |
A1 | 201503 | IO |
A1 | 201504 | IO |
A1 | 201505 | IO |
A1 | 201506 | IO |
A1 | 201507 | IO |
A1 | 201508 | IO |
A1 | 201509 | PIF |
A1 | 201510 | PIF |
A1 | 201511 | PIF |
A1 | 201512 | PIF |
A1 | 201601 | PIF |
A1 | 201602 | PIF |
A1 | 201603 | PIF |
This account is in IO for a period of 62 consecutive months starting from 201007 ending at 201508.
My final output should have Account ID, and an indicator stating whether account is in IO > 60+ Months. Better to create an indicator with value 1 such as below if 60+ months in IO, else 0.
Account_ID IO_GT_60_Mths_Ind
A1 1
Can someone please help me. Appreciate!!
@Patrick Thank you. Yes, my code may need some refinement. But your reformulation only outputs on the last Account_ID. What if there are 60+ consecutive IO months, then a month gap, then five more consecutive IO months, all for the same Account_ID? I believe your code would only output based on the last five consecutive months (not the 60+ consecutive IO months earlier on).
Give this a try. I am assuming your data is sorted by Account_ID and Month_Type.
Note that I duplicated the A1 data, relabeled the Account_ID to A2, then removed a month in the middle (note the deliberate line break). That shows an ID for which there are not 60+ consecutive months with Payment_Type of IO or IOA.
data have;
input Account_ID $ Month_Key :yymmn6. Payment_Type $3.;
format month_key yymmn6.;
datalines;
A1 201001
A1 201002 IO
A1 201003 PIF
A1 201004 PI
A1 201005 P
A1 201006
A1 201007 IOA
A1 201008 IO
A1 201009 IOA
A1 201010 IOA
A1 201011 IO
A1 201012 IO
A1 201101 IO
A1 201102 IO
A1 201103 IO
A1 201104 IO
A1 201105 IO
A1 201106 IO
A1 201107 IO
A1 201108 IO
A1 201109 IO
A1 201110 IO
A1 201111 IO
A1 201112 IO
A1 201201 IO
A1 201202 IO
A1 201203 IO
A1 201204 IO
A1 201205 IO
A1 201206 IO
A1 201207 IO
A1 201208 IO
A1 201209 IO
A1 201210 IO
A1 201211 IO
A1 201212 IO
A1 201301 IO
A1 201302 IO
A1 201303 IO
A1 201304 IO
A1 201305 IO
A1 201306 IO
A1 201307 IO
A1 201308 IO
A1 201309 IO
A1 201310 IO
A1 201311 IO
A1 201312 IO
A1 201401 IO
A1 201402 IO
A1 201403 IO
A1 201404 IO
A1 201405 IO
A1 201406 IO
A1 201407 IO
A1 201408 IO
A1 201409 IO
A1 201410 IO
A1 201411 IO
A1 201412 IO
A1 201501 IO
A1 201502 IO
A1 201503 IO
A1 201504 IO
A1 201505 IO
A1 201506 IO
A1 201507 IO
A1 201508 IO
A1 201509 PIF
A1 201510 PIF
A1 201511 PIF
A1 201512 PIF
A1 201601 PIF
A1 201602 PIF
A1 201603 PIF
A2 201007 IOA
A2 201008 IO
A2 201009 IOA
A2 201010 IOA
A2 201011 IO
A2 201012 IO
A2 201101 IO
A2 201102 IO
A2 201103 IO
A2 201104 IO
A2 201105 IO
A2 201106 IO
A2 201107 IO
A2 201108 IO
A2 201109 IO
A2 201110 IO
A2 201111 IO
A2 201112 IO
A2 201201 IO
A2 201202 IO
A2 201203 IO
A2 201204 IO
A2 201205 IO
A2 201206 IO
A2 201207 IO
A2 201208 IO
A2 201209 IO
A2 201210 IO
A2 201211 IO
A2 201212 IO
A2 201301 IO
A2 201302 IO
A2 201303 IO
A2 201304 IO
A2 201305 IO
A2 201306 IO
A2 201307 IO
A2 201308 IO
A2 201309 IO
A2 201311 IO
A2 201312 IO
A2 201401 IO
A2 201402 IO
A2 201403 IO
A2 201404 IO
A2 201405 IO
A2 201406 IO
A2 201407 IO
A2 201408 IO
A2 201409 IO
A2 201410 IO
A2 201411 IO
A2 201412 IO
A2 201501 IO
A2 201502 IO
A2 201503 IO
A2 201504 IO
A2 201505 IO
A2 201506 IO
A2 201507 IO
A2 201508 IO
A2 201509 PIF
A2 201510 PIF
A2 201511 PIF
A2 201512 PIF
A2 201601 PIF
A2 201602 PIF
A2 201603 PIF
;
run;
data want (keep=Account_ID IO_GT_60_Mths_Ind);
set have;
by Account_ID;
if first.Account_ID then count = 1;
if intck('month',lag(month_key), month_key) = 1 then do;
count + 1;
if last.Account_ID then do;
if count > 60 then do;
IO_GT_60_Mths_Ind = 1;
output;
end;
else do;
IO_GT_60_Mths_Ind = 0;
output;
end;
count = 1;
end;
end;
else do;
if count > 60 then do;
IO_GT_60_Mths_Ind = 1;
output;
end;
count = 1;
end;
where Payment_Type in ("IO","IOA");
run;
It worked on few samples.
However, when I tried running this across whole population around 150000 accounts, I am not getting any account with IO_GT_60_Mths_Ind = 0.
In the output, I get around 4000 accounts with Ind = 0 which means none of these 4000 accounts have IO > 60 consecutive months.
From the total 150000 accounts, I am expecting around 20000 accounts with Ind = 1. That based on my data understanding is the right number.
Would you be able to suggest something else or perhaps a reason behind why its not working?
Thanks for your help mate.
I've reformulate the code a bit that @mklangley posted - but that's just because this suits my way of thinking better.
If none of your accounts matches the criteria and you believe it should then one thing to investigate:
What's the data type of your column month_key? And if it numeric: Does it contain a SAS Date value or just an integer for the month_key value?
Below sample code assumes that month_key contains a SAS Date value:
data want (keep=Account_ID IO_GE_60_Mths_Ind);
set have;
by Account_ID;
lag_month_key=lag(month_key);
if count>=60 then; /* do nothing, we're already happy */
else if Payment_Type in ("IO","IOA") then
do;
if first.account_id then count=1;
else if intck('month',lag_month_key, month_key) = 1 then count+1;
else count=0;
end;
else count=0;
if last.Account_ID then
do;
IO_GE_60_Mths_Ind= (count>=60);
output;
count=0;
end;
run;
If it's just an integer with digits for yyyymm then below code should work.
data want (keep=Account_ID IO_GE_60_Mths_Ind);
set have;
by Account_ID;
lag_month_key_dt=input(put(lag(month_key),6.),yymmn6.);
month_key_dt =input(put(month_key,6.),yymmn6.);
if count>=60 then; /* do nothing, we're already happy */
else if Payment_Type in ("IO","IOA") then
do;
if first.account_id then count=1;
else if intck('month',lag_month_key_dt, month_key_dt) = 1 then count+1;
else count=0;
end;
else count=0;
if last.Account_ID then
do;
IO_GE_60_Mths_Ind= (count>=60);
output;
count=0;
end;
run;
@Patrick Thank you. Yes, my code may need some refinement. But your reformulation only outputs on the last Account_ID. What if there are 60+ consecutive IO months, then a month gap, then five more consecutive IO months, all for the same Account_ID? I believe your code would only output based on the last five consecutive months (not the 60+ consecutive IO months earlier on).
@mklangley wrote:
@Patrick Thank you. Yes, my code may need some refinement. But your reformulation only outputs on the last Account_ID. What if there are 60+ consecutive IO months, then a month gap, then five more consecutive IO months, all for the same Account_ID? I believe your code would only output based on the last five consecutive months (not the 60+ consecutive IO months earlier on).
No, it wouldn't. That's what below lines are for:
if count>=60 then; /* do nothing, we're already happy */ else
@anubhav85verma You mentioned the code "worked on few samples." But you also mentioned that it didn't work for any of your actual data. Are any of your "samples" (that worked) from your actual data? If so, there must be some difference between your test and actual data. Look at it closely.
- Is your data sorted by Account_ID and Month_Type?
- Like @Patrick mentioned, is Month_Type a date variable?
- Can you post an actual sample of your data?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.