Hi Colleagues,
Attached longitudinal data set shows how 5 bank accounts are swinging from
one delinquency bucket to the other (original data set has over million
records).
SQL code provided at the very end of this returns the sum of variable 'balance' for each arrears band
grouped by Current_date subject to this condition.
only those accounts that have fallen into NPNA status after
28FEB2010. NPNA balances in 28FEB2010 and in all subsequent months of
those accounts that were already in NPNA status
in 28FEB2010 should be omitted from summing up.
E.g. Account 1111111111's
NPNA balance should be excluded from summing up
in 28FEB2010 and in all subsequent months because it was already in NPNA
status when we begin our modeling period in 28FEB2010.
QUESTION:
I need to incorporate another condition with regard to NPNA arrers bands.
E.g. A/C 4444444444 has fallen into delq. bucket permanenelty in
28FEB2011. So, its NPNA balances in all subsequent months (in 31MAR2011
and beyond) should be omitted from summing up. Note that A/C 4444444444 has fallen
into NPNA in 31DEC2010 too, but it was not permanenet, so 31DEC2010
NPNA balance should not be excluded.
Likewise, in A/C 5555555555 NPNA balances in 31MAR2011 and beyond
should be omitted from summing up.
Final answer should be like this.
Current_date current 1-30 30-60 60-90 90+ NPNA
28FEB2010 24 0 0 0 0 .
31MAR2010 25 23 0 0 0 .
30APR2010 25 0 23 0 0 .
31MAY2010 25 0 0 23 0 .
30JUN2010 0 0 0 0 23 .
31JUL2010 0 0 0 0 0 85
30NOV2010 25 0 0 0 0 .
31DEC2010 0 0 0 0 0 25
31JAN2011 25 18 0 0 0 .
28FEB2011 0 0 0 0 0 43
31MAR2011 0 0 0 0 0 .
30APR2011 0 0 0 0 0 .
31MAY2011 85 0 0 0 0 .
Could anyone help me to modify this SQL code?
Thanks, Mirisage
proc sql;
create table LOSS1 as
select a1.*,a2.npna
from(
select
Current_date ,
sum(case when Arrears_Band IN ('Current') then balance else 0 end ) as current,
sum(case when Arrears_Band IN ('1 - 30') then balance else 0 end ) as One_to_30,
sum(case when Arrears_Band IN ('30 - 60') then balance else 0 end ) as Thirty_to_60,
sum(case when Arrears_Band IN ('60 - 90') then balance else 0 end ) as sixty_to_90,
sum(case when Arrears_Band IN ('90 +') then balance else 0 end) as Ninety_plus
from b
where Product IN ('Personal OD','Personal Loan','Res. Mortgage') and
Bank_number = 10
group by Current_date, Bank_number
) a1
left join
(
select
Current_date ,
sum(balance)as npna
from b
where Product IN ('Personal OD','Personal Loan','Res. Mortgage') and
arrears_Band = "NPNA" and
Bank_number = 10 and Account_number not in (select Account_number from b
where
current_date =mdy(02,28,2010) and arrears_Band = "NPNA")
group by Current_date, Bank_number) a2
on
a1.Current_date = a2.Current_date;
quit;
Does it have to be SQL code? Using hash object along with a hash iterator in a data step can be an alternative way to address this issue.
Read records by Account_Number and check out whether Arrears_Band = 'NPNA' or not.
If so and the value of NPNA_Start_Month is missing then set it equal to Current_Date. Else, set NPNA_Start_Month value to missing.
Load the record into a hash object. After the last record for given account_number read traverse the records for it using hash iterator object.
Inıtıalize your sum varaiblles
Current = 0;
One_to_30 = 0;
Thirty_to_60 = 0;
Sixty_to_90 = 0;
Ninety_plus = 0;
NPNA = .;
Update the value of your sum variables based on the value of Arrears_Band and output the record. After all records read clear the hash object and start over for the next account_number.
After all records for all account_numbers have been read summarize your temp data set.
data b;
informat Current_date date9.;
Input Bank_number 1-2 Current_date Account_number $14-23 Balance 25-
26 product $28-40 Arrears_Band $42-48 ;
format Current_date date9.;
cards;
10 28FEB2010 1111111111 45 Personal Loan NPNA
10 31MAR2010 1111111111 45 Personal Loan NPNA
10 30APR2010 1111111111 45 Personal Loan NPNA
10 31MAY2010 1111111111 45 Personal Loan NPNA
10 31MAR2010 2222222222 25 Personal Loan Current
10 30APR2010 2222222222 25 Personal Loan Current
10 31MAY2010 2222222222 25 Personal Loan Current
10 28FEB2010 3333333333 24 Personal Loan Current
10 31MAR2010 3333333333 23 Personal Loan 1 - 30
10 30APR2010 3333333333 23 Personal Loan 30 - 60
10 31MAY2010 3333333333 23 Personal Loan 60 - 90
10 30JUN2010 3333333333 23 Personal Loan 90 +
10 31JUL2010 3333333333 85 Personal Loan NPNA
10 30NOV2010 4444444444 25 Personal Loan Current
10 31DEC2010 4444444444 25 Personal Loan NPNA
10 31JAN2011 4444444444 18 Personal Loan 1 - 30
10 28FEB2011 4444444444 18 Personal Loan NPNA
10 31MAR2011 4444444444 18 Personal Loan NPNA
10 30APR2011 4444444444 18 Personal Loan NPNA
10 31MAY2011 4444444444 85 Personal Loan Current
10 31JAN2011 5555555555 25 Personal Loan Current
10 28FEB2011 5555555555 25 Personal Loan NPNA
10 31MAR2011 5555555555 22 Personal Loan NPNA
10 30APR2011 5555555555 21 Personal Loan NPNA
;
run;
data temp;
if _n_ = 1 then do;
declare hash h(ordered:'a');
h.defineKey('Current_Date');
h.defineData('Current_Date','Bank_Number','Account_Number','Balance','Product','Arrears_Band','NPNA_Start_Month');
h.defineDone();
declare hiter hi('h');
end;
NPNA_Start_Month = .;
do until(last.Account_Number);
set b;
where Product IN ('Personal OD','Personal Loan','Res. Mortgage') and
Bank_number = 10;
by Account_Number;
if Arrears_Band = 'NPNA' and NPNA_Start_Month = . then NPNA_Start_Month = Current_Date;
else if Arrears_Band ne 'NPNA' then NPNA_Start_Month = .;
h.replace();
end;
rc = hi.first();
do while(rc=0);
Current = 0;
One_to_30 = 0;
Thirty_to_60 = 0;
Sixty_to_90 = 0;
Ninety_plus = 0;
NPNA = .;
if NPNA_Start_Month = . or (
NPNA_Start_Month = Current_Date and NPNA_Start_Month > '28Feb2010'd) then do;
select (Arrears_Band);
when ('Current') do;Current = Balance;end;
when ('1 - 30') do;One_to_30 = Balance;end;
when ('30 - 60') do;Thirty_to_60 = Balance;end;
when ('60 - 90') do;Sixty_to_90 = Balance;end;
when ('90 +') do;Ninety_plus = Balance;end;
when ('NPNA') do;NPNA = Balance;end;
end;
end;
output;
rc = hi.next();
end;
h.clear();
run;
proc summary data=temp nway missing;
class Current_Date;
var Current One_to_30 Thirty_to_60 Sixty_to_90 Ninety_plus NPNA;
output out=want(drop=_:)
sum=;
run;
Does it have to be SQL code? Using hash object along with a hash iterator in a data step can be an alternative way to address this issue.
Read records by Account_Number and check out whether Arrears_Band = 'NPNA' or not.
If so and the value of NPNA_Start_Month is missing then set it equal to Current_Date. Else, set NPNA_Start_Month value to missing.
Load the record into a hash object. After the last record for given account_number read traverse the records for it using hash iterator object.
Inıtıalize your sum varaiblles
Current = 0;
One_to_30 = 0;
Thirty_to_60 = 0;
Sixty_to_90 = 0;
Ninety_plus = 0;
NPNA = .;
Update the value of your sum variables based on the value of Arrears_Band and output the record. After all records read clear the hash object and start over for the next account_number.
After all records for all account_numbers have been read summarize your temp data set.
data b;
informat Current_date date9.;
Input Bank_number 1-2 Current_date Account_number $14-23 Balance 25-
26 product $28-40 Arrears_Band $42-48 ;
format Current_date date9.;
cards;
10 28FEB2010 1111111111 45 Personal Loan NPNA
10 31MAR2010 1111111111 45 Personal Loan NPNA
10 30APR2010 1111111111 45 Personal Loan NPNA
10 31MAY2010 1111111111 45 Personal Loan NPNA
10 31MAR2010 2222222222 25 Personal Loan Current
10 30APR2010 2222222222 25 Personal Loan Current
10 31MAY2010 2222222222 25 Personal Loan Current
10 28FEB2010 3333333333 24 Personal Loan Current
10 31MAR2010 3333333333 23 Personal Loan 1 - 30
10 30APR2010 3333333333 23 Personal Loan 30 - 60
10 31MAY2010 3333333333 23 Personal Loan 60 - 90
10 30JUN2010 3333333333 23 Personal Loan 90 +
10 31JUL2010 3333333333 85 Personal Loan NPNA
10 30NOV2010 4444444444 25 Personal Loan Current
10 31DEC2010 4444444444 25 Personal Loan NPNA
10 31JAN2011 4444444444 18 Personal Loan 1 - 30
10 28FEB2011 4444444444 18 Personal Loan NPNA
10 31MAR2011 4444444444 18 Personal Loan NPNA
10 30APR2011 4444444444 18 Personal Loan NPNA
10 31MAY2011 4444444444 85 Personal Loan Current
10 31JAN2011 5555555555 25 Personal Loan Current
10 28FEB2011 5555555555 25 Personal Loan NPNA
10 31MAR2011 5555555555 22 Personal Loan NPNA
10 30APR2011 5555555555 21 Personal Loan NPNA
;
run;
data temp;
if _n_ = 1 then do;
declare hash h(ordered:'a');
h.defineKey('Current_Date');
h.defineData('Current_Date','Bank_Number','Account_Number','Balance','Product','Arrears_Band','NPNA_Start_Month');
h.defineDone();
declare hiter hi('h');
end;
NPNA_Start_Month = .;
do until(last.Account_Number);
set b;
where Product IN ('Personal OD','Personal Loan','Res. Mortgage') and
Bank_number = 10;
by Account_Number;
if Arrears_Band = 'NPNA' and NPNA_Start_Month = . then NPNA_Start_Month = Current_Date;
else if Arrears_Band ne 'NPNA' then NPNA_Start_Month = .;
h.replace();
end;
rc = hi.first();
do while(rc=0);
Current = 0;
One_to_30 = 0;
Thirty_to_60 = 0;
Sixty_to_90 = 0;
Ninety_plus = 0;
NPNA = .;
if NPNA_Start_Month = . or (
NPNA_Start_Month = Current_Date and NPNA_Start_Month > '28Feb2010'd) then do;
select (Arrears_Band);
when ('Current') do;Current = Balance;end;
when ('1 - 30') do;One_to_30 = Balance;end;
when ('30 - 60') do;Thirty_to_60 = Balance;end;
when ('60 - 90') do;Sixty_to_90 = Balance;end;
when ('90 +') do;Ninety_plus = Balance;end;
when ('NPNA') do;NPNA = Balance;end;
end;
end;
output;
rc = hi.next();
end;
h.clear();
run;
proc summary data=temp nway missing;
class Current_Date;
var Current One_to_30 Thirty_to_60 Sixty_to_90 Ninety_plus NPNA;
output out=want(drop=_:)
sum=;
run;
Hi Alpay,
This is splendid! It worked very well and many thanks.
However, I have found several exceptional accounts in my data set which is not captured by the hash code. One such is reproduced below.
data b;
informat Current_date date9.;
Input Bank_number 1-2 Current_date Account_number $14-23
Balance 25-31 product $33-45 Arrears_Band $47-53 ;
format Current_date date9.;
cards;
10 28FEB2010 1111111111 6999.99 Personal Loan NPNA
10 31MAR2010 1111111111 6999.99 Personal Loan NPNA
10 30APR2010 1111111111 6999.99 Personal Loan NPNA
10 31MAY2010 1111111111 6999.99 Personal Loan NPNA
10 30JUN2010 1111111111 4444.44 Personal Loan Current
10 31JUL2010 1111111111 4444.44 Personal Loan 30 - 60
10 31AUG2010 1111111111 4444.44 Personal Loan 30 - 60
10 30SEP2010 1111111111 4444.44 Personal Loan 60 - 90
10 31OCT2010 1111111111 4444.44 Personal Loan NPNA
10 30NOV2010 1111111111 4444.44 Personal Loan NPNA
10 31DEC2010 1111111111 4444.44 Personal Loan NPNA
10 31JAN2011 1111111111 3333.21 Personal Loan NPNA
10 28FEB2011 1111111111 2222.22 Personal Loan Current
10 31MAR2011 1111111111 2222.22 Personal Loan 1 - 30
10 30APR2011 1111111111 2222.22 Personal Loan 30 - 60
10 31MAY2011 1111111111 2222.22 Personal Loan 1 - 30
10 30JUN2011 1111111111 2222.22 Personal Loan Current
10 31JUL2011 1111111111 2222.22 Personal Loan 30 - 60
10 31AUG2011 1111111111 2222.22 Personal Loan 60 - 90
10 30SEP2011 1111111111 7777.77 Personal Loan NPNA
10 31OCT2011 1111111111 7777.77 Personal Loan NPNA
10 30NOV2011 1111111111 7777.77 Personal Loan NPNA
10 31DEC2011 1111111111 7777.77 Personal Loan NPNA
;
run;
When I apply the hash object code for this account, the answer is like below. However, for 28FEB2010, NPNA
should not be . (i.e. should not be "period"), it should be 6999.99. All others are correct.
Current_date current 1-30 30-60 60-90 90+ NPNA
28FEB2010 0 0 0 0 0 .
31MAR2010 0 0 0 0 0 .
30APR2010 0 0 0 0 0 .
31MAY2010 0 0 0 0 0 .
30JUN2010 4444.44 0 0 0 0 .
31JUL2010 0 0 4444.44 0 0 .
31AUG2010 0 0 4444.44 0 0 .
30SEP2010 0 0 0 4444.44 0 .
31OCT2010 0 0 0 0 0 4444.44
30NOV2010 0 0 0 0 0 .
31DEC2010 0 0 0 0 0 .
31JAN2011 0 0 0 0 0 .
28FEB2011 2222.22 0 0 0 0 .
31MAR2011 0 2222.22 0 0 0 .
30APR2011 0 0 2222.22 0 0 .
31MAY2011 0 2222.22 0 0 0 .
30JUN2011 2222.22 0 0 0 0 .
31JUL2011 0 0 2222.22 0 0 .
31AUG2011 0 0 0 2222.22 0 .
30SEP2011 0 0 0 0 0 7777.77
31OCT2011 0 0 0 0 0 .
30NOV2011 0 0 0 0 0 .
31DEC2011 0 0 0 0 0 .
The hash code omitted the calculations for first four records because we have incorporated the following condition.
Condition 1:
"Balances corresponding to NPNA Arrears_Band have to be summed up for
only those accounts that have fallen into NPNA status after
28FEB2010. NPNA balances in 28FEB2010 and in all subsequent months of
those accounts that were already in NPNA status
in 28FEB2010 should be omitted from summing up".
It is true this special account has started with "NPNA" status from 28FEB2010 but it has not continued its NPNA
status throughout. So, these type of accounts should be excluded from the condition 1 above although
they have started with "NPNA" status from 28FEB2010.
Question: I wonder if there is a possiblity to incorporate this business reality into hash code.
Thanks for your time
Mirisage
Hi Minisage,
We can capture the last NPNA_Start_Month for a given account and check whether it is equal to 28Feb2010 or not instead of NPNA_Start_Month greater than 28Feb2010.
This way, we can capture accounts gone into NPNA status on 28Feb2010 and come out of NPNA status in subsequent months.
data b;
informat Current_date date9.;
Input Bank_number 1-2 Current_date Account_number $14-23
Balance 25-31 product $33-45 Arrears_Band $47-53 ;
format Current_date date9.;
cards;
10 28FEB2010 1111111111 6999.99 Personal Loan NPNA
10 31MAR2010 1111111111 6999.99 Personal Loan NPNA
10 30APR2010 1111111111 6999.99 Personal Loan NPNA
10 31MAY2010 1111111111 6999.99 Personal Loan NPNA
10 30JUN2010 1111111111 4444.44 Personal Loan Current
10 31JUL2010 1111111111 4444.44 Personal Loan 30 - 60
10 31AUG2010 1111111111 4444.44 Personal Loan 30 - 60
10 30SEP2010 1111111111 4444.44 Personal Loan 60 - 90
10 31OCT2010 1111111111 4444.44 Personal Loan NPNA
10 30NOV2010 1111111111 4444.44 Personal Loan NPNA
10 31DEC2010 1111111111 4444.44 Personal Loan NPNA
10 31JAN2011 1111111111 3333.21 Personal Loan NPNA
10 28FEB2011 1111111111 2222.22 Personal Loan Current
10 31MAR2011 1111111111 2222.22 Personal Loan 1 - 30
10 30APR2011 1111111111 2222.22 Personal Loan 30 - 60
10 31MAY2011 1111111111 2222.22 Personal Loan 1 - 30
10 30JUN2011 1111111111 2222.22 Personal Loan Current
10 31JUL2011 1111111111 2222.22 Personal Loan 30 - 60
10 31AUG2011 1111111111 2222.22 Personal Loan 60 - 90
10 30SEP2011 1111111111 7777.77 Personal Loan NPNA
10 31OCT2011 1111111111 7777.77 Personal Loan NPNA
10 30NOV2011 1111111111 7777.77 Personal Loan NPNA
10 31DEC2011 1111111111 7777.77 Personal Loan NPNA
;
run;
data temp;
if _n_ = 1 then do;
declare hash h(ordered:'a');
h.defineKey('Current_Date');
h.defineData('Current_Date','Bank_Number','Account_Number','Balance','Product','Arrears_Band','NPNA_Start_Month');
h.defineDone();
declare hiter hi('h');
end;
NPNA_Start_Month = .;
do until(last.Account_Number);
set b;
where Product IN ('Personal OD','Personal Loan','Res. Mortgage') and
Bank_number = 10;
by Account_Number;
if Arrears_Band = 'NPNA' and NPNA_Start_Month = . then NPNA_Start_Month = Current_Date;
else if Arrears_Band ne 'NPNA' then NPNA_Start_Month = .;
h.replace();
end;
Last_NPNA_Start_Month = NPNA_Start_Month; /* Added 6/18/2012 */
rc = hi.first();
do while(rc=0);
Current = 0;
One_to_30 = 0;
Thirty_to_60 = 0;
Sixty_to_90 = 0;
Ninety_plus = 0;
NPNA = .;
if NPNA_Start_Month = . or (
NPNA_Start_Month = Current_Date and /* NPNA_Start_Month > '28Feb2010'd */
Last_NPNA_Start_Month ne '28Feb2010'd) then do;
select (Arrears_Band);
when ('Current') do;Current = Balance;end;
when ('1 - 30') do;One_to_30 = Balance;end;
when ('30 - 60') do;Thirty_to_60 = Balance;end;
when ('60 - 90') do;Sixty_to_90 = Balance;end;
when ('90 +') do;Ninety_plus = Balance;end;
when ('NPNA') do;NPNA = Balance;end;
end;
end;
output;
rc = hi.next();
end;
h.clear();
run;
proc summary data=temp nway missing;
class Current_Date;
var Current One_to_30 Thirty_to_60 Sixty_to_90 Ninety_plus NPNA;
output out=want(drop=_:)
sum=;
run;
Hi Mirisage,
I realized I mixed up your name in previous reply. Sorry about that.
Hi Alpey,
Your revised code is great!
It worked very well. Many thanks indeed.
I should try to digest hash object approach which I have never been exposed before.
As it will take sometime, I might ask your help to incorporate 1 or 2 other pressing business needs around the same problem as I am moving forward, in next few days, and please help me in them too.
Thanks again for your help!
Mirisage
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.