Hi SAS Community,
Thank every one of you for all supports.
I am asking a help again.
In this data set, I have 8 distinct accounts in several banks (colored for clarity).
data have;
informat current_date date9.;
input Bank_number Account_number $ 5-7 Current_date Arrears_Band $22-28;
format current_date date9.;
cards;
10 111 30SEP2010 NPNA
10 111 31OCT2010 Current
10 111 30NOV2010 NPNA
10 111 30JUN2011 NPNA
10 111 01JAN2012 writoff
70 111 31DEC2011 NPNA
70 111 31JAN2012 NPNA
40 333 30NOV2010 Current
40 333 31DEC2010 NPNA
50 333 31JAN2011 NPNA
50 333 28FEB2011 writoff
20 333 31MAR2011 NPNA
20 333 30APR2011 NPNA
20 333 31MAY2011 NPNA
20 333 31AUG2011 NPNA
100 111 30APR2011 90 +
100 111 31MAY2011 90 +
100 111 31JUL2011 NPNA
100 111 31AUG2011 NPNA
100 111 04JAN2012 NPNA
80 111 31DEC2011 NPNA
40 555 08FEB2010 30 - 60
40 555 31MAR2010 Current
40 555 30APR2010 60 - 90
40 555 31MAY2010 Current
40 666 03JAN2012 1 - 30
;
run;
Question 1:
First I need to identify what are the accounts whose arrears_band = ‘NPNA’ throughout their life.
/*Answer to question 1*/
Lets call this table as Table_for_Q1
70 111 31DEC2011 NPNA
70 111 31JAN2012 NPNA
20 333 31MAR2011 NPNA
20 333 30APR2011 NPNA
20 333 31MAY2011 NPNA
20 333 31AUG2011 NPNA
80 111 31DEC2011 NPNA
Question 2:
Then I need to get a table having removed all of those account_numbers identified under Question 1 from original "have" table.
/*Answer to question 2*/
Lets call this table as Table_for_Q2
10 111 30SEP2010 NPNA
10 111 31OCT2010 Current
10 111 30NOV2010 NPNA
10 111 30JUN2011 NPNA
10 111 01JAN2012 writoff
40 333 30NOV2010 Current
40 333 31DEC2010 NPNA
50 333 31JAN2011 NPNA
50 333 28FEB2011 writoff
100 111 30APR2011 90 +
100 111 31MAY2011 90 +
100 111 31JUL2011 NPNA
100 111 31AUG2011 NPNA
100 111 04JAN2012 NPNA
40 555 08FEB2010 30 - 60
40 555 31MAR2010 Current
40 555 30APR2010 60 - 90
40 555 31MAY2010 Current
40 666 03JAN2012 1 - 30
What I have tried
/*Sorting by 3 variables that makes a unique record*/
proc sort data=have out=have_sorted;
by bank_number account_number current_date;
run;
/*flagging*/
data want;
set have_sorted;
if arrears_band ='NPNA' then flag=1;
else flag=0;
run;
Actually I cannot imagine where to go from here……
Just I only know I can do the following merging step to get the answer to question 2 which is Table_for_Q2, if I can find Table_for_Q1.
/*Merging*/
Proc sort data = have;
by bank_number account_number current_date;
run;
Proc sort data = Table_for_Q1;
by bank_number account_number current_date;
run;
Data Table_for_Q2;
Merge have (in=a)
Table_for_Q1 (in=b)
;
By bank_number account_number current_date;
If a and not Table_for_Q1
;
Run;
I would really appreciate if anyone of you could help me.
Thanks
Mirisage
Seems to me it can be done using straightforward SQL:
data have;
informat current_date date9.;
input Bank_number Account_number $ 5-7 Current_date Arrears_Band $22-28;
format current_date date9.;
cards;
10 111 30SEP2010 NPNA
10 111 31OCT2010 Current
10 111 30NOV2010 NPNA
10 111 30JUN2011 NPNA
10 111 01JAN2012 writoff
70 111 31DEC2011 NPNA
70 111 31JAN2012 NPNA
40 333 30NOV2010 Current
40 333 31DEC2010 NPNA
50 333 31JAN2011 NPNA
50 333 28FEB2011 writoff
20 333 31MAR2011 NPNA
20 333 30APR2011 NPNA
20 333 31MAY2011 NPNA
20 333 31AUG2011 NPNA
100 111 30APR2011 90 +
100 111 31MAY2011 90 +
100 111 31JUL2011 NPNA
100 111 31AUG2011 NPNA
100 111 04JAN2012 NPNA
80 111 31DEC2011 NPNA
40 555 08FEB2010 30 - 60
40 555 31MAR2010 Current
40 555 30APR2010 60 - 90
40 555 31MAY2010 Current
40 666 03JAN2012 1 - 30
;
/*Q1*/
proc sql;
select * from have group by Bank_number having sum(Arrears_Band ne 'NPNA')=0;quit;
/*Q2*/
proc sql;
select * from have
except
(select * from have group by Bank_number having sum(Arrears_Band ne 'NPNA')=0)
order by Bank_number;
quit;
Haikuo
Seems to me it can be done using straightforward SQL:
data have;
informat current_date date9.;
input Bank_number Account_number $ 5-7 Current_date Arrears_Band $22-28;
format current_date date9.;
cards;
10 111 30SEP2010 NPNA
10 111 31OCT2010 Current
10 111 30NOV2010 NPNA
10 111 30JUN2011 NPNA
10 111 01JAN2012 writoff
70 111 31DEC2011 NPNA
70 111 31JAN2012 NPNA
40 333 30NOV2010 Current
40 333 31DEC2010 NPNA
50 333 31JAN2011 NPNA
50 333 28FEB2011 writoff
20 333 31MAR2011 NPNA
20 333 30APR2011 NPNA
20 333 31MAY2011 NPNA
20 333 31AUG2011 NPNA
100 111 30APR2011 90 +
100 111 31MAY2011 90 +
100 111 31JUL2011 NPNA
100 111 31AUG2011 NPNA
100 111 04JAN2012 NPNA
80 111 31DEC2011 NPNA
40 555 08FEB2010 30 - 60
40 555 31MAR2010 Current
40 555 30APR2010 60 - 90
40 555 31MAY2010 Current
40 666 03JAN2012 1 - 30
;
/*Q1*/
proc sql;
select * from have group by Bank_number having sum(Arrears_Band ne 'NPNA')=0;quit;
/*Q2*/
proc sql;
select * from have
except
(select * from have group by Bank_number having sum(Arrears_Band ne 'NPNA')=0)
order by Bank_number;
quit;
Haikuo
Haikuo,
How are you doing? I like your code. I added a group variable "account_number" to your code.
/*Q1*/
proc sql;
create table q1 as
select * from have group by Bank_number,account_number
having sum(Arrears_Band ne 'NPNA')=0;quit;
/*Q2*/
proc sql;
create table q2 as
select * from have
except
(select * from have group by Bank_number,account_number having sum(Arrears_Band ne 'NPNA')=0)
order by Bank_number;
quit;
Hi Haikuo and Linlin,
I have applied the two codes to the follwing test data set named 'one' but it is strange that follwoing 3 accoutns are not identified as continuing NPNA accounts.
/*
Is there any reason?
10 28Feb2010 2222 9476 Personal OD NPNA
10 31Mar2010 2222 9635 Personal OD NPNA
10 30Apr2010 2222 9791 Personal OD NPNA*/
data one;
informat Current_date date9.;
input Bank_number Current_date Account_number $ 14 -17 Balance Product $ 24 -36 Arrears_Band $ 38-44;
format Current_date date9.;
cards;
10 28Feb2010 1111 2216 Personal Loan NPNA
10 31Mar2010 1111 2216 Personal Loan NPNA
10 30Apr2010 1111 2216 Personal Loan NPNA
10 31May2010 1111 2216 Personal Loan NPNA
10 30Jun2010 1111 2216 Personal Loan NPNA
10 06Jul2010 1111 2216 Personal Loan writoff
10 28Feb2010 2222 9476 Personal OD NPNA
10 31Mar2010 2222 9635 Personal OD NPNA
10 30Apr2010 2222 9791 Personal OD NPNA
30 28Feb2010 1111 3420 Res. Mortgage NPNA
30 31Mar2010 1111 3308 Res. Mortgage NPNA
30 30Jun2011 1111 2614 Res. Mortgage NPNA
30 31Jul2011 1111 2599 Res. Mortgage 1 - 30
30 31Aug2011 1111 2599 Res. Mortgage Current
30 30Sep2011 1111 2599 Res. Mortgage Current
30 31Oct2011 1111 2599 Res. Mortgage 1 - 30
30 30Nov2011 1111 2591 Res. Mortgage Current
30 31Dec2011 1111 2584 Res. Mortgage Current
30 31Jan2012 1111 2584 Res. Mortgage Current
40 28Feb2010 1111 1492 Personal Loan NPNA
40 31Mar2010 1111 1487 Personal Loan NPNA
40 31Aug2010 1111 1425 Personal Loan NPNA
40 30Sep2010 1111 7226 Personal Loan writoff
40 28Feb2011 1111 3510 Personal Loan NPNA
40 31Mar2011 1111 3510 Personal Loan NPNA
40 30Apr2011 1111 3510 Personal Loan NPNA
40 31May2011 1111 3510 Personal Loan NPNA
;
run;
/*identification of continuing NPNA accounts*/
proc sql;
create table q1 as
select * from one group by Bank_number,account_number
having sum(Arrears_Band ne 'NPNA')=0;quit;
/*Removing of continuing NPNA accounts*/
proc sql;
create table temp_1 as
select * from one
except
(select * from one group by Bank_number,account_number having sum(Arrears_Band ne 'NPNA')=0)
order by Bank_number;
quit;
Thanks
Mirisage
Hi Mirisage,
I ran the code. Below are the output and log file:
Current_ Bank_ Account_ Arrears_
Obs date number number Balance Product Band
1 28FEB2010 10 2222 9476 Personal OD NPNA
2 30APR2010 10 2222 9791 Personal OD NPNA
3 31MAR2010 10 2222 9635 Personal OD NPNA
1 data one;
2 informat Current_date date9.;
3 input Bank_number Current_date Account_number $ 14 -17 Balance
3 ! Product $ 24 -36 Arrears_Band $ 38-44;
4 format Current_date date9.;
5 cards;
NOTE: The data set WORK.ONE has 27 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
33 ;
34 run;
35
36 /*identification of continuing NPNA accounts*/
37 proc sql;
38 create table q1 as
39 select * from one group by Bank_number,account_number
40 having sum(Arrears_Band ne 'NPNA')=0;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.Q1 created, with 3 rows and 6 columns.
40 ! quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.04 seconds
cpu time 0.04 seconds
41
42 proc print data=q1;run;
NOTE: There were 3 observations read from the data set WORK.Q1.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.19 seconds
cpu time 0.04 seconds
Hi Linlin and Haikuo,
My bad. I apologize. Just after you (linlin) posted your results of running the code on my new data set, I have tried it. It worked like anything.
I really do not know what happened to me in my previous occasion.
Many thanks to both of you for this great code help.
Regards
Mirisage
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.