Hi Forum;
I have 9 distinct accounts in this dataset (this data set can be readily read in SAS).
data this;
informat current_date date9.;
input Bank_number Account_number $ 5-7 Current_date Arrears_Band $19-25;
format current_date date9.;
cards;
10 111 30SEP2010 NPNA
10 111 31OCT2010 Current
10 111 30NOV2010 NPNA
10 111 30JUN2011 NPNA
10 111 01JAN2012 writoff
70 222 31DEC2011 writoff
70 222 31JAN2012 NPNA
40 333 30NOV2010 Current
40 333 31DEC2010 NPNA
50 333 31JAN2011 NPNA
50 333 28FEB2011 writoff
20 333 31MAR2011 NPNA
20 333 30APR2011 NPNA
20 333 31MAY2011 NPNA
20 333 30JUN2011 NPNA
100 111 30APR2011 90 +
100 111 31MAY2011 90 +
100 111 31JUL2011 NPNA
100 111 31AUG2011 NPNA
100 111 04JAN2012 NPNA
40 555 08FEB2010 30 - 60
40 555 31MAR2010 NPNA
40 555 30APR2010 60 - 90
40 555 31MAY2010 NPNA
40 666 03JAN2012 NPNA
10 777 04FEB2010 1 - 30
10 333 03MAR2010 Current
;
run;
I have a Business Rule:
Note:
1). If a subject has been exclusively in NPNA arrears_band throughout its life,
then that subject does not violate our condition.
2). if a subject has only a single record and its Arrears_Band = “NPNA”,
then that subject does not violate our condition.
Answer:
10 111 30SEP2010 NPNA
10 111 31OCT2010 Current /*after NPNA you will not get either "NPNA" or “writoff” which violates our rule */
10 111 30NOV2010 NPNA
10 111 30JUN2011 NPNA
10 111 01JAN2012 writoff
40 555 08FEB2010 30 - 60
40 555 31MAR2010 NPNA
40 555 30APR2010 60 - 90 /*after NPNA you will not get either "NPNA" or “writoff” which violates our rule */
40 555 31MAY2010 NPNA
I have just tweaked a code developed by mkeintz for a somewhat similar situation but it doesn’t work for this situation because I cannot correctly incorporate the logic.
/*Sorting by three variables that uniquely identify a subject*/
proc sort data=this out=have;
by bank_number account_number current_date;
run;
data want;
set this;
by bank_number account_number notsorted;
retain NPNA_encountered 0;
if first.account_number then NPNA_encountered=0;
if last.account_number and NPNA_encountered=1 then output;
if arrears_bank='NPNA' then NPNA_encountered=1;
run;
I really appreciate your help to develop my code.
Thanks
Miris
Hi Mirisage,
My bad. I switched around some statements and failed to put them back to order, so I fell into this lag() caveat, here it goes:
data want;
do until (last.account_number);
set have;
by bank_number account_number current_date;
flag=ifn(Arrears_Band not in ('NPNA','writoff') and lag(Arrears_Band)='NPNA',1,flag);
if first.account_number then flag=0;
end;
do until (last.account_number);
set have;
by bank_number account_number current_date;
if flag then output;
end;
run;
Haikuo
Hi,
If you are reading your data from raw and planning to use your input code as is, it won't work. Your main problem can be resolved by 2XDOW:
data this;
input Bank_number:$ Account_number:$ Current_date :date9. Arrears_Band :&$;
format current_date date9.;
cards;
10 111 30SEP2010 NPNA
10 111 31OCT2010 Current
10 111 30NOV2010 NPNA
10 111 30JUN2011 NPNA
10 111 01JAN2012 writoff
70 222 31DEC2011 writoff
70 222 31JAN2012 NPNA
40 333 30NOV2010 Current
40 333 31DEC2010 NPNA
50 333 31JAN2011 NPNA
50 333 28FEB2011 writoff
20 333 31MAR2011 NPNA
20 333 30APR2011 NPNA
20 333 31MAY2011 NPNA
20 333 30JUN2011 NPNA
100 111 30APR2011 90 +
100 111 31MAY2011 90 +
100 111 31JUL2011 NPNA
100 111 31AUG2011 NPNA
100 111 04JAN2012 NPNA
40 555 08FEB2010 30 - 60
40 555 31MAR2010 NPNA
40 555 30APR2010 60 - 90
40 555 31MAY2010 NPNA
40 666 03JAN2012 NPNA
10 777 04FEB2010 1 - 30
10 333 03MAR2010 Current
;
proc sort data=this out=have;
by bank_number account_number current_date;
run;
/*2XDOW*/
data want;
do until (last.account_number);
set have;
by bank_number account_number current_date;
if first.account_number then flag=0;
else flag=ifn(Arrears_Band not in ('NPNA','writoff') and lag(Arrears_Band)='NPNA',1,flag);
end;
do until (last.account_number);
set have;
by bank_number account_number current_date;
if flag then output;
end;
run;
proc print;run;
Haikuo
Hi Haikuo,
Thank you very much.
This code correctly identifies one account which violates the business rule. This is the account.
Obs Bank_number Account_number Current_date Arrears_Band flag (these are variable names)
1 | 40 | 555 | 08FEB2010 | 30 - 60 | 1 |
2 | 40 | 555 | 31MAR2010 | NPNA | 1 |
3 | 40 | 555 | 30APR2010 | 60 - 90 | 1 |
4 | 40 | 555 | 31MAY2010 | NPNA | 1 |
However, there is another account which violates our business rule. Below shown is that account which is not identified by the code.
10 111 30SEP2010 NPNA
10 111 31OCT2010 Current /*after NPNA you will not get either "NPNA" or “writoff” which violates our rule */
10 111 30NOV2010 NPNA
10 111 30JUN2011 NPNA
10 111 01JAN2012 writoff
I wonder if you have sometime to have a look.
Thank you for your time and expertise.
Mirisage
Hi Mirisage,
My bad. I switched around some statements and failed to put them back to order, so I fell into this lag() caveat, here it goes:
data want;
do until (last.account_number);
set have;
by bank_number account_number current_date;
flag=ifn(Arrears_Band not in ('NPNA','writoff') and lag(Arrears_Band)='NPNA',1,flag);
if first.account_number then flag=0;
end;
do until (last.account_number);
set have;
by bank_number account_number current_date;
if flag then output;
end;
run;
Haikuo
Hi Haikuo,
Thank you very much for this code which worked very well.
I am trying to understand what this code is doing and could you please see if I have understood correctly, time permitting.
This is what “ifn” function does based on literature I have reviewed.
/*In the following code, we assign a value of 1 or 0 to flag variable*/
data test;
set test;
if sex= ‘Male’ then flag =1;
else flag=0;
run;
/*Some programmers use the following code to do the same task above*/
data test;
set test;
flag=ifn(sex=’Male’,1,0);
run;
This is your code
data want;
do until (last.account_number);
set have;
by bank_number account_number current_date;
flag=ifn(Arrears_Band not in ('NPNA','writoff') and lag(Arrears_Band)='NPNA',1,flag);
if first.account_number then flag=0;
end;
do until (last.account_number);
set have;
by bank_number account_number current_date;
if flag then output;
end;
run;
• Do until starts to read all the records pertinent to first account found in our data set called “this”.
So, these are the records.
Bank_number Account_number Current_date Arrears_Band
10 111 30SEP2010 NPNA
10 111 31OCT2010 Current
10 111 30NOV2010 NPNA
10 111 30JUN2011 NPNA
10 111 01JAN2012 writoff
when do until executes below statement, I think this is what happening.
flag=ifn(Arrears_Band not in ('NPNA','writoff') and lag(Arrears_Band)='NPNA',1,flag);
Above statement says:
If Arrears_Band not in ('NPNA','writoff') and lag(Arrears_Band)='NPNA’
then a new variable called “flag” is created and its value is returned as 1.
If this condition is not satisfied, then then the value for the variable “flag” is returned as "flag".
(I am not too sure).
Then comes to the next statement:
if first.account_number then flag=0;
Above statement returns zero for the first account_number.
Now this should be the answer after processing the account_number that do until first hits.
Bank_number | Account_number | Current_date | Arrears_Band | lag(Arrears_Band ) | Flag |
10 | 111 | 30SEP2010 | NPNA | 0 | |
10 | 111 | 31OCT2010 | Current | NPNA | 1 |
10 | 111 | 30NOV2010 | NPNA | Current | flag |
10 | 111 | 30JUN2011 | NPNA | NPNA | flag |
10 | 111 | 01JAN2012 | writoff | NPNA | flag |
Then in the next do until pass, all the records whose flag = 1 are outputted.
But then only the last 3 records of the above table should be outputted which is not our intended answer.
But your code generated the intended answer.
How come?
Would appreciate if you could shed some light.
Thank you for your time.
Mirisage
Hi Mirisage,
Your understanding of my code is largely correct. The key point of my code is A) Unconditionally use lag(). B) When first record of each group, reset 'flag' value=0, while because of A), B) needs to be put after A), otherwise B) will be overridden by A), the order is important here, this is why my first code was not really working.
Now I see what puzzles you is "flag=ifn(Arrears_Band not in ('NPNA','writoff') and lag(Arrears_Band)='NPNA',1,flag)", when condition is not satisfied, flag=flag, meaning flag stays the same, holding the whatever the value it has. So this is an One-way switch, whenever the condition is met, flag will be set to '1', and stays at '1', until the first record of next group. So we will see the following instead:
Bank_number | Account_number | Current_date | Arrears_Band | lag(Arrears_Band ) | Flag |
10 | 111 | 30SEP2010 | NPNA | 0 | |
10 | 111 | 31OCT2010 | Current | NPNA | 1 |
10 | 111 | 30NOV2010 | NPNA | Current | 1 |
10 | 111 | 30JUN2011 | NPNA | NPNA | 1 |
10 | 111 | 01JAN2012 | writoff | NPNA | 1 |
By the end of loop (for the use this do-loop, please google "DOW SAS"), the last record of this group will be left in PDV:
10 | 111 | 01JAN2012 | writoff | NPNA | 1 |
And all we need is flag=1, this flag of '1' will be carried over the second do-loop and thus output every single one of obs in that group. Again, to understand this, some knowledge on DOW will help.
HTH,
Haikuo
Hi Haikuo,
Thank you very much for this great descrition.
I really appreciate it.
Regards
Mirisage
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.