Hi Collegues;
I have this data set - note it has 4 accounts.
data a;
informat Current_date date9.;
Input Current_date Account_number $11-13 Arrears_Band $ 15-21;
Format Current_date date9.;
cards;
28FEB2010 111 NPNA
31MAR2010 111 Current
30APR2010 111 30 - 60
28FEB2010 444 Current
31MAR2010 444 30 - 60
30APR2010 444 30 - 60
31MAY2010 444 Current
31OCT2010 444 Current
31DEC2010 444 Current
31JAN2011 444 1 - 30
28FEB2011 444 30 - 60
31MAR2011 444 60 - 90
30JUN2011 444 90 +
31JUL2011 444 NPNA
31AUG2011 444 NPNA
28FEB2010 555 30 - 60
31MAR2010 555 30 - 60
30APR2010 555 60 - 90
31MAY2010 555 NPNA
31AUG2011 555 90 +
30SEP2011 555 NPNA
31MAY2010 666 Current
31AUG2011 666 90 +
30SEP2011 666 Current
;
run;
Question:
I want to remove any Account_number that has fallen at least once from a "non 90 +” Arrears_Band to NPNA
Answer
Based on this condition, Account_number 555 has to be removed.
My appraoch
1. I have used the below code helped by PG Stat to track
how each account has successively fallen across different Arrears_Bands.
2. Then visually spotted the account_numbers that satisfy the
above condition.
3. Finally, I did a "sub setting if" step, using those spotted account_numbers.
But I have a huge data set which takes time to do this.
Could any one suggest an efficient alternative for the last step?
I used this approach.
proc sort data=a;
by account_number Current_date;
run;
data cat(keep=account_number category);
length Category $200;
do until(last.account_number);
do until (last.Arrears_Band);
set a;
by account_number Arrears_Band notsorted;
end;
category = catx(" to ", category, Arrears_Band);
end;
run;
data b;
set a;
if account_number not in ('555');
run;
/*Thanks
Mirisage
Hi Haikuo,
Your revised code is great!
I applied it into my original big data set and it is extremely efficicent.
In fact, I applied your code and PGStat's two codes into my big data set and then compared the final 3 results.
They are identical.
Thank you very much for this support.
Best regards
Mirisage
Hi, To match your output, one condition needs to be modified: instead of just "non 90 +”, it should be 'non 90 +' and 'non NPNA'.
2X DOWs seems to do the job:
data a;
informat Current_date date9.;
Input Current_date Account_number :$4. Arrears_Band $&10.;
Format Current_date date9.;
cards;
28FEB2010 111 NPNA
31MAR2010 111 Current
30APR2010 111 30 - 60
28FEB2010 444 Current
31MAR2010 444 30 - 60
30APR2010 444 30 - 60
31MAY2010 444 Current
31OCT2010 444 Current
31DEC2010 444 Current
31JAN2011 444 1 - 30
28FEB2011 444 30 - 60
31MAR2011 444 60 - 90
30JUN2011 444 90 +
31JUL2011 444 NPNA
31AUG2011 444 NPNA
28FEB2010 555 30 - 60
31MAR2010 555 30 - 60
30APR2010 555 60 - 90
31MAY2010 555 NPNA
31AUG2011 555 90 +
30SEP2011 555 NPNA
31MAY2010 666 Current
31AUG2011 666 90 +
30SEP2011 666 Current
;
data want;
do until (last.account_number);
set a ; by account_number notsorted;
length _ab $10.;
if not missing(_ab) and _ab not in ('90 +','NPNA') and Arrears_Band='NPNA' then do;
_flag=1;
leave;
end;
_ab=Arrears_Band;
end;
do until (last.account_number);
set a ; by account_number notsorted;
if _flag ne 1 then output;
end;
drop _:;
run;
proc print;run;
Haikuo
Hi Hai,
Many Thanks.
Your code works well for the simplyfiled data set I have provided in the querry.
However, I have come across several other accounts in my original data set which is represented by account_number 777 in the below data set. It is not captured by the code. This account also has fallen directly from an arreres_band other than 90+.
So, account_number 777 also has to be removed because
I want to remove any Account_number that has fallen at least once to NPNA from an Arrears_Band other than 90+.
Thank you very much. Mirisage
data a;
informat Current_date date9.;
Input Current_date Account_number :$4. Arrears_Band $&10.;
Format Current_date date9.;
cards;
28FEB2010 111 NPNA
31MAR2010 111 Current
30APR2010 111 30 - 60
28FEB2010 444 Current
31MAR2010 444 30 - 60
30APR2010 444 30 - 60
31MAY2010 444 Current
31OCT2010 444 Current
31DEC2010 444 Current
31JAN2011 444 1 - 30
28FEB2011 444 30 - 60
31MAR2011 444 60 - 90
30JUN2011 444 90 +
31JUL2011 444 NPNA
31AUG2011 444 NPNA
28FEB2010 555 30 - 60
31MAR2010 555 30 - 60
30APR2010 555 60 - 90
31MAY2010 555 NPNA
31AUG2011 555 90 +
30SEP2011 555 NPNA
31MAY2010 666 Current
31AUG2011 666 90 +
30SEP2011 666 Current
28FEB2010 777 Current
31MAR2010 777 Current
30APR2010 777 Current
31DEC2010 777 Current
31JAN2011 777 NPNA
28FEB2011 777 NPNA
31MAR2011 777 NPNA
;
run;
Try this :
data a;
informat Current_date date9.;
Input Current_date Account_number :$4. Arrears_Band $&10.;
Format Current_date date9.;
cards;
28FEB2010 111 NPNA
31MAR2010 111 Current
30APR2010 111 30 - 60
28FEB2010 444 Current
31MAR2010 444 30 - 60
30APR2010 444 30 - 60
31MAY2010 444 Current
31OCT2010 444 Current
31DEC2010 444 Current
31JAN2011 444 1 - 30
28FEB2011 444 30 - 60
31MAR2011 444 60 - 90
30JUN2011 444 90 +
31JUL2011 444 NPNA
31AUG2011 444 NPNA
28FEB2010 555 30 - 60
31MAR2010 555 30 - 60
30APR2010 555 60 - 90
31MAY2010 555 NPNA
31AUG2011 555 90 +
30SEP2011 555 NPNA
31MAY2010 666 Current
31AUG2011 666 90 +
30SEP2011 666 Current
28FEB2010 777 Current
31MAR2010 777 30 - 60
30APR2010 777 30 - 60
31MAY2010 777 Current
31OCT2010 777 Current
;
proc sort data=a;
by account_number Current_date;
run;
data nonJumpers(keep=account_number);
length lastBand $10;
do until(last.account_number);
do until (last.Arrears_Band);
set a;
by account_number Arrears_Band notsorted;
end;
if Arrears_Band = "NPNA" and
lastBand ne "90 +" and
not missing(lastBand) then drop = 1;
lastBand = Arrears_Band;
end;
if drop = 1 then delete;
run;
proc sql;
create table want as
select * from a
where account_number in (select * from nonJumpers);
drop table nonJumpers;
select * from want;
quit;
PG
Hi PGStats,
Thank you very much.
Your code work exactly.
I have added another possible account succession like below into the previous data set and applied this code. The code removes this account too which is correct because I want to remove any Account_number that has fallen at least once to NPNA from an Arrears_Band other than 90+.
data a;
informat Current_date date9.;
Input Current_date Account_number :$4. Arrears_Band $&10.;
Format Current_date date9.;
cards;
28FEB2010 777 Current
31MAR2010 777 Current
30APR2010 777 Current
31DEC2010 777 Current
31JAN2011 777 NPNA
28FEB2011 777 NPNA
31MAR2011 777 NPNA
;
run;
When I applied the second part of your code to my original huge data set, it gives the message “will take a large amount of system resources” and got stuck.
So, I used the following approach which worked but never know my approach is efficient.
proc sort data=a;
by account_number;
run;
proc sort data=nonJumpers;
by account_number;
run;
data want;
merge a (in=a)
nonJumpers (in=b)
;
by account_number;
if a and b;
run;
Thank you again for your support.
Mirisage
Hi,
The code being provided did have a glitch, now try this one:
data a;
informat Current_date date9.;
Input Current_date Account_number :$4. Arrears_Band $&10.;
Format Current_date date9.;
cards;
28FEB2010 111 NPNA
31MAR2010 111 Current
30APR2010 111 30 - 60
28FEB2010 444 Current
31MAR2010 444 30 - 60
30APR2010 444 30 - 60
31MAY2010 444 Current
31OCT2010 444 Current
31DEC2010 444 Current
31JAN2011 444 1 - 30
28FEB2011 444 30 - 60
31MAR2011 444 60 - 90
30JUN2011 444 90 +
31JUL2011 444 NPNA
31AUG2011 444 NPNA
28FEB2010 555 30 - 60
31MAR2010 555 30 - 60
30APR2010 555 60 - 90
31MAY2010 555 NPNA
31AUG2011 555 90 +
30SEP2011 555 NPNA
31MAY2010 666 Current
31AUG2011 666 90 +
30SEP2011 666 Current
28FEB2010 777 Current
31MAR2010 777 Current
30APR2010 777 Current
31DEC2010 777 Current
31JAN2011 777 NPNA
28FEB2011 777 NPNA
31MAR2011 777 NPNA
;
run;
data want;
do until (last.account_number);
set a ; by account_number notsorted;
length _ab $10.;
if first.account_number then call missing(_ab);
if not missing(_ab) and _ab not in ('90 +','NPNA') and Arrears_Band='NPNA' then do;
_flag=1;
end;
_ab=Arrears_Band;
if _flage=1 then leave;
end;
do until (last.account_number);
set a ; by account_number notsorted;
if _flag ne 1 then output;
end;
drop _:;
run;
proc print;run;
Please report back about the efficiency if you decide to use it on your real data.
Regards,
Haikuo
Hi Haikuo,
Your revised code is great!
I applied it into my original big data set and it is extremely efficicent.
In fact, I applied your code and PGStat's two codes into my big data set and then compared the final 3 results.
They are identical.
Thank you very much for this support.
Best regards
Mirisage
Good idea. Note that dataset a is already sorted by account_number, and since dataset nonJumbers is created by scanning dataset a, it is also sorted.
But you might be able to get rid of the sorts altogether, as long as the dates are sorted in increasing order within each account_number, you could do:
data a;
informat Current_date date9.;
Input Current_date Account_number :$4. Arrears_Band $&10.;
Format Current_date date9.;
cards;
28FEB2010 111 NPNA
31MAR2010 111 Current
30APR2010 111 30 - 60
28FEB2010 555 30 - 60
31MAR2010 555 30 - 60
30APR2010 555 60 - 90
31MAY2010 555 NPNA
31AUG2011 555 90 +
30SEP2011 555 NPNA
28FEB2010 444 Current
31MAR2010 444 30 - 60
30APR2010 444 30 - 60
31MAY2010 444 Current
31OCT2010 444 Current
31DEC2010 444 Current
31JAN2011 444 1 - 30
28FEB2011 444 30 - 60
31MAR2011 444 60 - 90
30JUN2011 444 90 +
31JUL2011 444 NPNA
31AUG2011 444 NPNA
31MAY2010 666 Current
31AUG2011 666 90 +
30SEP2011 666 Current
28FEB2010 777 Current
31MAR2010 777 Current
30APR2010 777 Current
31DEC2010 777 Current
31JAN2011 777 NPNA
28FEB2011 777 NPNA
31MAR2011 777 NPNA
;
data nonJumpers(keep=account_number);
length lastBand $10;
do until(last.account_number);
do until (last.Arrears_Band);
set a;
by account_number Arrears_Band notsorted;
end;
if Arrears_Band = "NPNA" and
lastBand ne "90 +" and
not missing(lastBand) then drop = 1;
lastBand = Arrears_Band;
end;
if drop = 1 then delete;
run;
/* You can't merge by notsorted but you can do this */
data want;
set nonJumpers(rename=(account_number=act));
do until(account_number=act and last.account_number);
set a;
by account_number notsorted;
if account_number=act then output;
end;
run;
PG
Hi PGStats,
Your alternative code which avoids sorting step is a magic!
I applied it into my original big data set and it runs pretty fast.
In fact, I applied your two codes and Haikuo's revised code into my big data set and then compared the final 3 results.
They are identical.
Big thank for your help.
Best regards
Mirisage
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.