Hello,
I have a database of two variable: account # and previous account #, where previous account # (previous_lnacctno), refers back to a loan (lnacctno) in the same table.
I am currently trying to link every account # to the very first account, although I am sure there's a proper way to do it instead of doing iteratively as I am doing below. Ideally I would want to have as a final table every account # (lnacctno) and the first account (to be determined) and the number of accounts in between (there could be up to 7 or 8).
NOTE: I also noticed that there is a data quality problem where some loans are referencing themselves. I think a simple correction in the initial "previous_acctn" table will fix it (where lnacctno = previous_lnacctno).
proc sql;
create table previous_acctn as select
previous_lnacctno,
lnacctno
from have;
quit;
proc sql;
create table previous1 as select
t1.lnacctno, t1.previous_lnacctno, t2.previous_lnacctno as previous_lnacctno2
from previous_acctn t1
left join previous_acctn t2 on (t1.previous_lnacctno = t2.lnacctno);
quit;
proc sql;
create table previous2 as select
t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t2.previous_lnacctno as previous_lnacctno3
from previous1 t1
left join previous_acctn t2 on (t1.previous_lnacctno2 = t2.lnacctno);
quit;
proc sql;
create table previous3 as select
t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t1.previous_lnacctno3, t2.previous_lnacctno as previous_lnacctno4
from previous2 t1
left join previous_acctn t2 on (t1.previous_lnacctno3 = t2.lnacctno);
quit;
proc sql;
create table previous4 as select
t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t1.previous_lnacctno3, t1.previous_lnacctno4, t2.previous_lnacctno as previous_lnacctno5
from previous3 t1
left join previous_acctn t2 on (t1.previous_lnacctno4 = t2.lnacctno);
quit;
proc sql;
create table previous5 as select
t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t1.previous_lnacctno3, t1.previous_lnacctno4, t1.previous_lnacctno5, t2.previous_lnacctno as previous_lnacctno6
from previous4 t1
left join previous_acctn t2 on (t1.previous_lnacctno5 = t2.lnacctno);
quit;
proc sql;
create table previous6 as select
t1.lnacctno, t1.previous_lnacctno,
t1.previous_lnacctno2,
t1.previous_lnacctno3,
t1.previous_lnacctno4,
t1.previous_lnacctno5,
t1.previous_lnacctno6,
t2.previous_lnacctno as previous_lnacctno7
from previous5 t1
left join previous_acctn t2 on (t1.previous_lnacctno6 = t2.lnacctno);
quit;
proc sql;
create table previous7 as select
t1.lnacctno, t1.previous_lnacctno,
t1.previous_lnacctno2,
t1.previous_lnacctno3,
t1.previous_lnacctno4,
t1.previous_lnacctno5,
t1.previous_lnacctno6,
t1.previous_lnacctno7,
t2.previous_lnacctno as previous_lnacctno8
from previous6 t1
left join previous_acctn t2 on (t1.previous_lnacctno7 = t2.lnacctno);
quit;
This is a problem ready made for hash table use, as you will see in @PGStats's link. The code below also uses hash table:
data want (keep=lnacctno steps previous_lnacctno
rename=(previous_lnacctno=orig_lnacctno));
if _n_=1 then do;
if 0 then set have (keep=lnacctno previous_lnacctno); ** Editted line **;
declare hash h (dataset:'have (keep=lnacctno previous_lnacctno)');
h.definekey('lnacctno');
h.definedata('previous_lnacctno');
h.definedone();
end;
set have;
do steps=1 by 1 until (h.find(key:previous_lnacctno)^=0);
end;
run;
This code will work if a "parent" has more then one "child", but not if a child has more than one parent (i.e. multiple previous_lnacctno).
For each acctnum it will step through iterative lookups [h.find()] until the lookup fails (i.e. h.find() not equal 0 meaning an account has no parent). That approach could be inefficient if you have long chains, because it ignores any acctnum's for which an intermediate acctnum has already been resolved. It just steps all the way up to the first account. But for chains of 7 or 8, I don't think there would be much impact.
If you have SAS/OR then PROC BOM works or there's some tree algorithms out there that do the recursive search.
Here's an example that another user on here wrote (not my solution).
It's a macro and there's some sample data at the bottom that shows how it works. It adds a variable to identify which are grouped together and then you can find the earliest value and assign that in a second step.
https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30#file-subgraph_macro
@camfarrell25 wrote:
Hello,
I have a database of two variable: account # and previous account #, where previous account # (previous_lnacctno), refers back to a loan (lnacctno) in the same table.
I am currently trying to link every account # to the very first account, although I am sure there's a proper way to do it instead of doing iteratively as I am doing below. Ideally I would want to have as a final table every account # (lnacctno) and the first account (to be determined) and the number of accounts in between (there could be up to 7 or 8).
proc sql; create table previous_acctn as select previous_lnacctno, lnacctno from have; quit; proc sql; create table previous1 as select t1.lnacctno, t1.previous_lnacctno, t2.previous_lnacctno as previous_lnacctno2 from previous_acctn t1 left join previous_acctn t2 on (t1.previous_lnacctno = t2.lnacctno); quit; proc sql; create table previous2 as select t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t2.previous_lnacctno as previous_lnacctno3 from previous1 t1 left join previous_acctn t2 on (t1.previous_lnacctno2 = t2.lnacctno); quit; proc sql; create table previous3 as select t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t1.previous_lnacctno3, t2.previous_lnacctno as previous_lnacctno4 from previous2 t1 left join previous_acctn t2 on (t1.previous_lnacctno3 = t2.lnacctno); quit; proc sql; create table previous4 as select t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t1.previous_lnacctno3, t1.previous_lnacctno4, t2.previous_lnacctno as previous_lnacctno5 from previous3 t1 left join previous_acctn t2 on (t1.previous_lnacctno4 = t2.lnacctno); quit; proc sql; create table previous5 as select t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t1.previous_lnacctno3, t1.previous_lnacctno4, t1.previous_lnacctno5, t2.previous_lnacctno as previous_lnacctno6 from previous4 t1 left join previous_acctn t2 on (t1.previous_lnacctno5 = t2.lnacctno); quit; proc sql; create table previous6 as select t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t1.previous_lnacctno3, t1.previous_lnacctno4, t1.previous_lnacctno5, t1.previous_lnacctno6, t2.previous_lnacctno as previous_lnacctno7 from previous5 t1 left join previous_acctn t2 on (t1.previous_lnacctno6 = t2.lnacctno); quit; proc sql; create table previous7 as select t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t1.previous_lnacctno3, t1.previous_lnacctno4, t1.previous_lnacctno5, t1.previous_lnacctno6, t1.previous_lnacctno7, t2.previous_lnacctno as previous_lnacctno8 from previous6 t1 left join previous_acctn t2 on (t1.previous_lnacctno7 = t2.lnacctno); quit;
This is a problem ready made for hash table use, as you will see in @PGStats's link. The code below also uses hash table:
data want (keep=lnacctno steps previous_lnacctno
rename=(previous_lnacctno=orig_lnacctno));
if _n_=1 then do;
if 0 then set have (keep=lnacctno previous_lnacctno); ** Editted line **;
declare hash h (dataset:'have (keep=lnacctno previous_lnacctno)');
h.definekey('lnacctno');
h.definedata('previous_lnacctno');
h.definedone();
end;
set have;
do steps=1 by 1 until (h.find(key:previous_lnacctno)^=0);
end;
run;
This code will work if a "parent" has more then one "child", but not if a child has more than one parent (i.e. multiple previous_lnacctno).
For each acctnum it will step through iterative lookups [h.find()] until the lookup fails (i.e. h.find() not equal 0 meaning an account has no parent). That approach could be inefficient if you have long chains, because it ignores any acctnum's for which an intermediate acctnum has already been resolved. It just steps all the way up to the first account. But for chains of 7 or 8, I don't think there would be much impact.
You might want to put a limit on that loop, just in case lnacctno = previous_lnacctno somewhere.
do steps=1 to 999 by 1 until (h.find(key:previous_lnacctno)^=0);
Thank you!
I tried it and it seems to work, but for some, it returns a missing value instead of a previous account number.. i'm not quite familiar with hash tables so I'm not sure where the bug is. The number of steps is greater than 1 but the previous loan account number is missing. Any ideas?
This will happen if the previous account number is missing for some account number. You could simply filter these cases out to fix the problem.
Give a look to the allChains macro here:
It is meant specifically for this kind of processing.
Post your data and output .
Create a format that gives you a previous account for a given account, and a missing value for "others".
Then you only have to loop until you arrive at missing.
As with the hash object, all account combinations have to fit into available memory.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.