BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
camfarrell25
Quartz | Level 8

Hello,

 

I have a database of two variable: account # and previous account #, where previous account # (previous_lnacctno), refers back to a loan (lnacctno) in the same table.

 

I am currently trying to link every account # to the very first account, although I am sure there's a proper way to do it instead of doing iteratively as I am doing below. Ideally I would want to have as a final table every account # (lnacctno) and the first account (to be determined) and the number of accounts in between (there could be up to 7 or 8).

 

NOTE: I also noticed that there is a data quality problem where some loans are referencing themselves. I think a simple correction in the initial "previous_acctn" table will fix it (where lnacctno = previous_lnacctno).

proc sql; 
create table previous_acctn as select 
previous_lnacctno, 
lnacctno 
from have; 
quit; 

proc sql; 
create table previous1 as select 
t1.lnacctno, t1.previous_lnacctno, t2.previous_lnacctno as previous_lnacctno2 
from previous_acctn t1 
left join previous_acctn t2 on (t1.previous_lnacctno = t2.lnacctno);
quit; 

proc sql; 
create table previous2 as select 
t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t2.previous_lnacctno as previous_lnacctno3
from previous1 t1 
left join previous_acctn t2 on (t1.previous_lnacctno2 = t2.lnacctno);
quit; 

proc sql; 
create table previous3 as select 
t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t1.previous_lnacctno3, t2.previous_lnacctno as previous_lnacctno4
from previous2 t1 
left join previous_acctn t2 on (t1.previous_lnacctno3 = t2.lnacctno);
quit; 

proc sql; 
create table previous4 as select 
t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t1.previous_lnacctno3, t1.previous_lnacctno4, t2.previous_lnacctno as previous_lnacctno5
from previous3 t1 
left join previous_acctn t2 on (t1.previous_lnacctno4 = t2.lnacctno);
quit; 


proc sql; 
create table previous5 as select 
t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t1.previous_lnacctno3, t1.previous_lnacctno4, t1.previous_lnacctno5, t2.previous_lnacctno as previous_lnacctno6
from previous4 t1 
left join previous_acctn t2 on (t1.previous_lnacctno5 = t2.lnacctno);
quit; 

proc sql; 
create table previous6 as select 
t1.lnacctno, t1.previous_lnacctno, 
t1.previous_lnacctno2, 
t1.previous_lnacctno3, 
t1.previous_lnacctno4, 
t1.previous_lnacctno5, 
t1.previous_lnacctno6,
t2.previous_lnacctno as previous_lnacctno7
from previous5 t1 
left join previous_acctn t2 on (t1.previous_lnacctno6 = t2.lnacctno);
quit; 

proc sql; 
create table previous7 as select 
t1.lnacctno, t1.previous_lnacctno, 
t1.previous_lnacctno2, 
t1.previous_lnacctno3, 
t1.previous_lnacctno4, 
t1.previous_lnacctno5, 
t1.previous_lnacctno6,
t1.previous_lnacctno7,
t2.previous_lnacctno as previous_lnacctno8
from previous6 t1 
left join previous_acctn t2 on (t1.previous_lnacctno7 = t2.lnacctno);
quit; 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

This is a problem ready made for hash table use, as you will see in @PGStats's link.   The code below also uses hash table:

 

data want (keep=lnacctno steps previous_lnacctno
          rename=(previous_lnacctno=orig_lnacctno));

  if _n_=1 then do;
    if 0 then set have (keep=lnacctno previous_lnacctno);  ** Editted line **;
 
    declare hash h (dataset:'have (keep=lnacctno previous_lnacctno)');
      h.definekey('lnacctno');
      h.definedata('previous_lnacctno');
      h.definedone();
  end;

  set have;

  do steps=1 by 1 until (h.find(key:previous_lnacctno)^=0);
  end;
run;

 

This code will work if a "parent" has more then one "child", but not if a child has more than one parent (i.e. multiple previous_lnacctno).

 

For each acctnum it will step through iterative lookups [h.find()] until the lookup fails (i.e. h.find() not equal 0 meaning an account has no parent).  That approach could be inefficient if you have long chains, because it ignores any acctnum's for which an intermediate acctnum has already been resolved.  It just steps all the way up to the first account.  But for chains of 7 or 8, I don't think there would be much impact.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

8 REPLIES 8
Reeza
Super User

If you have SAS/OR then PROC BOM works or there's some tree algorithms out there that do the recursive search. 

 

Here's an example that another user on here wrote (not my solution). 

It's a macro and there's some sample data at the bottom that shows how it works. It adds a variable to identify which are grouped together and then you can find the earliest value and assign that in a second step. 

https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30#file-subgraph_macro

 


@camfarrell25 wrote:

Hello,

 

I have a database of two variable: account # and previous account #, where previous account # (previous_lnacctno), refers back to a loan (lnacctno) in the same table.

 

I am currently trying to link every account # to the very first account, although I am sure there's a proper way to do it instead of doing iteratively as I am doing below. Ideally I would want to have as a final table every account # (lnacctno) and the first account (to be determined) and the number of accounts in between (there could be up to 7 or 8).

proc sql; 
create table previous_acctn as select 
previous_lnacctno, 
lnacctno 
from have; 
quit; 

proc sql; 
create table previous1 as select 
t1.lnacctno, t1.previous_lnacctno, t2.previous_lnacctno as previous_lnacctno2 
from previous_acctn t1 
left join previous_acctn t2 on (t1.previous_lnacctno = t2.lnacctno);
quit; 

proc sql; 
create table previous2 as select 
t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t2.previous_lnacctno as previous_lnacctno3
from previous1 t1 
left join previous_acctn t2 on (t1.previous_lnacctno2 = t2.lnacctno);
quit; 

proc sql; 
create table previous3 as select 
t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t1.previous_lnacctno3, t2.previous_lnacctno as previous_lnacctno4
from previous2 t1 
left join previous_acctn t2 on (t1.previous_lnacctno3 = t2.lnacctno);
quit; 

proc sql; 
create table previous4 as select 
t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t1.previous_lnacctno3, t1.previous_lnacctno4, t2.previous_lnacctno as previous_lnacctno5
from previous3 t1 
left join previous_acctn t2 on (t1.previous_lnacctno4 = t2.lnacctno);
quit; 


proc sql; 
create table previous5 as select 
t1.lnacctno, t1.previous_lnacctno, t1.previous_lnacctno2, t1.previous_lnacctno3, t1.previous_lnacctno4, t1.previous_lnacctno5, t2.previous_lnacctno as previous_lnacctno6
from previous4 t1 
left join previous_acctn t2 on (t1.previous_lnacctno5 = t2.lnacctno);
quit; 

proc sql; 
create table previous6 as select 
t1.lnacctno, t1.previous_lnacctno, 
t1.previous_lnacctno2, 
t1.previous_lnacctno3, 
t1.previous_lnacctno4, 
t1.previous_lnacctno5, 
t1.previous_lnacctno6,
t2.previous_lnacctno as previous_lnacctno7
from previous5 t1 
left join previous_acctn t2 on (t1.previous_lnacctno6 = t2.lnacctno);
quit; 

proc sql; 
create table previous7 as select 
t1.lnacctno, t1.previous_lnacctno, 
t1.previous_lnacctno2, 
t1.previous_lnacctno3, 
t1.previous_lnacctno4, 
t1.previous_lnacctno5, 
t1.previous_lnacctno6,
t1.previous_lnacctno7,
t2.previous_lnacctno as previous_lnacctno8
from previous6 t1 
left join previous_acctn t2 on (t1.previous_lnacctno7 = t2.lnacctno);
quit; 

 

 

 

 

mkeintz
PROC Star

This is a problem ready made for hash table use, as you will see in @PGStats's link.   The code below also uses hash table:

 

data want (keep=lnacctno steps previous_lnacctno
          rename=(previous_lnacctno=orig_lnacctno));

  if _n_=1 then do;
    if 0 then set have (keep=lnacctno previous_lnacctno);  ** Editted line **;
 
    declare hash h (dataset:'have (keep=lnacctno previous_lnacctno)');
      h.definekey('lnacctno');
      h.definedata('previous_lnacctno');
      h.definedone();
  end;

  set have;

  do steps=1 by 1 until (h.find(key:previous_lnacctno)^=0);
  end;
run;

 

This code will work if a "parent" has more then one "child", but not if a child has more than one parent (i.e. multiple previous_lnacctno).

 

For each acctnum it will step through iterative lookups [h.find()] until the lookup fails (i.e. h.find() not equal 0 meaning an account has no parent).  That approach could be inefficient if you have long chains, because it ignores any acctnum's for which an intermediate acctnum has already been resolved.  It just steps all the way up to the first account.  But for chains of 7 or 8, I don't think there would be much impact.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PGStats
Opal | Level 21

You might want to put a limit on that loop, just in case lnacctno = previous_lnacctno somewhere.

 

do steps=1 to 999 by 1 until (h.find(key:previous_lnacctno)^=0);
PG
camfarrell25
Quartz | Level 8

Thank you! 

I tried it and it seems to work, but for some, it returns a missing value instead of a previous account number.. i'm not quite familiar with hash tables so I'm not sure where the bug is. The number of steps is greater than 1 but the previous loan account number is missing. Any ideas? 

PGStats
Opal | Level 21

This will happen if the previous account number is missing for some account number. You could simply filter these cases out to fix the problem.

PG
PGStats
Opal | Level 21

Give a look to the allChains macro here:

 

https://communities.sas.com/t5/SAS-Communities-Library/Finding-all-single-linked-chains-the-allChain...

 

It is meant specifically for this kind of processing.

PG
Ksharp
Super User

Post your data and output .

Kurt_Bremser
Super User

Create a format that gives you a previous account for a given account, and a missing value for "others".

Then you only have to loop until you arrive at missing.

As with the hash object, all account combinations have to fit into available memory.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1260 views
  • 6 likes
  • 6 in conversation