BookmarkSubscribeRSS Feed
TedCarol
Calcite | Level 5

Linking accounts through a rolling table

 

I have a table of accounts when they are lost and get replaced and it looks like this

 

new   old

123   654

789   390

944   789

801   552

002   944

 

When I have account characteristics I need to sum I need to chain together the accounts - for example the original account number is 390 and then it became 789, which became 944, which became 002.  For that account I would need to sum transactions for all 4 cards as they are essentially the same card.

 

I was able to get this to work, but for a 40,000 card population it took about 10 minutes to run, and now I'm working with over 1,000,000 accounts.  I'm wondering if a data step would be more efficient?  I did it via proc sql (after testing that nothing had been replaced more than 4 times)

 

First I created a table of account relationships using multiple fields and then used an or statement in a later table where I joined on

a.acct = b.account or a.acct = b.replace1 or a.acct = b.replace2 or a.acct = b.replace3, etc.

 

Like I mentioned it worked, but it took awhile to run and that was with a small population.  Any ideas on a more efficient way?

 

proc sql;
create table account_list as select c.account, h1.new_account as replace1,  h2.new_account as replace2
 h3.new_account as replace3,  h4.new_account as replace4

from accounts as c

left join replace_table as h1 on c.account = h1.old_acct
left join replace_table  as h2 on h1.new_account = h2.old_account 
left join replace_table  as h3 on h2.new_account = h3.old_account
left join replace_table  as h4 on h3.new_account = h4.old_account
;quit;

 

 

6 REPLIES 6
Reeza
Super User

I think you can use the macro here, you need to know the maximum number of ID's that can be chained. 

You can use it to define your grouped ID's then create a format that consolidates all accounts. 

You can then apply the format to the ID and when used in conjuction with PROC MEAN it will summarize up to the correct values.

 

https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30

 

If you need further help please post detailed data including what you need as output.

TedCarol
Calcite | Level 5

Thank you I will take a look.  

 

I have a final transaction table like this

 

123  5

390  2

654  1

789  12

944  7

002  4

801  2

552  1

 

 

So for that example of the account replaced 3 times -- where 390 became 789, which became 944, which became 002 -- I have my final table showing the original account (390) with 25 transactions - the sum of 390 and the 3 replacement accounts.

 

KachiM
Rhodochrosite | Level 12

TedCarol;

There are missing specifications in yout post. Do you use two data sets as given below? Ksharp has given a hash program that gives the second data set(transaction).
I assume you are using two data sets. There are two methods(Array and Hash). Array approach might scale up to several millions. Hash might eat more memory - I am
leaving to your experimentation.

Another issue is the chaining. There is one full chain and many partial chains. For instance
390=>789=>944=>002 is full chain for acct=390.
and
789=>944=>002
944=>002
are partial chains of 390.

Do you want both the chains?

Another issue is whether your new, acct, and act (as in my example data sets) are numers or digital strings?

My solutions will still work if you use INPUT() function to convert digital strings to numbers.

data accts;
input new acct;
datalines;
123   654
789   390
944   789
801   552
002   944
;
run;

data transaction;
input act tran_val;
datalines;
123  5
390  2
654  1
789  12
944  7
002  4
801  2
552  1
;
run;

The output based on full and partial chains is:

Obs	acct	Sum
1	654	6
2	390	25
3	789	23
4	552	3
5	944	11


The output based on full chain only is:

Obs	acct	sum
1	654	6
2	390	25
3	552	3


Program for FULL and PARTIAL SUMS Based on ARRAYS:

Arrays require predetermined range of acct(low & high). You need to preprocess the data to get low & high
to run the Array Program

%let low = 2;
%let high = 944;
         
data want1(rename=(acct1 = acct));
   if _n_ = 1 then do;
      array k[&low:&high] _temporary_;
      array m[&low:&high] _temporary_;
      do until(eof1);
         set accts end = eof1;
         k[acct] = new;
      end;
      do until(eof2);
         set transaction end = eof2;
         m[act] = tran_val;
      end;
   end;
   set accts;
      acct1 = acct;
      Sum = m[acct];
      acct = k[acct];
      do while(acct > 0);
         Sum + m[acct];
         acct = k[acct];
      end;
keep acct1 sum;
run;

proc print data = want1;
run;


Program for FULL and PARTIAL SUMS Based on HASH:


data want2(rename=(acct1 = acct));
   if _n_ = 1 then do;
      if 0 then do;
         set accts;
         set transaction;
      end;
      declare hash ha(dataset:'accts');
      ha.definekey('acct');
      ha.definedata('new');
      ha.definedone();
         
      declare hash ht(dataset:'transaction');
      ht.definekey('act');
      ht.definedata('tran_val');
      ht.definedone();
   end;

   do until(eof);
      set accts end = eof;
      acct1 = acct;
      rc = ht.find(key:acct);
      sum = tran_val;
      do while(ha.find(key:acct) = 0);
         acct = new;
         rc = ht.find(key:acct);
         sum + tran_val;
      end;
      output;
   end;
keep acct1 sum;
run;

proc print data = want2;
run;

Program for FULL SUMS Based on ARRAYS:
         
%let low = 2;
%let high = 944;
         
data want3(rename=(acct1 = acct));
   if _n_ = 1 then do;
      array k[&low:&high] _temporary_;
      array m[&low:&high] _temporary_;
      do until(eof1);
         set accts end = eof1;
         k[acct] = new;
      end;
      do until(eof2);
         set transaction end = eof2;
         m[act] = tran_val;
      end;
   end;
   set accts;
      acct1 = acct;
      Sum = m[acct];
      acct = k[acct];
      do while(acct > 0);
         Sum + m[acct];
         m[acct] = .;
         acct = k[acct];
      end;
      if sum > 0;
keep acct1 sum;
run;

proc print data = want3;
run;


Program for FULL SUMS Based on HASH:

data want4(rename=(acct1 = acct));
   if _n_ = 1 then do;
      if 0 then do;
         set accts;
         set transaction;
      end;
      declare hash ha(dataset:'accts');
      ha.definekey('acct');
      ha.definedata('new');
      ha.definedone();
         
      declare hash ht(dataset:'transaction');
      ht.definekey('act');
      ht.definedata('tran_val');
      ht.definedone();
   end;

   do until(eof);
      set accts end = eof;
      acct1 = acct;
      rc = ht.find(key:acct);
      sum = tran_val;
      do while(ha.find(key:acct) = 0);
         acct = new;
         rc = ht.find(key:acct);
         sum + tran_val;
         tran_val = .;
         rc = ht.replace(key:acct, data:tran_val);
      end;
      if sum > 0 then output;
   end;
keep acct1 sum;
run;

proc print data = want4;
run;

I appeciate if you can share you experimations.
Ksharp
Super User

Here is .

 




data have;
infile cards ;
input  to $ from $  ;
cards;
123   654
789   390
944   789
801   552
002   944
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
TedCarol
Calcite | Level 5

Thank you both!!  I will test these today.  

KachiM
Rhodochrosite | Level 12

TedCarol:

 

Have you tested the programs?

 

Which one of the two outputs is your choice?

 

Does it scale up to your data set?

 

You are asking for help. When help comes and if it is right, share your joy with SAS Community. If the help is not right for you, come to SAS Cmmunity and give your expected output data set and ask for a suitable and acceptable answer. 

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 561 views
  • 0 likes
  • 4 in conversation