DATA Step, Macro, Functions and more

cartesian product

Valued Guide
Posts: 858

cartesian product

I have two tables with variables listed below.  In the first table origacct number is meant to be original account number and should remain the same.  acctnbr can change for several reasons but origacct should not change.  During my initial research I have found this is not true.  I am using the second table to search history, later I will be using a third but I don't currently have access. 


I'll start with what I'm thinking and hopefully someone has a better solution.  I will be doing a self join, cons_test = cons_test, joining acctnbr to origacct.  Next I will join the three variables in the second table.  I need to keep acctnbr, but join to each other variable, I will then sort by opendate to find the oldest account number associated with the current acctnbr.  Currently the database doesn't have a good way to view customer history.  Is an array the best way to accomplish this, I'm open to any suggestions.


data cons_test;
set consumer_agg(obs=10000);
keep acctnbr origacct opendate;

data fdr_start;
set fdr_agg(obs=10000);
keep cracct xref1 xref2 xrefacct opendate;

Super User
Posts: 17,819

Re: cartesian product

Create a master table of accounts to what account number they should be. This is most likely a recursive search, but it's been solved on here before several times. Ideally you don't use a Cartesian product, I think the best solutions were hash tables.

Super User
Posts: 10,500

Re: cartesian product

One might hope there are additional pieces of information to identify a person or oganization to help decide if the values really are for the same person or typos. Also I might look at the earliest dates associated with any of the information combinations as well for determining a the "true" origacct value.

Ask a Question
Discussion stats
  • 2 replies
  • 3 in conversation