I am trying to crash course my way through a hash solution with little luck. I have 5 different account numbers and I am looking to join the first(acctnbr) to any of the following 4. Then I will sort by acctnbr, opendate desc to find the oldest original account number associated with any real account. I can use name or ss# to confirm the accuracy. The first dataset below has 8 mill records, the second has 16 million, and these are representing 3 months, later I will be using 12 months. Is HASH a viable solution for this many records? If not I have a another sql join below but I'm not sure that will get the join I am looking for. If you can take a look at the attempt at the sql join I would appreciate that as well.
data consumer_agg; set consumer; keep acctnbr origacct opendate; run; data fdr_agg; set fdr; keep cracct xref1 xref2 xrefacct opendate; run; data orig_opendate; length acctnbr $20 origacct $20 opendate 8; if _n_ = 1 then do; declare hash e(dataset: 'work.fdr_agg'); e.definekey ('acctnbr'); e.definedata('origacct','opendate','cracct','xref1','xref2','xrefacct'); e.definedone(); end; set consumer_agg; drop rc; rc=e.find(); run;
/*********************/
sql solution:
data test1; set consumer_agg(obs=100); run; data test2; set fdr_agg(obs=100); run; proc sql; create table cartesian_join as select distinct a.acctnbr,a.origacct,a.opendate as cons_date,b.* from test1 a, test2 b order by opendate desc,cons_date desc; quit;
... View more