07-11-2013 11:19 AM
So I have table A with 150 records that I want info for from tables B-O.
I tried doing a normal inner join ( i want the same variable from each of the tables B-O) but it is taking forever.
right now I have:
create table joined as select a.*, b.variable as var1, c.variable as vari2.......
inner join b
inner join c
etc. Any faster way?
07-11-2013 11:27 AM
150 records shouldn't be long. How big are tables B-O?
If you're only taking a single variable from each, look at using a format or hash approach instead.
07-12-2013 10:56 AM
How large (number of variables and approx total length of vars) and tall (num of records) are tables B-O? If they are somewhat small, this could be done easily and very fast with hash tables. If they are large, running 14 large hash tables will probably end in an insufficient memory error.
As a general statement though, I would think the reason why it is so slow is the lack of nesting in your query. At least I would suspect that chains of joins are processed the same way, regardless that they are inner or outer and thus there is probably an attempt to inner join the B-O tables with each other prior to eventually doing it with the very small 150 records initial set. I'm not a SQL expert and thus I don't know if there is a maximum nesting level necessary but here's how I would do it keeping your current idea:
create table joined as
select o.variable as var14, temp.*
from o inner join (select n.variable as var13, temp.*
from n inner join (select m.variable as var12, temp*
from b inner join a
) as temp
on temp.match=c.match) as temp
on temp.match=d.match) as temp
Hash tables would definitely be great for your purpose though
length match $8.;
length variable1 $20 /*...*/ variable14 $20;
if _N_=1 then do;
/* Declare B-O hashes */
declare hash hash_o(dataset:"o(rename=(variable=variable14))");
hash_o.DefineData("variable14"); /* or its renamed name */
/* ... more hash declaration ...*/
declare hash hash_b(dataset:"b(rename=(variable=variable1))");
hash_b.DefineData("variable1"); /* or its renamed name */
end; /* B-O hash tables are declared */
rc=0; /* initialize error code */
if rc=0 then output; /*this is to handle the inner join; if any of the key was not found in one of the tables, one of the RCs will have been > 0 and we wouldn't want to output that record with some missing values */
I can take roughly 1M records with give or take 100-200 total length of variables on my machine at work so if the total of tables B through O is below that, odds are you will have no problem whatsoever with the hash approach and will feel like it is absurd fast AND easier to program than nesting joins;
Hope this helps!
07-12-2013 12:41 PM
I do this kind of join all the time with SAS and it is very fast. You are joining 15 tables, the max is 32 so no problem there. Unless you really want to eliminate all records from A that do not have a match in every other table, it would be better to replace the inner joins with left joins.