DATA Step, Macro, Functions and more

Multiple inner joins taking too long, easier way?

Reply
Frequent Contributor
Posts: 82

Multiple inner joins taking too long, easier way?

   So I have table A with 150 records that I want info for from tables B-O.

I tried doing a normal inner join ( i want the same variable from each of the tables B-O) but it is taking forever.

right now I have:

proc sql;

create table joined as select a.*, b.variable as var1, c.variable as vari2.......

inner join b

on a.match=b.match

inner join c

on a.match=c.match

etc. Any faster way?

Super User
Posts: 19,815

Re: Multiple inner joins taking too long, easier way?

150 records shouldn't be long. How big are tables B-O?

If you're only taking a single variable from each, look at using a format or hash approach instead.

Super User
Posts: 5,429

Re: Multiple inner joins taking too long, easier way?

Indexing on the match columns might help.

Data never sleeps
Super Contributor
Posts: 339

Re: Multiple inner joins taking too long, easier way?

How large (number of variables and approx total length of vars) and tall (num of records) are tables B-O? If they are somewhat small, this could be done easily and very fast with hash tables. If they are large, running 14 large hash tables will probably end in an insufficient memory error.

As a general statement though, I would think the reason why it is so slow is the lack of nesting in your query. At least I would suspect that chains of joins are processed the same way, regardless that they are inner or outer and thus there is probably an attempt to inner join the B-O tables with each other prior to eventually doing it with the very small 150 records initial set. I'm not a SQL expert and thus I don't know if there is a maximum nesting level necessary but here's how I would do it keeping your current idea:

proc sql;

     create table joined as

     select o.variable as var14, temp.*

     from o inner join (select n.variable as var13, temp.*

                              from n inner join (select m.variable as var12, temp*

                                       ...more nesting...

                           

                                                

                                                  from b inner join a

                                                  on a.match=b.match

                                                  ) as temp

                              on temp.match=c.match) as temp

                        on temp.match=d.match) as temp

etc.

quit;

Hash tables would definitely be great for your purpose though

data joined;

     length match $8.;

     length variable1 $20 /*...*/ variable14 $20;

     if _N_=1 then do;

          /* Declare B-O hashes */

          declare hash hash_o(dataset:"o(rename=(variable=variable14))");

          hash_o.DefineKey("match");

          hash_o.DefineData("variable14"); /* or its renamed name */

          hash_o.DefineDone;

          /* ... more hash declaration ...*/

          declare hash hash_b(dataset:"b(rename=(variable=variable1))");

          hash_b.DefineKey("match");

          hash_b.DefineData("variable1"); /* or its renamed name */

          hash_b.DefineDone;

     end; /* B-O hash tables are declared */

     set a;

     rc=0; /* initialize error code */

     rc=max(hash_b.find(), rc);

     rc=max(hash_c.find(), rc);

     /*...*/

     rc=max(hash_o.find(), rc);

     if rc=0 then output; /*this is to handle the inner join; if any of the key was not found in one of the tables, one of the RCs will have been > 0 and we wouldn't want to output that record with some missing values */

run;

I can take roughly 1M records with give or take 100-200 total length of variables on my machine at work so if the total of tables B through O is below that, odds are you will have no problem whatsoever with the hash approach and will feel like it is absurd fast AND easier to program than nesting joins;

Hope this helps!

Vincent

Respected Advisor
Posts: 4,925

Re: Multiple inner joins taking too long, easier way?

I do this kind of join all the time with SAS and it is very fast. You are joining 15 tables, the max is 32 so no problem there. Unless you really want to eliminate all records from A that do not have a match in every other table, it would be better to replace the inner joins with left joins.

PG

PG
Valued Guide
Posts: 2,177

Re: Multiple inner joins taking too long, easier way?

If 15 joi s are a problem for one query, break the task  into (upto) 15 separate queries

Ask a Question
Discussion stats
  • 5 replies
  • 1488 views
  • 1 like
  • 6 in conversation