BookmarkSubscribeRSS Feed
Xamius32
Calcite | Level 5

   So I have table A with 150 records that I want info for from tables B-O.

I tried doing a normal inner join ( i want the same variable from each of the tables B-O) but it is taking forever.

right now I have:

proc sql;

create table joined as select a.*, b.variable as var1, c.variable as vari2.......

inner join b

on a.match=b.match

inner join c

on a.match=c.match

etc. Any faster way?

5 REPLIES 5
Reeza
Super User

150 records shouldn't be long. How big are tables B-O?

If you're only taking a single variable from each, look at using a format or hash approach instead.

LinusH
Tourmaline | Level 20

Indexing on the match columns might help.

Data never sleeps
Vince28_Statcan
Quartz | Level 8

How large (number of variables and approx total length of vars) and tall (num of records) are tables B-O? If they are somewhat small, this could be done easily and very fast with hash tables. If they are large, running 14 large hash tables will probably end in an insufficient memory error.

As a general statement though, I would think the reason why it is so slow is the lack of nesting in your query. At least I would suspect that chains of joins are processed the same way, regardless that they are inner or outer and thus there is probably an attempt to inner join the B-O tables with each other prior to eventually doing it with the very small 150 records initial set. I'm not a SQL expert and thus I don't know if there is a maximum nesting level necessary but here's how I would do it keeping your current idea:

proc sql;

     create table joined as

     select o.variable as var14, temp.*

     from o inner join (select n.variable as var13, temp.*

                              from n inner join (select m.variable as var12, temp*

                                       ...more nesting...

                           

                                                

                                                  from b inner join a

                                                  on a.match=b.match

                                                  ) as temp

                              on temp.match=c.match) as temp

                        on temp.match=d.match) as temp

etc.

quit;

Hash tables would definitely be great for your purpose though

data joined;

     length match $8.;

     length variable1 $20 /*...*/ variable14 $20;

     if _N_=1 then do;

          /* Declare B-O hashes */

          declare hash hash_o(dataset:"o(rename=(variable=variable14))");

          hash_o.DefineKey("match");

          hash_o.DefineData("variable14"); /* or its renamed name */

          hash_o.DefineDone;

          /* ... more hash declaration ...*/

          declare hash hash_b(dataset:"b(rename=(variable=variable1))");

          hash_b.DefineKey("match");

          hash_b.DefineData("variable1"); /* or its renamed name */

          hash_b.DefineDone;

     end; /* B-O hash tables are declared */

     set a;

     rc=0; /* initialize error code */

     rc=max(hash_b.find(), rc);

     rc=max(hash_c.find(), rc);

     /*...*/

     rc=max(hash_o.find(), rc);

     if rc=0 then output; /*this is to handle the inner join; if any of the key was not found in one of the tables, one of the RCs will have been > 0 and we wouldn't want to output that record with some missing values */

run;

I can take roughly 1M records with give or take 100-200 total length of variables on my machine at work so if the total of tables B through O is below that, odds are you will have no problem whatsoever with the hash approach and will feel like it is absurd fast AND easier to program than nesting joins;

Hope this helps!

Vincent

PGStats
Opal | Level 21

I do this kind of join all the time with SAS and it is very fast. You are joining 15 tables, the max is 32 so no problem there. Unless you really want to eliminate all records from A that do not have a match in every other table, it would be better to replace the inner joins with left joins.

PG

PG
Peter_C
Rhodochrosite | Level 12

If 15 joi s are a problem for one query, break the task  into (upto) 15 separate queries

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 4785 views
  • 1 like
  • 6 in conversation