DATA Step, Macro, Functions and more

Joining Performance ( Join separately then Union join vs Union Join then join tables)

Accepted Solution Solved
Reply
Contributor
Posts: 37
Accepted Solution

Joining Performance ( Join separately then Union join vs Union Join then join tables)

[ Edited ]

Hi Gurus,

 

In oracle and sqlserver environment, after extract tables with where (FACT Table filter out unwant data) as SAS7DAT in Work folder 

 

then If i have a 8 SAS7dat,  A B C D E F G H , A = E ( Fact Table , same columns and data type , only different is data)

                                                              B = F (Ref table, same columns and data type , only different is data)

                                                              C = G (Ref table, same columns and data type , only different is data)

                                                              D = H (Ref table, same columns and data type , only different is data)

 

Would you rather   

(i) 

                             A join B join C join D 

                               E join F join G join H   first,

                              then union join both dataset                       or

(ii)

                            A Union E,  B Union F , C union G, D union H then

                            AE Join BF join CG join DH

 

when considering performance issue 

  

 

 


Accepted Solutions
Solution
‎02-17-2017 05:54 AM
Trusted Advisor
Posts: 1,401

Re: Joining Performance ( Join separately then Union join vs Union Join then join tables)

If originally you have 4 tables, why did you cut them into 2 halves ?

If you can import each of them without splitting them you will save more I/O.

 

When you join any two tables you read and write same amount of data, that is 2*(x+y);

When you append  table X to table Y you read and write X only, that is 2*x;

 

So, to answer your question - YES, I beleive it is right.

View solution in original post


All Replies
Trusted Advisor
Posts: 1,401

Re: Joining Performance ( Join separately then Union join vs Union Join then join tables)

1) You may find interest in the next link:

http://www2.sas.com/proceedings/sugi29/064-29.pdf

 

2) Is it too heavy to run both methods and compare CPU time and elapse time ?

 

3) Do you need to join tables by key (ID) or just append data and merge variables  ?

     If possible then first append data of same format using proc append then 

     use either sort + merge or use sql to join them.

     Better append the smaller table to the big one of same format, in order to save I/O,

     unless you need keep the original 8 tables as they are.

     

Contributor
Posts: 37

Re: Joining Performance ( Join separately then Union join vs Union Join then join tables)

[ Edited ]

Hi Shmuel,

 

Yes, in other word, i have 4 tables but i cut down half data in each table and create another tables, so 8 tables in total. so the columns and data type are identical  as below A = E, B =F, C=G and D=H

 

A B C D

 

E F G H

 

 

So you suggest to append AE , BF CH and DH first then join them together Using Key  later on  becase of IO reading time ?

 

Thanks

Solution
‎02-17-2017 05:54 AM
Trusted Advisor
Posts: 1,401

Re: Joining Performance ( Join separately then Union join vs Union Join then join tables)

If originally you have 4 tables, why did you cut them into 2 halves ?

If you can import each of them without splitting them you will save more I/O.

 

When you join any two tables you read and write same amount of data, that is 2*(x+y);

When you append  table X to table Y you read and write X only, that is 2*x;

 

So, to answer your question - YES, I beleive it is right.

Super Contributor
Posts: 474

Re: Joining Performance ( Join separately then Union join vs Union Join then join tables)

 

When it comes to performance of merging data it cannot be reduced just to the sequence of joins.

 

If that's just what you are after, then the answer is pretty much: set the fullstimer option, run both several time and see what's best.

 

If you really want to go deep in this, then why not consolidate all the data inside the database?

 

Data is stored and retrieved in a much more efficient way inside the Database.

 

If for some reason that's not possible, then please describe:

Each join type and keys, size (obs, record lenght) of each dataset.

 

Daniel Santos @ www.cgd.pt

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 152 views
  • 0 likes
  • 3 in conversation