Hi Gurus,
In oracle and sqlserver environment, after extract tables with where (FACT Table filter out unwant data) as SAS7DAT in Work folder
then If i have a 8 SAS7dat, A B C D E F G H , A = E ( Fact Table , same columns and data type , only different is data)
B = F (Ref table, same columns and data type , only different is data)
C = G (Ref table, same columns and data type , only different is data)
D = H (Ref table, same columns and data type , only different is data)
Would you rather
(i)
A join B join C join D
E join F join G join H first,
then union join both dataset or
(ii)
A Union E, B Union F , C union G, D union H then
AE Join BF join CG join DH
when considering performance issue
If originally you have 4 tables, why did you cut them into 2 halves ?
If you can import each of them without splitting them you will save more I/O.
When you join any two tables you read and write same amount of data, that is 2*(x+y);
When you append table X to table Y you read and write X only, that is 2*x;
So, to answer your question - YES, I beleive it is right.
1) You may find interest in the next link:
http://www2.sas.com/proceedings/sugi29/064-29.pdf
2) Is it too heavy to run both methods and compare CPU time and elapse time ?
3) Do you need to join tables by key (ID) or just append data and merge variables ?
If possible then first append data of same format using proc append then
use either sort + merge or use sql to join them.
Better append the smaller table to the big one of same format, in order to save I/O,
unless you need keep the original 8 tables as they are.
Hi Shmuel,
Yes, in other word, i have 4 tables but i cut down half data in each table and create another tables, so 8 tables in total. so the columns and data type are identical as below A = E, B =F, C=G and D=H
A B C D
E F G H
So you suggest to append AE , BF CH and DH first then join them together Using Key later on becase of IO reading time ?
Thanks
If originally you have 4 tables, why did you cut them into 2 halves ?
If you can import each of them without splitting them you will save more I/O.
When you join any two tables you read and write same amount of data, that is 2*(x+y);
When you append table X to table Y you read and write X only, that is 2*x;
So, to answer your question - YES, I beleive it is right.
When it comes to performance of merging data it cannot be reduced just to the sequence of joins.
If that's just what you are after, then the answer is pretty much: set the fullstimer option, run both several time and see what's best.
If you really want to go deep in this, then why not consolidate all the data inside the database?
Data is stored and retrieved in a much more efficient way inside the Database.
If for some reason that's not possible, then please describe:
Each join type and keys, size (obs, record lenght) of each dataset.
Daniel Santos @ www.cgd.pt
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.