I have a set of inherited queries that each use 8 joins to pull our data. The run times are obviously very long because of the numerous joins, and the tables being joined are large as well (anywhere from approx. 40 million to almost 100 million records each).
Besides using my watch to time each run, is there a way I can more easily determine which would be the more efficient way to run the queries: LIBNAME vs. pass-through?
If anyone has any ideas from experience that would be appreciated as well.
A bit of a dull answer, but you will have to find out for yourself what is most efficient by doing some tests in your particular environment. Some guidelines, though:
-Are your inherited queries using LIBNAME access or SQL pass-through? If LIBNAME, part of the joins may already be performed in the source DBMS via implicit pass-through. To see this, use options sastrace=',,,d' sastraceloc=SASLOG;
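As a rough sketch of how to check that (library and table names here are placeholders, not from the original post):

```sas
/* Turn on DBMS tracing so the log shows what SAS sends to the database */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

proc sql;
  create table work.result as
  select a.id, a.amount, b.region
  from mydb.orders a
       inner join mydb.customers b
         on a.cust_id = b.cust_id;
quit;
```

After running this, scan the log for the SQL text that SAS prepared and sent to the DBMS. If the whole join shows up there, implicit pass-through is already doing the work in the database; if not, SAS is pulling the rows across and joining them locally.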
-Most documents recommend that joins be performed by the source DBMS. But that is not always the best approach: I've seen cases where PROC SORT / MERGE steps dramatically outrun Oracle SQL joins. So create subsets of your data and test the different strategies.
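For example, the SORT/MERGE alternative looks roughly like this (again with placeholder names); on a subset you can compare its run time against the equivalent SQL join:

```sas
/* Sort both inputs by the join key, then join them in a DATA step */
proc sort data=mydb.orders out=work.orders;
  by cust_id;
run;

proc sort data=mydb.customers out=work.customers;
  by cust_id;
run;

data work.joined;
  merge work.orders (in=a) work.customers (in=b);
  by cust_id;
  if a and b;  /* keep matching rows only, like an inner join */
run;
```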
-If you conclude that SQL pass-through is the best, try to get the queries to run via implicit pass-through (LIBNAME): it is the easiest to code, and your program will be more portable if your data ever moves to a different location.
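For comparison, explicit pass-through wraps the DBMS-native SQL in a CONNECT block, so it is tied to that particular database (connection details and names below are placeholders):

```sas
/* Explicit pass-through: the inner query is sent to Oracle verbatim */
proc sql;
  connect to oracle (user=myuser password=mypwd path=mypath);
  create table work.result as
  select * from connection to oracle
    (select a.id, a.amount, b.region
     from orders a
          join customers b on a.cust_id = b.cust_id);
  disconnect from oracle;
quit;
```

This is why the LIBNAME route is more transparent: the same PROC SQL code keeps working if the tables later move to another engine, whereas the CONNECT block has to be rewritten.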
-For steps taking place within SAS, use options FULLSTIMER and the PROC SQL _METHOD option for help with optimization.
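A minimal sketch of both together (placeholder table names):

```sas
/* FULLSTIMER prints detailed CPU, memory, and I/O statistics per step;
   _METHOD prints the execution plan PROC SQL chose, e.g. sqxjhsh
   for a hash join or sqxjm for a sort-merge join */
options fullstimer;

proc sql _method;
  create table work.result as
  select a.id, b.region
  from work.orders a, work.customers b
  where a.cust_id = b.cust_id;
quit;
```

Comparing the _METHOD codes and FULLSTIMER numbers across your test strategies tells you not just which one is faster but why.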