11-18-2015 03:57 PM
I have the following concern regarding SAS/ACCESS facility.
Let's imagine that we have an external DB (i.e. Oracle), which we have assigned to a certain libname.
Next, we do a simple operation on one of the tables within this DB, i.e.
data db.table_new; set db.table_old(keep=var1 var2 var3); if var1>0 then new_var1=5; run;
My questions are the following:
table_oldbe pulled from external DB to SAS Server in order to process the data?
The reference is unclear about it [link]. See page 62.
In order to improve performance of data processing is it unavoidable to rewrite data steps to SQLs which can be easily passed to DBMS directly?
11-18-2015 05:27 PM
1. With this particular code snippet, yes. Except for columns other than var1-var3.
2. As you could conclude from 1, no.
SQL has a higher chance of being submitted to the RDBMS (pass thru), but that's not the whole truth.
Where statements can be passed thru from data steps and procedures as well.
Sorting from proc sort will be passed thru.
Aggregating from various statistical procedures will often be passed thru.
Quite difficult to understand what you refer to when you are linking to a whole manual.
This would be a good starting point:
11-19-2015 04:18 AM
Thank you for quick reply.
(1) Therefore, in order to avoid performance issues caused by transferring the data from DB to SAS and vice versa, the only solution is to rewrite data steps to SQL, correct?
(2) You have said that where statements can be passed through, as well as other operations. Do you know where can I find the complete list of them?
11-19-2015 10:43 AM - edited 11-19-2015 10:46 AM
1. No. If the query is simple SAS passes it to the DB already, rewriting it in SQL will not gain any efficiencies. You can also write SQL using SQL pass through that will be in the SQL that is native to your DB and all of that is passed directly to your DB.
If you're using a SAS table, locally, and have data on the DB that you merge or work with then the full table is pulled down to SAS.
So it isn't a straight yes/no answer.
The list of functions that are passed directly to the server is DB specific and is found in the documentation.
11-19-2015 11:04 AM
Ok, from the top of my head:
- BY will have an implict ORDER BY done by the RDBMS
- WHERE, most of them, will be pass-thru
- KEPP/DROP data set options (on the input data ste/table) wil lbe passed to RDBMS as narrowed Select clauses
WHen it comes to SAS SQL, case statements will be passed - so here is a diferentiater, data step IFs won't.
Stuff that will be passed thru (regardless of SAS tool):
- BY (including any SQL ORDER and GROUP)
- Plain SQL
- Aggegations (SAS statistical procedures, SQL)
Let's return your initial question, about rewriting data steps.
It depends what kind of processing you do.
Data steps does usually not do any aggregations, and filtering will take part in RDBMS, if you (god forbid), uses subsetting IFs.
Be sure to have any drop/keep as ds-options for your source table, if possible.
Following these guidelines, you should probably not need to transform to SQL, at least for performance reasons.
11-19-2015 03:41 PM
Your friend is are the following options which show you in the SAS log for data steps and SAS SQL which part of the code could be sent to the database for processing: options sastrace=',,,d' sastraceloc=saslog nostsuffix;
When it comes to performance interacting with a database the following is important:
- reduce volumes as early as possible
- minimize data movements between SAS and the database
The best control for code being processed in-database is using pass-through SQL. Using pass-through SQL also allows to use database specific syntax (eg. analytic functions in Oracle).
SAS SQL can only get fully passed to the database if you're only using SAS functions which can get converted into database functions. The list of these functions is fully documented for each SAS/Access module and version.
Except for pulling data into SAS only subsetting rows (where clause) and columns I persanally avoid using the datastep when interacting with a database. It's just not transparent enough for my taste. I'd rather formulate a view in SQL and then use this view as source in a datastep.