I'm just curious to get some more info about the differences between proc sql and the data step. Actually, I'm building a DWH, and I'm importing a lot of tables (some small, some quite huge). The data step seemed to me the easiest way, with less code to write, the much easier drop/keep syntax, and so on...
however, this is syntax, not performance: I've been trying to test if proc sql or data step is faster in loading a table from SQL 2000 or Oracle, but I've haven't been able yet to get it (on small tables, it seems that proc sql is faster).
Any idea? Any KB article interesting about this?
The most time consuming part will be I/O. It also depends if you load the whole table or only a part into SAS. If it's only a part of the table then make sure that the selection is done on the SQL side - and do not download the data first to SAS and then select. - The options I've sent you will tell you if SAS is handling things.
It often helps to avoid SAS specific functions for data selection - as SAS often can't resolve such functions into SQL statements (and therefore selects only later).
P.S: I know that there are quite a bit of SAS papers about performance. Just too lazy right now to do this search job for you ;-)
But this was exactly what I was talking about: SAS generates some SQL code and sends this SQL to the DBMS - it just depends on how the clauses in the SAS code are done for SAS to be able to generate efficient SQL code sent to the DBMS.
Pass-Through SQL is of course always a way to send exactly what one wants - but it also means that one has to get into the different SQL flavors.
I find it often much more efficient (from a programming point of view) to use PROC SQL with a sas libname and then to control what kind of SQL SAS sends to the DBMS (looking at the log using the options I've posted).
And another reason not to use pass-through SQL: If the DBMS changes all pass-through SQL code has to be changed. If the technique was using SAS libnames then only these libname definitions have to be changed.
Well: Just an opinion. I actually used already a lot of pass-through SQL. I just think it's always a second choice.
Patrick: how about any opinions about migrating data in the other direction, i.e., from SAS to SQL Server? I've found libname to be painfully slow...and have sort of concluded that I'm best off using a SQL Server utility such as bulk copy. What do you think?
As Linus mentiones: There are bulk-load options doing the job.
The only problem I've seen so far with bulk-loading to Oracle was that if a bulk-load fails Oracle doesn't always send a proper error code back to SAS - what results in a SAS job ending without errors but data not loaded...
The work-around we had to implement was a little macro which analyses the bulk-load log (searching for ora-errors in the log) and aborting the SAS job if an error message is found.