DATA Step, Macro, Functions and more

From DB to dataset...what is faster?

Reply
Contributor
Posts: 58

From DB to dataset...what is faster?

Hi,
I'm just curious to get some more info about the differences between proc sql and the data step. Actually, I'm building a DWH, and I'm importing a lot of tables (some small, some quite huge). The data step seemed to me the easiest way, with less code to write, the much easier drop/keep syntax, and so on...
however, this is syntax, not performance: I've been trying to test if proc sql or data step is faster in loading a table from SQL 2000 or Oracle, but I've haven't been able yet to get it (on small tables, it seems that proc sql is faster).
Any idea? Any KB article interesting about this?
Thanks

Daniele
Respected Advisor
Posts: 3,887

Re: From DB to dataset...what is faster?

Hi Daniele
In the end-of-the-day also the data step approach results in some SQL code sent to the SQL server.
To see what code is sent to SQL and executed there the following options can be helpful:

options sastrace=',,,ds' sastraceloc=saslog nostsuffix;

The most time consuming part will be I/O. It also depends if you load the whole table or only a part into SAS. If it's only a part of the table then make sure that the selection is done on the SQL side - and do not download the data first to SAS and then select. - The options I've sent you will tell you if SAS is handling things.

It often helps to avoid SAS specific functions for data selection - as SAS often can't resolve such functions into SQL statements (and therefore selects only later).

HTH
Patrick

P.S: I know that there are quite a bit of SAS papers about performance. Just too lazy right now to do this search job for you ;-)
SAS Employee
Posts: 160

Re: From DB to dataset...what is faster?

If the data step and proc sql are using the same libname (ODBC ?) you will not see a big difference.

But try using Pass Through SQL and you will typically see a big perfomance improvement over ODBC.

Have a look at this paper -> The LIBNAME Engine Compared to SQL Pass-Through (http://nesug.org/Proceedings/nesug07/po/po04.pdf) or search for "Pass Through SQL"
Respected Advisor
Posts: 3,887

Re: From DB to dataset...what is faster?

Hi Geniz

But this was exactly what I was talking about: SAS generates some SQL code and sends this SQL to the DBMS - it just depends on how the clauses in the SAS code are done for SAS to be able to generate efficient SQL code sent to the DBMS.

Pass-Through SQL is of course always a way to send exactly what one wants - but it also means that one has to get into the different SQL flavors.
I find it often much more efficient (from a programming point of view) to use PROC SQL with a sas libname and then to control what kind of SQL SAS sends to the DBMS (looking at the log using the options I've posted).
And another reason not to use pass-through SQL: If the DBMS changes all pass-through SQL code has to be changed. If the technique was using SAS libnames then only these libname definitions have to be changed.

Well: Just an opinion. I actually used already a lot of pass-through SQL. I just think it's always a second choice.

Cheers, Patrick
N/A
Posts: 0

Re: From DB to dataset...what is faster?

Patrick: how about any opinions about migrating data in the other direction, i.e., from SAS to SQL Server? I've found libname to be painfully slow...and have sort of concluded that I'm best off using a SQL Server utility such as bulk copy. What do you think?
N/A
Posts: 0

Re: From DB to dataset...what is faster?

Did you ever try the INSERTBUFF= option?
Super User
Posts: 5,255

Re: From DB to dataset...what is faster?

The Bulk Copy facility is available with LIBNAME using the BULKLOAD=YES option.

/Linus
Data never sleeps
N/A
Posts: 0

Re: From DB to dataset...what is faster?

Thanks much; I'll check these out
Respected Advisor
Posts: 3,887

Re: From DB to dataset...what is faster?

As Linus mentiones: There are bulk-load options doing the job.

The only problem I've seen so far with bulk-loading to Oracle was that if a bulk-load fails Oracle doesn't always send a proper error code back to SAS - what results in a SAS job ending without errors but data not loaded...

The work-around we had to implement was a little macro which analyses the bulk-load log (searching for ora-errors in the log) and aborting the SAS job if an error message is found.

HTH
Patrick
Super User
Posts: 5,255

Re: From DB to dataset...what is faster?

Another way of doing that was to build a wrapper script that replaced the default script that called the Oracle bulk-loader, and handle the RC problem in that script.

But for 9.1.3 there is hot-fix for this particular problem, and it seems definitely solved in 9.2:

http://support.sas.com/kb/13/759.html

I haven't seen any similar problem with ODBC/SQL Server.

/Linus
Data never sleeps
Ask a Question
Discussion stats
  • 9 replies
  • 213 views
  • 0 likes
  • 5 in conversation