BookmarkSubscribeRSS Feed
DanieleTiles
Calcite | Level 5
Hi,
I'm just curious to get some more info about the differences between proc sql and the data step. Actually, I'm building a DWH, and I'm importing a lot of tables (some small, some quite huge). The data step seemed to me the easiest way, with less code to write, the much easier drop/keep syntax, and so on...
however, this is syntax, not performance: I've been trying to test if proc sql or data step is faster in loading a table from SQL 2000 or Oracle, but I've haven't been able yet to get it (on small tables, it seems that proc sql is faster).
Any idea? Any KB article interesting about this?
Thanks

Daniele
9 REPLIES 9
Patrick
Opal | Level 21
Hi Daniele
In the end-of-the-day also the data step approach results in some SQL code sent to the SQL server.
To see what code is sent to SQL and executed there the following options can be helpful:

options sastrace=',,,ds' sastraceloc=saslog nostsuffix;

The most time consuming part will be I/O. It also depends if you load the whole table or only a part into SAS. If it's only a part of the table then make sure that the selection is done on the SQL side - and do not download the data first to SAS and then select. - The options I've sent you will tell you if SAS is handling things.

It often helps to avoid SAS specific functions for data selection - as SAS often can't resolve such functions into SQL statements (and therefore selects only later).

HTH
Patrick

P.S: I know that there are quite a bit of SAS papers about performance. Just too lazy right now to do this search job for you 😉
GertNissen
Barite | Level 11
If the data step and proc sql are using the same libname (ODBC ?) you will not see a big difference.

But try using Pass Through SQL and you will typically see a big perfomance improvement over ODBC.

Have a look at this paper -> The LIBNAME Engine Compared to SQL Pass-Through (http://nesug.org/Proceedings/nesug07/po/po04.pdf) or search for "Pass Through SQL"
Patrick
Opal | Level 21
Hi Geniz

But this was exactly what I was talking about: SAS generates some SQL code and sends this SQL to the DBMS - it just depends on how the clauses in the SAS code are done for SAS to be able to generate efficient SQL code sent to the DBMS.

Pass-Through SQL is of course always a way to send exactly what one wants - but it also means that one has to get into the different SQL flavors.
I find it often much more efficient (from a programming point of view) to use PROC SQL with a sas libname and then to control what kind of SQL SAS sends to the DBMS (looking at the log using the options I've posted).
And another reason not to use pass-through SQL: If the DBMS changes all pass-through SQL code has to be changed. If the technique was using SAS libnames then only these libname definitions have to be changed.

Well: Just an opinion. I actually used already a lot of pass-through SQL. I just think it's always a second choice.

Cheers, Patrick
deleted_user
Not applicable
Patrick: how about any opinions about migrating data in the other direction, i.e., from SAS to SQL Server? I've found libname to be painfully slow...and have sort of concluded that I'm best off using a SQL Server utility such as bulk copy. What do you think?
deleted_user
Not applicable
Did you ever try the INSERTBUFF= option?
LinusH
Tourmaline | Level 20
The Bulk Copy facility is available with LIBNAME using the BULKLOAD=YES option.

/Linus
Data never sleeps
deleted_user
Not applicable
Thanks much; I'll check these out
Patrick
Opal | Level 21
As Linus mentiones: There are bulk-load options doing the job.

The only problem I've seen so far with bulk-loading to Oracle was that if a bulk-load fails Oracle doesn't always send a proper error code back to SAS - what results in a SAS job ending without errors but data not loaded...

The work-around we had to implement was a little macro which analyses the bulk-load log (searching for ora-errors in the log) and aborting the SAS job if an error message is found.

HTH
Patrick
LinusH
Tourmaline | Level 20
Another way of doing that was to build a wrapper script that replaced the default script that called the Oracle bulk-loader, and handle the RC problem in that script.

But for 9.1.3 there is hot-fix for this particular problem, and it seems definitely solved in 9.2:

http://support.sas.com/kb/13/759.html

I haven't seen any similar problem with ODBC/SQL Server.

/Linus
Data never sleeps

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 877 views
  • 0 likes
  • 5 in conversation