SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Optimize data download from Oracle to SAS table

Reply
N/A
Posts: 1

Optimize data download from Oracle to SAS table

Hi,

I use SAS/Access to Oracle to connect SAS to Oracle.  I need to download Oracle data to SAS table.  May I know which is the best method in term of speed:-

PROC COPY

PROC SQL

PROC DBLOAD

DATA step

I have about 5mil records. 

Thanks.

Janice

Super User
Posts: 9,687

Re: Optimize data download from Oracle to SAS table

I will pick PROC DBLOAD.

But the speed of ETL is not about these PROC , it is about options in LIBNAMES , like you can add readbuff=10000 and bulkload=yes to fast it .

And Special System option:   options compress=yes to decrease I/O overhead .

Xia Keshan

Super User
Posts: 5,260

Re: Optimize data download from Oracle to SAS table

Downloading data without transformation doesn't add value.

I would chose SQL so that any transformations is pushed down prior to the download.

Data never sleeps
SAS Employee
Posts: 203

Re: Optimize data download from Oracle to SAS table

Hi Janice,

 

The SAS procedures (COPY, SQL, APPEND, etc.), and DATA step, will use the same mechanism to bring data from Oracle into SAS. SAS generates a query and ships it off to the database. WHERE clauses and many function calls will  pass through to the database without you having to do anything special.  There is an exception to this - SQL pass-through (aka. explicit pass-though). In this case you create database specific queries and SAS ships it, as -is, to the database. 

 

You may want to take a look at the READBUFF= option. It may help with your query performance. Recent versions of SAS do a good job of setting this for you.

 

If your table is partitioned, you may want to look at the DBSLICE= option. To be honest, if your table is large enough to require partitioning, reading it into a SAS table is probably not a great idea.

 

Here is an explicit pass-thru example. I include it because the CREATE TABLE AS can be difficult to stumble onto.

 

PROC SQL;

   CONNECT TO ORACLE(PATH=mypath USER=myuser PASSWORD=mypw);

   CREATE TABLE mylib.mytable FROM CONNECTION TO ORACLE

        (some Oracle specific SQL statement);

QUIT;

 

The DBLOAD procedure is used to load data from SAS into Oracle. DBLOAD isn't used much for new code since you can run bulkload using dataset and LIBNAME statement options. It's more of a legacy thing.

 

You can see the SQL being sent to the database using this systems option.

OPTION SASTRACE=',,,d' SASTRACELOC=saslog NOSTSUFFIX;

 

Ideally you want to ensure that joins, WHERE clauses and functions (mainly when used in a WHERE clause) are passed to the database.

 

Linus brings up a very good point. The real question is, why do you want to download an entire Oracle table into SAS? Typically, it isn't a great idea. It is best to let the database handle much of the work. There are situations where downloading a subset of data is appropriate. For example: I need to repeatedly process a subset of data and it is quicker, in the long run, to pull the data and then repeatedly process it. 

 

You may find this SGF paper useful. It doesn't address your specific question, but it will help you learn about SQL query performance. 

The SQL Tuning Checklist: Making Slow Database Queries a Thing of the Past

 

Good luck!

Ask a Question
Discussion stats
  • 3 replies
  • 488 views
  • 1 like
  • 4 in conversation