02-08-2017 05:38 PM
I do the majority of my work in Oracle using Explicit Passthrough SQL and then use SAS to process and output the results. I've noticed however that simply transferring data between the Oracle and SAS Servers often takes longer than processing in both Oracle and SAS combined and I'd like to speed this up if possible. I've been playing around with some of the parameters on the SAS Access to Oracle, primarily the READBUFF, but the results seem to be all over the place. I created the macro below to test transfering files of various sizes across with different parameter values. I replicated each transfer a number to times to try and avoid transient effects from other peoples running code and ran the code at non-peak times. The tables are all pre-existing on the Oracle server and have statistics collected on them.
As an example, the test file 7 which is around 4GB in Oracle and around 7.5GB when in a SAS Dataset is taking about 30 minutes to transfer across using the code below, but when I take the resultant SAS dataset and transfer it back to the Oracle server from a terminal using SCP it only takes around 2 minutes. I know there will be some overhead since it's also translating the data from Oracle to SAS, but I wouldn't think it should take 15 times longer. Modifying the READBUFF for this transfer basically has no effect on transfer time. Conversely, for Files 1 and 2, increasing the READBUFF to 1000 or higher reliably decreases the transfer time by 25% - 50%, although the decrease doesn't appear to be linearly related to the READBUFF. It's a 50% reduction at READBUFF = 1500 or 7000, with low std dev across those runs.
Any suggestions or thoughts are welcome on how to speed this up or on other parameters to adjust!
LIBNAME lib_OUT "/saswork/mx/temp/"; /* OracTST.File_1 -- 6 Columns - 13.7M Rows - 1.25 GB */ /* OracTST.File_2 -- 6 Columns - 24.5M Rows - 2.15 GB */ /* OracTST.File_3 -- 81 Columns - 10.4M Rows - 15.36 GB */ /* OracTST.File_4 -- 93 Columns - 12.6M Rows - 19.17 GB */ /* OracTST.File_5 -- 51 Columns - 16.0M Rows - 6.56 GB */ /* OracTST.File_6 -- 38 Columns - 48.1M Rows - 7.59 GB */ /* OracTST.File_7 -- 3 Columns - 71.9M Rows - 4.12 GB */ %MACRO Export_Loop; %DO var_Replicate = 1 %TO 4; %DO var_BuffCnt = 0 %TO 15; %IF &var_BuffCnt = 0 %THEN %LET var_ReadBuff = ; %ELSE %LET var_ReadBuff = %BQUOTE(READBUFF=)%EVAL(500 * &var_BuffCnt); %DO var_TestFileCnt = 1 %TO 7; %IF &var_TestFileCnt = 1 %THEN %LET var_TestFile = OracTST.File_1; %IF &var_TestFileCnt = 2 %THEN %LET var_TestFile = OracTST.File_2; %IF &var_TestFileCnt = 3 %THEN %LET var_TestFile = OracTST.File_3; %IF &var_TestFileCnt = 4 %THEN %LET var_TestFile = OracTST.File_4; %IF &var_TestFileCnt = 5 %THEN %LET var_TestFile = OracTST.File_5; %IF &var_TestFileCnt = 6 %THEN %LET var_TestFile = OracTST.File_6; %IF &var_TestFileCnt = 7 %THEN %LET var_TestFile = OracTST.File_7; LIBNAME OracTST ORACLE USER=&UserNm PASS=&PassWd PATH='SASDB02' CONNECTION=GLOBAL SCHEMA=&UserNm &var_ReadBuff; PROC SQL; CREATE TABLE lib_OUT.Temp_Out AS SELECT * FROM &var_TestFile.(ORHINTS='/*+ PARALLEL(16) */'); %PUT %SYSFUNC(COMPBL("Exporting Test File - &var_TestFile" --- "&var_ReadBuff")); ;QUIT; %dataset_delete(lib_OUT.Temp_Out); LIBNAME OracTST CLEAR; %END; %END; %END; %MEND; %Export_Loop;
Currently running on 9.4 M2, soon to be M4.
02-08-2017 08:53 PM
It's impossible to give a hard-and-fast answer to this, because there are so many environmental things to consider. But try this for a start:
LIBNAME lib_OUT "/saswork/mx/temp/"; %MACRO Export_Loop; %DO var_Replicate = 1 %TO 4; %DO var_BuffCnt = 0 %TO 15; %IF &var_BuffCnt = 0 %THEN %LET var_ReadBuff = ; %ELSE %LET var_ReadBuff = %BQUOTE(READBUFF=)%EVAL(500 * &var_BuffCnt); %DO i = 1 %TO 7; %let var_Testfile = OracTST.file_&i; LIBNAME OracTST ORACLE USER=&UserNm PASS=&PassWd PATH='SASDB02' CONNECTION=GLOBAL SCHEMA=&UserNm &var_ReadBuff; PROC SQL; CREATE TABLE lib_OUT.Temp_Out(compress=binary) AS SELECT * FROM &var_TestFile.(ORHINTS='/*+ PARALLEL(16) */'); %PUT %SYSFUNC(COMPBL("Exporting Test File - &var_TestFile" --- "&var_ReadBuff")); ;QUIT; %dataset_delete(lib_OUT.Temp_Out); LIBNAME OracTST CLEAR; %END; %END; %END; %MEND; %Export_Loop;
Note I've simplified the assignment of var_testfile - not important, but easier to read.
The difference is in placing compression on the SAS dataset. Although all of the data will be brought over the pass-through, the writes into the SAS dataset will be much reduced. I use binary instead of char because it seems to be more efficient on very wide tables.
SAS's compression isn't the best, but it will help a little.
What would really help would be, were it possible, reducing the columns, especially very wide varchar types, to the bare minumum of what you need.
02-09-2017 12:04 PM
Thanks LaurieF, I was actually already using compression using the CHAR type specified in my SAS autoexec file, but I'll try it out with BINARY to see if that changes anything.
Unfortunately reducing the columns isn't feasible in most of my use cases. The primary use case I have in mind (or at least the one that got me started on this path) is test file #7, which only has 3 columns to start with and I can't incrementally restrict the length of any of the columns since the whole point of the file is that it contains frequency tables for various VARCHAR columns of another table so we can see what the distribution of values are for the different column. (Column 1 = Other tables column name, Column 2 = Values found in that column, Column 3 = Number of times that value shows up in the table.)
One thing I've been considering, but haven't tested yet is to see if different methods of transferring the data would make a difference. For example is using PROC SQL with implicit pass-through faster/slower than doing the same thing with explicit pass-through or using a DATA step. My gut feeling is that this wouldn't make a difference, but it's probably worth confirming.
02-09-2017 02:50 PM
The binary compression on the narrow table could actually make things worse, but probably only by a small amount - the data coming across the pipe won't be affected.
It's interesting that going the other way is so quick - I'm no Oracle maven, but I assume it's an optimised bulk-load, but nevertheless the same amount of data is being pushed. I'd be surprised if the Oracle library engine is doing anything that clever. But I could well be wrong.
I'd be surprised if any other SAS method to pull the data would make any difference - it's still going through the same gateway, after all.