Hello,
I wanted to extract the big data from the oracle database.
Using a proc sql + libname and with the compress = yes option, I'm doing better than the path through proc sql.
Do you have an explanation
Thank you
Not without logs from both scenarios, with appropriate logging options set:
options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix fullstimer;
/*Resultats whith sql libname and compress*/
NOTE: Compressing data set WORK.TABLE increased size by 3.23 percent.
Compressed is 138642 pages; un-compressed would require 134304 pages.
NOTE: Table WORK.data created, with 156598086 rows and 6 columns.
92 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 9:02.91
user cpu time 3:53.22
system cpu time 1:05.57
memory 5422.90k
OS Memory 16828.00k
Timestamp 12/11/2018 03:12:37 PM
Step Count 6 Switch Count 613883
Page Faults 0
Page Reclaims 699
Page Swaps 0
Voluntary Context Switches 3364703
Involuntary Context Switches 1010209
Block Input Operations 0
Block Output Operations 17757080
/* Result with path through */
NOTE: Compressing data set WORK.data increased size by 3.23 percent.
Compressed is 138643 pages; un-compressed would require 134304 pages.
NOTE: Table WORK.data created, with 156598085 rows and 6 columns.
113! disconnect from oracle;
114
115 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 11:41.58
user cpu time 3:55.16
system cpu time 1:06.44
memory 5321.28k
OS Memory 17340.00k
Timestamp 12/11/2018 03:25:04 PM
Step Count 7 Switch Count 607470
Page Faults 0
Page Reclaims 130
Page Swaps 0
Voluntary Context Switches 3349686
Involuntary Context Switches 1004741
Block Input Operations 192
Block Output Operations 17764736
I use this code.
The result of proc sql with compress =yes is better then path through
rsubmit ;
options NODBIDIRECTEXEC;
proc sql;
connect to oracle
(PATH="***" USER="****" PASSWORD="***********" preserve_comments );
create table data as select *
from connection to oracle
(
select distinct col1,
col2
from schema.table
where filter1
and filter2
)
;disconnect from oracle;
quit;
endrsubmit ;
rsubmit ;
options compress=yes ;
proc sql;
create table data as select col1,
col2
from schema.table
where filter1
and filter2;
quit;
endrsubmit ;
Check the definition of the libref SCHEMA. Perhaps you are using different options than you did in your CONNECT statement.
In terms of performance of extracts it is usually the speed of the transfer from the remote database to the SAS server that is most important. So perhaps you have options on your LIBNAME statement that enable SAS to use some bullk or fast transfer procedure.
Another issue is that there might be some caching going on in your database. So if you want the explicit query first and then implicit query pulled the same data again the database might respond quicker since the data was already in its cache. Make to alternate the order of your queries in any performance tests to try to avoid giving an advantage to the method run first.
The dataset option COMPRESS will not have any impact on the transfer speed, but depending on whether it actually results in fewer disk blocks being written it might improve the overall speed of a operations using that dataset.
without the distinct, runtimes are the same between the two methods
I'm adding options like buffsize, bufno. but for now no big change.
Could you give me some tips for using these options.
That's what I'm trying to do.
thank you
rsubmit ;
options NODBIDIRECTEXEC;
proc sql;
connect to oracle
(PATH="" USER="" PASSWORD="" preserve_comments buffsize=500000 );
create table enc_6po( bufsize=500000 BUFNO=100 compress=yes) as
select col1 , col2
from connection to oracle
(
select col1,col2
from schema.data
where filter1 and filter2
)
;disconnect from oracle;
quit;
endrsubmit ;
The answer to original question, the difference in speed, was different queries. So now I’m assuming your question is how to speed this process up?
It looks like a straight copy table with small modifications.
Have you tried using a libname and PROC COPY instead to move the table over. That may be faster.
Or PROC DOWNLOAD.
You can add your WHERE using the dataset options.
@mansour_ib_sas wrote:
without the distinct, runtimes are the same between the two methods
I'm adding options like buffsize, bufno. but for now no big change.
Could you give me some tips for using these options.
That's what I'm trying to do.
thank you
rsubmit ; options NODBIDIRECTEXEC; proc sql; connect to oracle (PATH="" USER="" PASSWORD="" preserve_comments buffsize=500000 ); create table enc_6po( bufsize=500000 BUFNO=100 compress=yes) as select col1 , col2 from connection to oracle ( select col1,col2 from schema.data where filter1 and filter2 ) ;disconnect from oracle; quit; endrsubmit ;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.