- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I wanted to extract the big data from the oracle database.
Using a proc sql + libname and with the compress = yes option, I'm doing better than the path through proc sql.
Do you have an explanation
Thank you
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not without logs from both scenarios, with appropriate logging options set:
options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix fullstimer;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
/*Resultats whith sql libname and compress*/
NOTE: Compressing data set WORK.TABLE increased size by 3.23 percent.
Compressed is 138642 pages; un-compressed would require 134304 pages.
NOTE: Table WORK.data created, with 156598086 rows and 6 columns.
92 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 9:02.91
user cpu time 3:53.22
system cpu time 1:05.57
memory 5422.90k
OS Memory 16828.00k
Timestamp 12/11/2018 03:12:37 PM
Step Count 6 Switch Count 613883
Page Faults 0
Page Reclaims 699
Page Swaps 0
Voluntary Context Switches 3364703
Involuntary Context Switches 1010209
Block Input Operations 0
Block Output Operations 17757080
/* Result with path through */
NOTE: Compressing data set WORK.data increased size by 3.23 percent.
Compressed is 138643 pages; un-compressed would require 134304 pages.
NOTE: Table WORK.data created, with 156598085 rows and 6 columns.
113! disconnect from oracle;
114
115 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 11:41.58
user cpu time 3:55.16
system cpu time 1:06.44
memory 5321.28k
OS Memory 17340.00k
Timestamp 12/11/2018 03:25:04 PM
Step Count 7 Switch Count 607470
Page Faults 0
Page Reclaims 130
Page Swaps 0
Voluntary Context Switches 3349686
Involuntary Context Switches 1004741
Block Input Operations 192
Block Output Operations 17764736
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I use this code.
The result of proc sql with compress =yes is better then path through
rsubmit ;
options NODBIDIRECTEXEC;
proc sql;
connect to oracle
(PATH="***" USER="****" PASSWORD="***********" preserve_comments );
create table data as select *
from connection to oracle
(
select distinct col1,
col2
from schema.table
where filter1
and filter2
)
;disconnect from oracle;
quit;
endrsubmit ;
rsubmit ;
options compress=yes ;
proc sql;
create table data as select col1,
col2
from schema.table
where filter1
and filter2;
quit;
endrsubmit ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Given the simplicity of the queries I still think this is a network/bandwidth issue.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Check the definition of the libref SCHEMA. Perhaps you are using different options than you did in your CONNECT statement.
In terms of performance of extracts it is usually the speed of the transfer from the remote database to the SAS server that is most important. So perhaps you have options on your LIBNAME statement that enable SAS to use some bullk or fast transfer procedure.
Another issue is that there might be some caching going on in your database. So if you want the explicit query first and then implicit query pulled the same data again the database might respond quicker since the data was already in its cache. Make to alternate the order of your queries in any performance tests to try to avoid giving an advantage to the method run first.
The dataset option COMPRESS will not have any impact on the transfer speed, but depending on whether it actually results in fewer disk blocks being written it might improve the overall speed of a operations using that dataset.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
without the distinct, runtimes are the same between the two methods
I'm adding options like buffsize, bufno. but for now no big change.
Could you give me some tips for using these options.
That's what I'm trying to do.
thank you
rsubmit ;
options NODBIDIRECTEXEC;
proc sql;
connect to oracle
(PATH="" USER="" PASSWORD="" preserve_comments buffsize=500000 );
create table enc_6po( bufsize=500000 BUFNO=100 compress=yes) as
select col1 , col2
from connection to oracle
(
select col1,col2
from schema.data
where filter1 and filter2
)
;disconnect from oracle;
quit;
endrsubmit ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The answer to original question, the difference in speed, was different queries. So now I’m assuming your question is how to speed this process up?
It looks like a straight copy table with small modifications.
Have you tried using a libname and PROC COPY instead to move the table over. That may be faster.
Or PROC DOWNLOAD.
You can add your WHERE using the dataset options.
@mansour_ib_sas wrote:
without the distinct, runtimes are the same between the two methods
I'm adding options like buffsize, bufno. but for now no big change.
Could you give me some tips for using these options.
That's what I'm trying to do.
thank you
rsubmit ; options NODBIDIRECTEXEC; proc sql; connect to oracle (PATH="" USER="" PASSWORD="" preserve_comments buffsize=500000 ); create table enc_6po( bufsize=500000 BUFNO=100 compress=yes) as select col1 , col2 from connection to oracle ( select col1,col2 from schema.data where filter1 and filter2 ) ;disconnect from oracle; quit; endrsubmit ;