Compare sql with sql pass through

mansour_ib_sas · Posted 11-12-2018 07:24 AM

Hello,

I wanted to extract the big data from the oracle database.

Using a proc sql + libname and with the compress = yes option, I'm doing better than the path through proc sql.

Do you have an explanation

Thank you

LinusH · Posted 11-12-2018 07:29 AM

Not without logs from both scenarios, with appropriate logging options set:

options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix fullstimer;

Data never sleeps

mansour_ib_sas · Posted 11-12-2018 09:42 AM

/*Resultats whith sql libname and compress*/


NOTE: Compressing data set WORK.TABLE increased size by 3.23 percent.
      Compressed is 138642 pages; un-compressed would require 134304 pages.
NOTE: Table WORK.data created, with 156598086 rows and 6 columns.

92       quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           9:02.91
      user cpu time       3:53.22
      system cpu time     1:05.57
      memory              5422.90k
      OS Memory           16828.00k
      Timestamp           12/11/2018 03:12:37 PM
      Step Count                        6  Switch Count  613883
      Page Faults                       0
      Page Reclaims                     699
      Page Swaps                        0
      Voluntary Context Switches        3364703
      Involuntary Context Switches      1010209
      Block Input Operations            0
      Block Output Operations           17757080


/*     Result with path through  */

NOTE: Compressing data set WORK.data increased size by 3.23 percent.
      Compressed is 138643 pages; un-compressed would require 134304 pages.
NOTE: Table WORK.data created, with 156598085 rows and 6 columns.

113!  disconnect from oracle;
114
115  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           11:41.58
      user cpu time       3:55.16
      system cpu time     1:06.44
      memory              5321.28k
      OS Memory           17340.00k
      Timestamp           12/11/2018 03:25:04 PM
      Step Count                        7  Switch Count  607470
      Page Faults                       0
      Page Reclaims                     130
      Page Swaps                        0
      Voluntary Context Switches        3349686
      Involuntary Context Switches      1004741
      Block Input Operations            192
      Block Output Operations           17764736

Reeza · Posted 11-12-2018 01:22 PM

It seems like COMPRESS isn’t useful in this particular example. No code, so still hard to see. My guess, in this example it doesn’t matter which method you use because it’s a bandwidth issue of where the time is spent transferring the data. Both times are close enough that the difference is likely due to other traffic on your network. Just a guess though given no code.

mansour_ib_sas · Posted 11-12-2018 01:35 PM

I use this code.

The result of proc sql with compress =yes is better then path through

rsubmit ;

options NODBIDIRECTEXEC;

proc sql;

connect to oracle

(PATH="***"  USER="****" PASSWORD="***********" preserve_comments ); 

 create table data as select *

from connection to oracle
(

select  distinct         col1,
                         col2

	from schema.table 
	where    filter1
	and filter2

)
;disconnect from oracle; 

quit; 

endrsubmit ;



rsubmit ;

options compress=yes ;
	  
proc sql; 
    create table data as select col1,
                                col2
	from schema.table 
	where filter1
	and filter2;

quit;

endrsubmit ;

Reeza · Posted 11-12-2018 02:02 PM

I’m assuming you copied and simplified because those programs are not the same and comparing different programs is comparing apples and oranges. Of course a program using distinct would take longer than one not using distinct.

Given the simplicity of the queries I still think this is a network/bandwidth issue.

Tom · Posted 11-12-2018 02:22 PM

Check the definition of the libref SCHEMA. Perhaps you are using different options than you did in your CONNECT statement.

In terms of performance of extracts it is usually the speed of the transfer from the remote database to the SAS server that is most important. So perhaps you have options on your LIBNAME statement that enable SAS to use some bullk or fast transfer procedure.

Another issue is that there might be some caching going on in your database. So if you want the explicit query first and then implicit query pulled the same data again the database might respond quicker since the data was already in its cache. Make to alternate the order of your queries in any performance tests to try to avoid giving an advantage to the method run first.

The dataset option COMPRESS will not have any impact on the transfer speed, but depending on whether it actually results in fewer disk blocks being written it might improve the overall speed of a operations using that dataset.

mansour_ib_sas · Posted 11-12-2018 02:49 PM

without the distinct, runtimes are the same between the two methods

I'm adding options like buffsize, bufno. but for now no big change.

Could you give me some tips for using these options.

That's what I'm trying to do.

thank you

rsubmit ;

options NODBIDIRECTEXEC;



 
proc sql;

connect to oracle

(PATH=""  USER="" PASSWORD="" preserve_comments buffsize=500000  ); 

 create table enc_6po( bufsize=500000 BUFNO=100 compress=yes) as 
   select col1 , col2

from connection to oracle
(

select      col1,col2  

	from schema.data 
	where     filter1 and filter2
)
;disconnect from oracle; 

quit; 

endrsubmit ;

Reeza · Posted 11-12-2018 04:05 PM

The answer to original question, the difference in speed, was different queries. So now I’m assuming your question is how to speed this process up?

It looks like a straight copy table with small modifications.

Have you tried using a libname and PROC COPY instead to move the table over. That may be faster.

Or PROC DOWNLOAD.

You can add your WHERE using the dataset options.

@mansour_ib_sas wrote:

without the distinct, runtimes are the same between the two methods

I'm adding options like buffsize, bufno. but for now no big change.

Could you give me some tips for using these options.

That's what I'm trying to do.

thank you
rsubmit ;

options NODBIDIRECTEXEC;



 
proc sql;

connect to oracle

(PATH=""  USER="" PASSWORD="" preserve_comments buffsize=500000  ); 

 create table enc_6po( bufsize=500000 BUFNO=100 compress=yes) as 
   select col1 , col2

from connection to oracle
(

select      col1,col2  

	from schema.data 
	where     filter1 and filter2
)
;disconnect from oracle; 

quit; 

endrsubmit ;

Compare sql with sql pass through

Re: Compare sql with sql pass through

Re: Compare sql with sql pass through

Re: Compare sql with sql pass through

Re: Compare sql with sql pass through

Re: Compare sql with sql pass through

Re: Compare sql with sql pass through

Re: Compare sql with sql pass through

Re: Compare sql with sql pass through

Registration is open

SAS Training: Just a Click Away