SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
mansour_ib_sas
Pyrite | Level 9

Hello,

I wanted to extract the big data from the oracle database.

Using a proc  sql + libname and with the compress = yes option, I'm doing better than the path through proc sql.

Do you have an explanation

 

Thank you

8 REPLIES 8
LinusH
Tourmaline | Level 20

Not without logs from both scenarios, with appropriate logging options set:

options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix fullstimer;
Data never sleeps
mansour_ib_sas
Pyrite | Level 9
/*Resultats whith sql libname and compress*/


NOTE: Compressing data set WORK.TABLE increased size by 3.23 percent.
      Compressed is 138642 pages; un-compressed would require 134304 pages.
NOTE: Table WORK.data created, with 156598086 rows and 6 columns.

92       quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           9:02.91
      user cpu time       3:53.22
      system cpu time     1:05.57
      memory              5422.90k
      OS Memory           16828.00k
      Timestamp           12/11/2018 03:12:37 PM
      Step Count                        6  Switch Count  613883
      Page Faults                       0
      Page Reclaims                     699
      Page Swaps                        0
      Voluntary Context Switches        3364703
      Involuntary Context Switches      1010209
      Block Input Operations            0
      Block Output Operations           17757080


/*     Result with path through  */

NOTE: Compressing data set WORK.data increased size by 3.23 percent.
      Compressed is 138643 pages; un-compressed would require 134304 pages.
NOTE: Table WORK.data created, with 156598085 rows and 6 columns.

113!  disconnect from oracle;
114
115  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           11:41.58
      user cpu time       3:55.16
      system cpu time     1:06.44
      memory              5321.28k
      OS Memory           17340.00k
      Timestamp           12/11/2018 03:25:04 PM
      Step Count                        7  Switch Count  607470
      Page Faults                       0
      Page Reclaims                     130
      Page Swaps                        0
      Voluntary Context Switches        3349686
      Involuntary Context Switches      1004741
      Block Input Operations            192
      Block Output Operations           17764736


Reeza
Super User
It seems like COMPRESS isn’t useful in this particular example. No code, so still hard to see. My guess, in this example it doesn’t matter which method you use because it’s a bandwidth issue of where the time is spent transferring the data. Both times are close enough that the difference is likely due to other traffic on your network. Just a guess though given no code.
mansour_ib_sas
Pyrite | Level 9

I use this code.

The result of proc sql with compress =yes is better then  path through

 

rsubmit ;

options NODBIDIRECTEXEC;

proc sql;

connect to oracle

(PATH="***"  USER="****" PASSWORD="***********" preserve_comments ); 

 create table data as select *

from connection to oracle
(

select  distinct         col1,
                         col2

	from schema.table 
	where    filter1
	and filter2

)
;disconnect from oracle; 

quit; 

endrsubmit ;



rsubmit ;

options compress=yes ;
	  
proc sql; 
    create table data as select col1,
                                col2
	from schema.table 
	where filter1
	and filter2;

quit;

endrsubmit ;

 

Reeza
Super User
I’m assuming you copied and simplified because those programs are not the same and comparing different programs is comparing apples and oranges. Of course a program using distinct would take longer than one not using distinct.

Given the simplicity of the queries I still think this is a network/bandwidth issue.
Tom
Super User Tom
Super User

Check the definition of the libref SCHEMA.  Perhaps you are using different options than you did in your CONNECT statement.

 

In terms of performance of extracts it is usually the speed of the transfer from the remote database to the SAS server that is most important.  So perhaps you have options on your LIBNAME statement that enable SAS to use some bullk or fast transfer procedure.

 

Another issue is that there might be some caching going on in your database. So if you want the explicit query first and then implicit query pulled the same data again the database might respond quicker since the data was already in its cache.  Make to alternate the order of your queries in any performance tests to try to avoid giving an advantage to the method run first.

 

The dataset option COMPRESS will not have any impact on the transfer speed, but depending on whether it actually results in fewer disk blocks being written it might improve the overall speed of a operations using that dataset.

mansour_ib_sas
Pyrite | Level 9

without the distinct, runtimes are the same between the two methods

I'm adding options like buffsize, bufno. but for now no big change.

Could you give me some tips for using these options.

That's what I'm trying to do.

thank you

rsubmit ;

options NODBIDIRECTEXEC;



 
proc sql;

connect to oracle

(PATH=""  USER="" PASSWORD="" preserve_comments buffsize=500000  ); 

 create table enc_6po( bufsize=500000 BUFNO=100 compress=yes) as 
   select col1 , col2

from connection to oracle
(

select      col1,col2  

	from schema.data 
	where     filter1 and filter2
)
;disconnect from oracle; 

quit; 

endrsubmit ;

 

Reeza
Super User

The answer to original question, the difference in speed, was different queries.  So now I’m assuming your question is how to speed this process up?

 

It looks like a straight copy table with small modifications. 

Have you tried using a libname and PROC COPY instead to move the table over. That may be faster. 

Or PROC DOWNLOAD. 

You can add your WHERE using the dataset options. 

 

 


@mansour_ib_sas wrote:

without the distinct, runtimes are the same between the two methods

I'm adding options like buffsize, bufno. but for now no big change.

Could you give me some tips for using these options.

That's what I'm trying to do.

thank you

rsubmit ;

options NODBIDIRECTEXEC;



 
proc sql;

connect to oracle

(PATH=""  USER="" PASSWORD="" preserve_comments buffsize=500000  ); 

 create table enc_6po( bufsize=500000 BUFNO=100 compress=yes) as 
   select col1 , col2

from connection to oracle
(

select      col1,col2  

	from schema.data 
	where     filter1 and filter2
)
;disconnect from oracle; 

quit; 

endrsubmit ;

 


 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1591 views
  • 1 like
  • 4 in conversation