optimizing query

Occasional Contributor
Posts: 10

I want to create a SAS dataset from a DB2 table (27,423,639 records). It's taking about 1 hour 50 minutes. How can I optimize it?

 



All Replies
Super User
Posts: 17,750

Re: optimizing query

Is it a straight extract? Are you pulling it across a network? 

 

27 million records isn't a huge file, probably about 5-8 GB. But moving things across a network takes time, and that's probably your bottleneck.

 

How are you currently creating the dataset: a DATA step, PROC SQL, or PROC COPY/PROC DATASETS?

Occasional Contributor
Posts: 10

Re: optimizing query

The size of the DB2 table is around 89 GB (observation size 3,520 bytes).

I am using PROC SQL to create the dataset from the DB2 table.

Super User
Posts: 6,928

Re: optimizing query

Fine. Still the question remains: do you pull all of that over the network?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 9,662

Re: optimizing query

1) Add the option READBUFF=100000 to your LIBNAME statement. 2) Using pass-through SQL may also be faster.
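
A minimal sketch of the first suggestion, assuming a DB2 LIBNAME connection (the libref, data source name, and table name below are hypothetical):

```sas
/* READBUFF controls how many rows are fetched per round trip to DB2 */
libname db2lib db2 datasrc=mydsn user=&user password="&pw" readbuff=100000;

/* Pull the table into a local SAS dataset */
data work.bigtable;
  set db2lib.bigtable;
run;
```

A larger READBUFF reduces the number of network round trips, though as noted later in the thread it cannot help if raw network bandwidth is the bottleneck.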
Occasional Contributor
Posts: 10

Re: optimizing query

Thanks for your quick response.

I am using pass-through and have already tried READBUFF with different values, but saw no significant improvement in performance.

Super User
Posts: 6,928

Re: optimizing query

What is the observation size of the SAS dataset?

And how do you retrieve the data (post your code with anonymized table names)?

 

I also suspect that your network may be the cause, but ~2 hrs for ~8 GB would translate to ~1 MB/s, which would be very slow; still, it's possible if you do it across a slower WAN connection.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 10

Re: optimizing query

The size of the DB2 table is around 65 GB (observation size 3,520 bytes).

I want to create a SAS dataset from this DB2 table.

I have already tried options like READBUFF, with no improvement.

Solution
‎09-13-2016 01:25 AM
Super User
Posts: 6,928

Re: optimizing query

Well, if you have to pull all that over the network, a quick calculation resolves to a speed of 10.5 MB/sec, which is about par for a 100 Mbit network, but slow by an order of magnitude for a 1 Gbit network.
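
That figure can be checked with a quick DATA _NULL_ step (assuming the ~65 GB table size and 1 hr 50 min runtime mentioned above):

```sas
/* Back-of-envelope throughput check */
data _null_;
  bytes      = 65 * 1024**3;            /* ~65 GiB table */
  secs       = 110 * 60;                /* 1 hr 50 min */
  mb_per_sec = bytes / secs / 1024**2;  /* roughly 10 MB/s */
  put mb_per_sec= 8.1;
run;
```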

So you could either try to improve network throughput, or reduce the size of data that has to cross the network by using SQL pass-through to subset records and/or drop columns that are not needed.
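
A minimal pass-through sketch of the second option; the schema, table, and column names here are hypothetical, and the point is that only the selected columns and filtered rows cross the network:

```sas
proc sql;
  connect to db2 (database=mydb user=&user password="&pw");
  create table work.subset as
    select * from connection to db2 (
      select acct_id, txn_dt, amount   /* keep only the columns you need */
      from myschema.bigtable
      where txn_dt >= '2016-01-01'     /* subset the records */
    );
  disconnect from db2;
quit;
```

The inner query runs entirely on the DB2 side, so any rows and columns it excludes never leave the database server.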

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 9,662

Re: optimizing query

Another alternative is PROC DBLOAD, but it is very old.

Doesn't the following code run faster?

libname xx db2 dsn=.........  readbuff=100000;
proc copy in=xx out=work;
  select xxxx;
run;
