BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
paritosh176
Fluorite | Level 6

I want to create SAS dataset from DB2 table (27423639 records). Its taking about 1 hr 50 minutes. How can I optimize it? 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Well, if you have to pull all that over the network, a quick calculation resolves to a speed of 10.5 MB/sec, which is about par for a 100 Mbit network, but slow by an order of magnitude for a 1 Gbit network.

So you could either try to improve network throughput, or reduce the size of data that has to cross the network by using SQL pass-through to subset records and/or drop columns that are not needed.

View solution in original post

9 REPLIES 9
Reeza
Super User

Is it a straight extract? Are you pulling it across a network? 

 

27Million records isn't a huge file, probably about 5-8GB? But moving things across a network can take time and that's probably your bottleneck. 

 

How are you currently creating the dataset, a data step, proc sql, proc copy/datasets?

paritosh176
Fluorite | Level 6

Size of db2 table is around 89 GB(observation size 3520 bytes).

I am using proc sql to create dataset from db2 table.

Ksharp
Super User
1)Add option readbuff=100000 into your LIBNAME statement. 2)If you use Pass-Through SQL , that may give you faster .
paritosh176
Fluorite | Level 6

Thanks for your quick response.

I am using pass through and already tried readbuff with different values but no significant improvement in performance.

Kurt_Bremser
Super User

What is the observation size of the SAS dataset?

And how do you retrieve the data (post code with anonymized table names)?

 

I also guess that your network may be the cause, but ~2hrs for ~8 GB would translate to ~1MB/s, which would be very slow, but could be if you do it across a slower WAN connection.

paritosh176
Fluorite | Level 6

Size of db2 table is around 65GB(observation size 3520 bytes).

Want to create SAS dataset from this db2 table.

I have already used readbuff like options but no improvement.

Kurt_Bremser
Super User

Well, if you have to pull all that over the network, a quick calculation resolves to a speed of 10.5 MB/sec, which is about par for a 100 Mbit network, but slow by an order of magnitude for a 1 Gbit network.

So you could either try to improve network throughput, or reduce the size of data that has to cross the network by using SQL pass-through to subset records and/or drop columns that are not needed.

Ksharp
Super User
Another alternative way is using PROC DBLOAD . But it is very old.

The following code doesn't make you faster ?

libname xx db2 dsn=.........    readbuff=100000;
proc copy in=xx out=work;
 select xxxx;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1256 views
  • 2 likes
  • 4 in conversation