I want to create SAS dataset from DB2 table (27423639 records). Its taking about 1 hr 50 minutes. How can I optimize it?
Well, if you have to pull all that over the network, a quick calculation resolves to a speed of 10.5 MB/sec, which is about par for a 100 Mbit network, but slow by an order of magnitude for a 1 Gbit network.
So you could either try to improve network throughput, or reduce the size of data that has to cross the network by using SQL pass-through to subset records and/or drop columns that are not needed.
Is it a straight extract? Are you pulling it across a network?
27Million records isn't a huge file, probably about 5-8GB? But moving things across a network can take time and that's probably your bottleneck.
How are you currently creating the dataset, a data step, proc sql, proc copy/datasets?
Size of db2 table is around 89 GB(observation size 3520 bytes).
I am using proc sql to create dataset from db2 table.
Fine. Still the question remains: do you pull all that over the network?
Thanks for your quick response.
I am using pass through and already tried readbuff with different values but no significant improvement in performance.
What is the observation size of the SAS dataset?
And how do you retrieve the data (post code with anonymized table names)?
I also guess that your network may be the cause, but ~2hrs for ~8 GB would translate to ~1MB/s, which would be very slow, but could be if you do it across a slower WAN connection.
Size of db2 table is around 65GB(observation size 3520 bytes).
Want to create SAS dataset from this db2 table.
I have already used readbuff like options but no improvement.
Well, if you have to pull all that over the network, a quick calculation resolves to a speed of 10.5 MB/sec, which is about par for a 100 Mbit network, but slow by an order of magnitude for a 1 Gbit network.
So you could either try to improve network throughput, or reduce the size of data that has to cross the network by using SQL pass-through to subset records and/or drop columns that are not needed.
Another alternative way is using PROC DBLOAD . But it is very old. The following code doesn't make you faster ? libname xx db2 dsn=......... readbuff=100000; proc copy in=xx out=work; select xxxx; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.