Jack_Lin
Calcite | Level 5

I have a huge data set and am trying to cut down the processing time. The code I usually use:

DATA x;
  SET LIB.set1; /* set1 is 4.6 GB in the following example; real work always involves more than 20 sets */
  WHERE code1 in: ("sas","test");
RUN;

 

I find that with different storage devices the run time (in seconds) follows a simple formula: data size (GB) / device speed (GB/s). When LIB is on a traditional 2.5" HDD (~0.1 GB/s) the real time is 49.8 sec, and when LIB is on a SATA SSD (~0.4 GB/s) it takes 11.7 sec. I also tried a RAM disk (~2 GB/s), and it takes only 2.1 sec.
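The rule of thumb above can be checked against the three timings with a short DATA step. This is only a sketch: the speeds are the approximate figures quoted in the post, and the data set name ESTIMATE and its variable names are made up for illustration.

```sas
/* Estimated read time = data size (GB) / device speed (GB/s).          */
/* Speeds are the rough values quoted in the post, not benchmark data.  */
data estimate;
  input device :$10. speed_gbs measured_sec;
  size_gb = 4.6;                        /* size of set1 */
  estimated_sec = size_gb / speed_gbs;  /* the simple formula */
  datalines;
HDD 0.1 49.80
SATA_SSD 0.4 11.71
RAM_disk 2.0 2.13
;
run;

proc print data=estimate noobs;
run;
```

The estimates come out at about 46, 11.5 and 2.3 seconds, close to the measured real times, which suggests these runs are bound by read throughput rather than CPU.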

 

 

libname HDD 'E:/';
data H;set HDD.set1;where code1 in: ("401","402","403","404","405") or code2 in: ("401","402","403","404","405") or code3 in: ("401","402","403","404","405");
run;
/*NOTE: DATA statement used (Total process time):
      real time           49.80 seconds
      user cpu time       3.43 seconds
      system cpu time     2.79 seconds
      memory              1014.28k
      OS Memory           18672.00k*/
libname SSD 'C:/test';
data S;set SSD.set1;where code1 in: ("401","402","403","404","405") or code2 in: ("401","402","403","404","405") or code3 in: ("401","402","403","404","405");
run;
/*NOTE: DATA statement used (Total process time):
      real time           11.71 seconds
      user cpu time       1.34 seconds
      system cpu time     0.87 seconds
      memory              1014.28k
      OS Memory           18672.00k*/
libname RAM 'S:/';
data R;set RAM.set1;where code1 in: ("401","402","403","404","405") or code2 in: ("401","402","403","404","405") or code3 in: ("401","402","403","404","405");
run;
/*NOTE: DATA statement used (Total process time):
      real time           2.13 seconds
      user cpu time       1.45 seconds
      system cpu time     0.64 seconds
      memory              1285.28k
      OS Memory           18416.00k*/

 

It is hard for me to get a RAM disk bigger than 20 GB, but it is easy to reach 500 GB with a PCIe SSD that has a 1.5 GB/s read speed. However, when I use a PCIe SSD with the same code and the same data set, it only reads at 0.37 GB/s and takes 12.4 sec.

 

libname PCIE 'D:/cd';
data P;set PCIE.set1;where code1 in: ("401","402","403","404","405") or code2 in: ("401","402","403","404","405") or code3 in: ("401","402","403","404","405");
run;
/*NOTE: DATA statement used (Total process time):
      real time           12.43 seconds
      user cpu time       1.09 seconds
      system cpu time     1.04 seconds
      memory              1014.28k
      OS Memory           18928.00k*/

 

 

So I wonder whether SAS does NOT support PCIe 3.0 x4, or whether there is some other cause and I need to change a setting to reach the optimal read speed of the PCIe SSD. Thanks!

 

ps1. The read speed was checked by watching the Task Manager.

ps2. The WORK library is located on the RAM disk to keep the runs comparable and to minimize the effect of each storage device's write speed limit.

ps3. The PCIe SSD can reach 1.5 GB/s on my laptop for normal sequential reads.

ps4. SAS 9.4 on WIN10 home

LinusH
Tourmaline | Level 20
I'm not a PC engineer, so I'll leave the PCIe question to others and stick to commenting on the SAS part.
Since you have a WHERE clause, consider creating an index.
Also, store the data in an SPDE library, which among other things allows multithreaded I/O.
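A minimal sketch of both suggestions, assuming the library LIB and data set set1 from the question. The libref SPDELIB and the path 'D:\spde' are hypothetical. Whether an index is actually used depends on the WHERE expression (a condition that ORs across several variables may still lead to a full scan), so treat this as something to test rather than a guaranteed gain.

```sas
/* Create a simple index on code1 in the existing Base library */
proc datasets library=LIB nolist;
  modify set1;
  index create code1;
quit;

/* Copy the data into an SPD Engine library.                  */
/* SPDELIB and 'D:\spde' are placeholder names (assumptions). */
libname SPDELIB spde 'D:\spde';
data SPDELIB.set1;
  set LIB.set1;
run;

/* Re-run the filter; MSGLEVEL=I writes notes about index use to the log */
options msglevel=i;
data x;
  set SPDELIB.set1;
  where code1 in: ("sas","test");
run;
```

Comparing the real time in the log before and after each change shows which of the two, if either, helps for a given filter.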
Data never sleeps
Jack_Lin
Calcite | Level 5

SPD Engine is helpful! Just using a single path, it can reach a 1.2 GB/s read speed.

 

libname PCIE spde 'D:\data' partsize=1024;

Thanks a lot, and I'll try more settings to tune the performance of SPD Engine.

Kurt_Bremser
Super User

It might be something in the PCIe settings that slows down your bus performance and makes your PCIe disk slower than the more directly attached SATA SSD. Or there is an additional device on the PCIe that slows the whole bus down.

Since you have a notebook that performs better, you have something to compare the settings with.

There ARE reasons why dedicated server computers (ie a pSeries) are more expensive. One being that they are perfectly tuned (HW- and SW-wise) out of the box.

 

Jack_Lin
Calcite | Level 5

Yes, the current SSD is a budget TLC one. Maybe it is caused by the TLC cache or something else; I'm still not sure.

 

I'm going to try another MLC PCIe SSD and hope it succeeds!

boemskats
Lapis Lazuli | Level 10

Jack,

 

SAS won't be concerned with your device level drivers for your PCIe bus. The speeds you're reaching are already quite decent. I suspect you're most likely hitting an I/O operations (iops) wall rather than a PCIe bus throughput limit, whether in the hardware or in the kernel/cache. 

 

Try reducing the number of I/O operations required by increasing the BLKSIZE on your session / data, and also experimenting with SGIO. I'm guessing you'll see a 20-30% increase in read speed by upping the blksize, and a much more drastic change by using SGIO. However, don't be surprised if you struggle to max out your disk with a single thread. A better test of your hardware will be to run a few of these jobs in parallel - you may be able to push your disk much closer to the limit that way. The Windows I/O subsystem also isn't the greatest.
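One way to try this in Base SAS on Windows, as a rough sketch. It assumes that the block size meant here corresponds to the data set page size (BUFSIZE=), which is my reading and not something stated above. The data set name set1_big and the values 128k and 16 are arbitrary starting points, not recommendations. SGIO= is a Windows-only data set option, so check the documentation for your release for its restrictions before relying on it.

```sas
/* The page size is fixed when a data set is created, so rebuild it. */
/* set1_big is a hypothetical name; 128k is only a starting value.   */
data PCIE.set1_big (bufsize=128k);
  set PCIE.set1;
run;

/* Read with more buffers and scatter-read/gather-write I/O requested */
data P;
  set PCIE.set1_big (bufno=16 sgio=yes);
  where code1 in: ("401","402","403","404","405")
     or code2 in: ("401","402","403","404","405")
     or code3 in: ("401","402","403","404","405");
run;
```

Changing one option at a time and comparing the real time in the log makes it easier to see which setting, if any, moves the read speed.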

 

What kind of speeds are you seeing with tools like CrystalDiskMark?

 

Nik

 

 
