I have a huge data set and am trying to cut down the processing time. The code I usually use:
DATA x;
SET LIB.set1; /* set1 is 4.6 GB in the examples below; real work always involves 20 or more such sets */
WHERE code1 in: ("sas","test");
RUN;
I find that across different storage devices, the operation time (seconds) follows this simple formula: data size (GB) / device speed (GB/s). When LIB is located on a traditional 2.5" HDD (~0.1 GB/s) the real time is 49.8 seconds, and when LIB is on a SATA SSD (~0.4 GB/s) it takes 11.7 seconds. I also tried a RAM disk, and it takes only 2.1 seconds (~2 GB/s).
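As a quick check of that rule of thumb, dividing the 4.6 GB data set (S) by each device's approximate read speed (v) gives times close to the measured real times:

$$
\begin{aligned}
t &\approx \frac{S}{v}, \qquad S = 4.6\ \text{GB} \\
t_{\text{HDD}} &\approx \frac{4.6}{0.1} = 46\ \text{s} \quad (\text{measured } 49.8\ \text{s}) \\
t_{\text{SATA SSD}} &\approx \frac{4.6}{0.4} = 11.5\ \text{s} \quad (\text{measured } 11.7\ \text{s}) \\
t_{\text{RAM disk}} &\approx \frac{4.6}{2} = 2.3\ \text{s} \quad (\text{measured } 2.1\ \text{s})
\end{aligned}
$$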
libname HDD 'E:/';
data H;
set HDD.set1;
where code1 in: ("401","402","403","404","405")
   or code2 in: ("401","402","403","404","405")
   or code3 in: ("401","402","403","404","405");
run;
/*NOTE: DATA statement used (Total process time):
real time 49.80 seconds
user cpu time 3.43 seconds
system cpu time 2.79 seconds
memory 1014.28k
OS Memory 18672.00k*/
libname SSD 'C:/test';
data S;
set SSD.set1;
where code1 in: ("401","402","403","404","405")
   or code2 in: ("401","402","403","404","405")
   or code3 in: ("401","402","403","404","405");
run;
/*NOTE: DATA statement used (Total process time):
real time 11.71 seconds
user cpu time 1.34 seconds
system cpu time 0.87 seconds
memory 1014.28k
OS Memory 18672.00k*/
libname RAM 'S:/';
data R;
set RAM.set1;
where code1 in: ("401","402","403","404","405")
   or code2 in: ("401","402","403","404","405")
   or code3 in: ("401","402","403","404","405");
run;
/*NOTE: DATA statement used (Total process time):
real time 2.13 seconds
user cpu time 1.45 seconds
system cpu time 0.64 seconds
memory 1285.28k
OS Memory 18416.00k*/
It is hard for me to get a RAM disk larger than 20 GB, but it's easy to reach 500 GB with a PCIe SSD rated at 1.5 GB/s sequential read. However, when I use a PCIe SSD with the same code and the same data set, it reads at only 0.37 GB/s and takes 12.4 seconds.
libname PCIE 'D:/cd';
data P;
set PCIE.set1;
where code1 in: ("401","402","403","404","405")
   or code2 in: ("401","402","403","404","405")
   or code3 in: ("401","402","403","404","405");
run;
/*NOTE: DATA statement used (Total process time):
real time 12.43 seconds
user cpu time 1.09 seconds
system cpu time 1.04 seconds
memory 1014.28k
OS Memory 18928.00k*/
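By the same rule, a drive that actually delivered its rated 1.5 GB/s should finish in about 3 seconds; the measured real time instead implies an effective rate close to the SATA SSD's, which matches the Task Manager reading:

$$
\begin{aligned}
t_{\text{PCIe, expected}} &\approx \frac{4.6\ \text{GB}}{1.5\ \text{GB/s}} \approx 3.1\ \text{s} \\
v_{\text{PCIe, effective}} &\approx \frac{4.6\ \text{GB}}{12.43\ \text{s}} \approx 0.37\ \text{GB/s}
\end{aligned}
$$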
So I wonder whether SAS does NOT support PCIe 3.0 x4, or whether something else is the cause and I need to change some setting to reach the PCIe SSD's full read speed. Thanks!
ps1. The read speed was checked by watching Task Manager.
ps2. The WORK library is located on the RAM disk to make the runs more comparable and to minimize the effect of each storage device's write speed.
ps3. The PCIe SSD can reach 1.5 GB/s on my laptop for normal sequential-read work.
ps4. SAS 9.4 on Windows 10 Home.
The SPD Engine helps! Using just a single path, it reaches a 1.2 GB/s read speed.
libname PCIE spde 'D:\data' partsize=1024;
Thanks a lot, and I'll try more settings to tune the performance of the SPD Engine.
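A minimal sketch of that SPDE setup, assuming the data set is first copied out of the Base-engine library: SPDE stores tables in its own file format, so an existing .sas7bdat file is not read directly through an SPDE libref. The path `D:\data` and `partsize=1024` come from the post above (PARTSIZE= is interpreted as megabytes when no unit is given); `PCIE_B` is a hypothetical libref for the original Base library at `D:/cd`.

```sas
/* Sketch only. PCIE_B is a hypothetical libref for the original   */
/* Base-engine library; PCIE is the SPDE library from the post.    */
libname PCIE_B 'D:/cd';                       /* Base SAS library holding set1 */
libname PCIE   spde 'D:\data' partsize=1024;  /* SPDE library, 1024 MB partitions */

options fullstimer;                           /* log real/CPU time and memory */

/* One-time conversion: write set1 into SPDE format on the PCIe drive */
data PCIE.set1;
  set PCIE_B.set1;
run;

/* The original query, now read through the SPD Engine */
data P;
  set PCIE.set1;
  where code1 in: ("401","402","403","404","405")
     or code2 in: ("401","402","403","404","405")
     or code3 in: ("401","402","403","404","405");
run;
```

The conversion step is a one-time cost; later jobs read `PCIE.set1` directly. If more than one drive is available, the SPDE LIBNAME option DATAPATH= can list several directories so that data partitions are spread across them; the result above used a single path.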
It might be something in the PCIe settings that slows down your bus and makes your PCIe disk slower than the more directly attached SATA SSD. Or there may be an additional device on the PCIe bus that slows the whole bus down.
Since you have a notebook that performs better, you have something to compare the settings with.
There ARE reasons why dedicated server computers (e.g. a pSeries) are more expensive. One is that they are well tuned (hardware- and software-wise) out of the box.
Yes, the current SSD is a budget TLC one. Maybe the slowdown is caused by the TLC cache or something else; I'm still not sure.
I'm going to try another MLC PCIe SSD and hope it works!
Jack,
SAS won't be concerned with the device-level drivers for your PCIe bus. The speeds you're reaching are already quite decent. I suspect you're hitting an I/O operations per second (IOPS) wall rather than a PCIe bus throughput limit, whether in the hardware or in the kernel/cache.
Try reducing the number of I/O operations required by increasing the BLKSIZE on your session / data, and also experiment with SGIO. I'm guessing you'll see a 20-30% increase in read speed by upping the blksize, and a much more drastic change by using SGIO. However, don't be surprised if you struggle to max out your disk with a single thread. A better test of your hardware is to run a few of these jobs in parallel; you may be able to push your disk much closer to its limit that way. The Windows I/O subsystem also isn't the greatest.
What kind of speeds are you seeing with tools like CrystalDiskMark?
Nik
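Nik's larger-page and SGIO suggestions could be tried roughly as follows. This is a sketch under assumptions: on Windows the page size of a Base SAS data set is set with the BUFSIZE= data set option when the data set is written, which is assumed here to be what the BLKSIZE advice amounts to; `set1_128k` is a hypothetical name; 128K pages and 16 buffers are arbitrary starting values, not tuned ones.

```sas
/* Sketch only: set1_128k is a hypothetical name; 128k and 16 are */
/* illustrative values to experiment with, not recommendations.   */
libname PCIE 'D:/cd';                 /* Base-engine library from the original post */
options fullstimer;

/* Page size is fixed when a data set is written, so rewrite set1 once  */
/* with larger pages (roughly: fewer, larger read requests per GB).     */
data PCIE.set1_128k(bufsize=128k);
  set PCIE.set1;
run;

/* BUFNO= allocates more page buffers for the read */
data P;
  set PCIE.set1_128k(bufno=16);
  where code1 in: ("401","402","403","404","405")
     or code2 in: ("401","402","403","404","405")
     or code3 in: ("401","402","403","404","405");
run;

/* SGIO is a start-up option (-SGIO in the SAS config file or on the    */
/* sas.exe command line); an OPTIONS statement cannot switch it on.     */
/* This step only reports whether it is active in the current session.  */
proc options option=sgio;
run;
```

Comparing the FULLSTIMER real time of the second step against the 12.43 seconds logged earlier shows whether the larger pages help on this drive.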