thanksforhelp12
Calcite | Level 5

Hello, 

 

I recently bought a Samsung T5 SSD to help relieve an I/O bottleneck when processing large databases with SAS.

 

Unfortunately, I am not getting SSD read speeds when accessing data on the drive from SAS (for example, when subsetting a large dataset). In fact, I also have an external HDD that gets 60-70 MB/s, which is identical to what I get with the SSD. The SSD itself is not impaired, because I get the expected speeds during file transfers and speed tests outside SAS. RAM and CPU are both sitting at <25% utilization.

 

Is there a way to speed this up, or can SAS not read at SSD speeds?

 

Thanks!

 

 

13 REPLIES
Reeza
Super User
To start, I'm assuming you're using a full version of SAS, since SAS UE is locked down and you likely cannot make the changes to the config file needed to tune your installation.

So what makes you think your process is bound by read speed?
Subsetting a large data set usually means creating a new version so that's more similar in timing to writing a new file, which will be slower than copying.
How does a direct copy task perform?
TomKari
Onyx | Level 15

My experience is that SAS will indeed work at SSD speeds.

 

Let's start by isolating where in the stack the problem is. This program writes &RecordCount records of 1000 bytes each to disk. Set it to a number that seems reasonable, and run it to see what the difference is between your previous disk and your new SSD (if there is one).

 

In my case, setting it to 3000000 took 33 seconds on my plain jane system. SSD speeds should be MUCH higher.

 

Tom

 

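/* Number of 1000-byte records to write */
%let RecordCount = 3000000;
data _null_;
	length OutRec $5000;
	/* Replace "Some Directory" with a folder on the drive being tested */
	file "Some Directory\SASTest.txt" lrecl=32767;
	/* REPEAT appends 999 copies to the original "a", giving 1000 characters */
	OutRec = repeat("a", 999);

	do _i = 1 to &RecordCount.;
		put outrec;
	end;
run;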
thanksforhelp12
Calcite | Level 5

HDD:

NOTE: 3000000 records were written to the file "E:\SASTest.txt".
      The minimum record length was 1000.
      The maximum record length was 1000.
NOTE: DATA statement used (Total process time):
      real time           26.75 seconds
      cpu time            7.45 seconds

 

SSD:

NOTE: 3000000 records were written to the file "D:\SASTest.txt".
      The minimum record length was 1000.
      The maximum record length was 1000.
NOTE: DATA statement used (Total process time):
      real time           19.43 seconds
      cpu time            8.32 seconds

 

And I suspect some of the difference in real time is due to the initial time it took to get the HDD spinning from rest.

 

With respect to @Reeza's questions: yes, this is full-version SAS on Windows. As for being bound by read speed, only a very small number of observations are being rewritten. When I pull up Performance in Task Manager, I see a read speed of 60-70 MB/s. The write speed fluctuates between 0 and some higher value, depending on whether a file is actively being written at that moment. CPU and RAM have plenty of headroom, at <25%.

Reeza
Super User

What are your memsize and bufsize options? Have you tried increasing those?

 

EDIT: BUFNO seems more impactful than BUFSIZE. 

See @ChrisNZ post here:

https://communities.sas.com/t5/SAS-Programming/How-to-make-SAS-faster/td-p/310240

 

 

TomKari
Onyx | Level 15

So you're writing 111 MB per second. I started in this business in 1978...that wouldn't have been considered slow then! 🙂

 

What speeds are you getting from your benchmarking software for the SSD? I've just looked up a couple of articles, and I was surprised to see that SSDs appear to be only around 5 times faster than HDDs. I thought the difference was much greater.

 

Unfortunately I'm not experienced with using drive benchmark software, so I can't suggest anything.

 

Definitely look into @Reeza 's latest suggestions. If we only have to make up 2 or 3 times difference, buffering could certainly account for a lot of that.

 

Tom

thanksforhelp12
Calcite | Level 5

Getting 111 MB/s would be a massive win. Throughout all my tasks I am reading at 50-60 MB/s.

 

I did try the buffer settings, and they did not make a major difference. The SGIO option is intriguing, and I'll try it in my next data step.

 

I am currently running a query via SQL, and these are the speeds I am getting. It makes no sense: sometimes it is even slower than the HDD.

 

[Attached screenshot: slowSSD.PNG]

 

 

Reeza
Super User
You changed the buffer settings in the config file and got no speed differences?
thanksforhelp12
Calcite | Level 5
I have not changed the config file, but I declared
options bufno=5 bufsize=64k;
and the read speeds are similar.
Reeza
Super User
Post the log of that please, including a proc options that shows the options were set. Make sure to use FULLSTIMER as well to see the full details.
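
For reference, here is a minimal sketch of a run that would produce such a log. It assumes the BUFNO= and BUFSIZE= values quoted earlier in the thread; MYLIB.BIGTABLE is a hypothetical placeholder for the large data set being read, not a name from the original posts.

```sas
/* Turn on detailed timing and resource statistics in the log */
options fullstimer;

/* Values quoted earlier in the thread; adjust as needed */
options bufno=5 bufsize=64k;

/* Write the current option values to the log to confirm they are in effect */
proc options option=(bufno bufsize memsize);
run;

/* Read-only pass over the data set: nothing is written,
   so the elapsed time reflects read throughput.
   MYLIB.BIGTABLE is a hypothetical placeholder. */
data _null_;
	set mylib.bigtable;
run;
```

Running the same step once against the HDD copy and once against the SSD copy, and comparing the real time in each log, should show whether the read path is really the limiting factor.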
ChrisNZ
Tourmaline | Level 20

These options are only useful when reading or writing data sets.

They make no difference when accessing flat files.

 

Patrick
Opal | Level 21

When troubleshooting performance problems, be aware that SAS processes put different demands on file systems and I/O than traditional databases or simple query processes. For that reason, you need to be able to determine the throughput rates of any file system that SAS uses. 

The sasiotest.exe utility for Microsoft Windows platforms can be used to measure the I/O behavior of the system under defined loads. The utility is easy to use and can be used to launch individual or multiple concurrent I/O tests to flood the file system and determine its raw performance.

Documentation here.

TomKari
Onyx | Level 15

From the log results you posted above, on your HDD test you wrote:
   3,000,000 records
   1,000 bytes per record
for a total of 3,000,000,000 bytes


The write took 26.75 seconds, so unless my math is really fuzzy you're moving the data at 3,000,000,000 / 26.75 = 112,149,533 bytes per second.
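
The same arithmetic can be checked for both drives in a short DATA step. This is only a sketch using the record counts and real times from the logs posted above; the variable names are made up for illustration.

```sas
data _null_;
	/* 3,000,000 records x 1,000 bytes per record */
	bytes = 3000000 * 1000;

	/* Real times taken from the HDD and SSD logs posted above */
	hdd_rate = bytes / 26.75;
	ssd_rate = bytes / 19.43;

	/* Bytes per second, written to the log */
	put hdd_rate= comma15. ssd_rate= comma15.;
run;
```

That works out to roughly 112 MB/s for the HDD and roughly 154 MB/s for the SSD on this write test.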

 

Tom

ChrisNZ
Tourmaline | Level 20

> for example, subsetting a large dataset

We need to know more.

Is it a SAS data set (engine value is V9 in proc contents) or something more modern like an SPDE data set?

Are indexes used?

Do you read the data set sequentially?

How do you subset it? WHERE clauses are usually faster provided no function is used. Otherwise IF tests are generally faster. Unless you subset in some other way?

 

Note that large SPDE data sets are usually much faster than BASE data sets, provided they are binary-compressed and there are some CPU resources available. That's because I/O is typically divided by 10 compared to an uncompressed BASE data set. SPDE does not address the issue of drive speed, but regardless it does wonders for programs' elapsed time while lowering storage space requirements.
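
As a rough sketch of what that could look like, the step below copies a BASE data set into a binary-compressed SPDE data set and then subsets it with a WHERE clause. The path, librefs, data set name, and the REGION variable are all hypothetical placeholders, and the actual gain will depend on the data.

```sas
/* Hypothetical folder on the SSD; the SPDE library stores its files here */
libname spdelib spde "D:\spde_data";

/* Copy the existing data set into SPDE with binary compression.
   MYLIB.BIGTABLE is a placeholder for the current BASE data set. */
data spdelib.bigtable (compress=binary);
	set mylib.bigtable;
run;

/* Subset with a WHERE clause that uses no functions.
   REGION is a made-up variable for illustration. */
data work.subset;
	set spdelib.bigtable;
	where region = "EAST";
run;
```

Comparing the FULLSTIMER output of the subset step against the same step run on the BASE data set would show whether the reduced I/O outweighs the extra CPU spent on decompression.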

 
