BookmarkSubscribeRSS Feed
BillJones
Calcite | Level 5

Hello,

I'm running SAS 9.3 on a Windows 7 64 machine with 24GB of RAM and 2 x Intel Xeon 2.4GHz processors.  My jobs seem to take forever.  I checked the amount of RAM and CPU usage in the Windows Task Manager, and SAS is using approximately 75MB of RAM and only 13% of the CPU.  Note I changed the memsize option in the config file from 2GB to 18GB and also moved up the sortsize to 13.5GB.  Why is the RAM/CPU usage so low?  Are there some tweaks that I can make to improve performance?  Adjust buffersize?


I've attached the output from running proc options.

Thanks very much for any suggestions.

-Bill

14 REPLIES 14
SASKiwi
PROC Star

SAS performance isn't only CPU and memory usage. Check your SAS logs. How long is forever?! What steps are taking the longest? If its DATA steps and PROC SORTs then it is most likely IO is the culprit.

Run your code with SAS option FULLSTIMER to check memory and IO usage.

Focus on the steps taking longest - can they be re-coded to improve performance?

BillJones
Calcite | Level 5

SASKiwi,

Thanks for your thoughts.  I did a simple data step where I appended several files into one file.  Now the data files were large (total size of the combined file was 204GB).  However, the job took 75 minutes.  I can see perhaps 30 minutes for something like this, but over an hour?  It seems to be taking too long.  Note the hard disk is a Hitachi 7200 RPM.  If the issue is IO, how do I fix it?   Adjust the BLKSIZE?

Thanks again,

Bill

Code:

data data_exp.policysummarystatsraw;

set data_exp.policysummarystats2008_ data_exp.policysummarystats2009_ data_exp.policysummarystats2010_

data_exp.policysummarystats2011_ data_exp.policysummarystats2012_ data_exp.policysummarystats2013_

data_exp.policysummarystats2014_;

run;

Log:

NOTE: There were 30051 observations read from the data set DATA_EXP.POLICYSUMMARYSTATS2008_.

NOTE: There were 25949456 observations read from the data set DATA_EXP.POLICYSUMMARYSTATS2009_.

NOTE: There were 29754796 observations read from the data set DATA_EXP.POLICYSUMMARYSTATS2010_.

NOTE: There were 30672849 observations read from the data set DATA_EXP.POLICYSUMMARYSTATS2011_.

NOTE: There were 32867276 observations read from the data set DATA_EXP.POLICYSUMMARYSTATS2012_.

NOTE: There were 19762534 observations read from the data set DATA_EXP.POLICYSUMMARYSTATS2013_.

NOTE: There were 1535814 observations read from the data set DATA_EXP.POLICYSUMMARYSTATS2014_.

NOTE: The data set DATA_EXP.POLICYSUMMARYSTATSRAW has 140572776 observations and 172 variables.

NOTE: DATA statement used (Total process time):

      real time           1:15:51.35

      user cpu time       2:43.30

      system cpu time     8:36.80

      memory              1565.71k

      OS Memory           8632.00k

      Timestamp           06/23/2014 04:27:35 PM

SASKiwi
PROC Star

A couple of things to try:

On your DATA statement - data data_exp.policysummarystatsraw (compress = binary);

If compressing the output dataset speeds things up then try compressing your yearly input datasets as well.

If the yearly datasets contain the same variable names try to APPEND the tables like so:

proc datasets library = data_exp;

  append base =  policysummarystatsraw

             data = policysummarystats2008_;

run;

quit;

You can only append one dataset at a time but overall it may still be quicker.

BillJones
Calcite | Level 5

SASKiwi,

Thanks for the suggestion.  I created a macro using the code you posted.  Tried it on a sample of 1000 obs and seems to execute quickly.  I'm running it now on all the data and will post the results.

-Bill

SASKiwi
PROC Star

Great. BTW you can use COMPRESS=BINARY with PROC DATASETS as well the DATA step, if you are not already using it.

BillJones
Calcite | Level 5

SASKiwi,

Thanks.  Already using COMPRESS=BINARY with PROC DATASETS and the DATA step.  Really shrinks the size of the files.


Regards,

Bill

Kurt_Bremser
Super User

You say "the" harddisk, implying you run everything on one disk.

Now to The Math:

204GB filesize means that 204 GB have to be read _and_ written, giving overall total I/O of 408 GB.

408*1024 = 417,792 MB

417,792 / 75 / 60 = 92.84 MB/sec, which is not bad for a single 7200 RPM disk. You are simply massively I/O bound. Especially since you read and write concurrently on the same disk, which further reduces its I/O throughput.

Don't be concerned about memory usage, when doing simple read/write operations SAS does not need more.

What can you do?

- use compression, as suggested. This will increase CPU load (of which you have plenty available) and massively reduce I/O, especially with datasets that contain empty strings etc

- get some REAL storage HW: use 2 or more disks with striping, this will more or less reduce the I/O time by the number of disks. Take caution, failure of one disk will destroy all data! To be on the safe side, set up 2 stripesets and use RAID1 between them

BillJones
Calcite | Level 5

Kurt,

Thanks very much for the analysis.  The box that I'm running SAS on has 2 HDDs (both 7200 RPM).  I'll create a temp sas data folder on the other drive to write large data sets.  Hopefully, this will increase performance.  I would just purchase an SSD, but my firm plans on upgrading to a server soon. 

-Bill

SASKiwi
PROC Star

I recommend setting COMPRESS = BINARY in an OPTIONS statement. That way all of your WORK datasets and utility files get compressed too and you don't have to keep repeating COMPRESS = on all required steps.

BillJones
Calcite | Level 5

Nice!  Will do.  Thanks for the tip.

Kurt_Bremser
Super User

I once had the pleasure of attending a presentation at SEUGI in Florence where the ideal setup of disk storage for SAS was the main theme, and I took the following with me:

If you have a SAN, it is the responsibility of the SAN manager to provide you with enough throughput; can't do much yourself there.

If you work with local disk storage, consider the following:

Is an outage in the DWH service caused by a failed disk tolerable? If not, always use some kind of mirroring, RAID 1 preferred (the fastest method for writing), starting with 2 disk adapters across which you place the mirrors.

Try to saturate the adapter's maximum throughput with disks. Out of the old SCSI days: adapter supports 320M/sec, disks deliver 40m/sec -> 8 disks per adapter; set up striping so that all disks work in parallel and every access causes all disks (in a logical group) to work.

Set up several physically independent volumes and assign those as "special work" libraries. This allows you to do MERGE's where SAS reads infile 1 from disk (group) A, infile 2 from disk (group) B and writes outfile to disk (group) C. You'll be suprised what you can gain with that.

Set the UTILLOC so that it points to a different disk (group) than WORK. This prevents disk contention when doing a sort.

Most of this applies to magnetic storage, SSD's avoid many of the problems caused by latency and seek times, alleviating the effects of disk contention.

BillJones
Calcite | Level 5

Kurt,

Thanks very much for the information.  I believe that the I/O environment on our new server will be a SAN.  Therefore, we'll just have to be sure that we get enough throughput.  How much throughput would you recommend for a small shop of SAS users (lt 6) working with data sets in the millions of records (uncompressed large file would be ~200gb)?

Regards,

Bill

Kurt_Bremser
Super User

In the past, I have tried to achieve a consistent data rate of 100 MB/s writing on the SAN, but our workspaces are located on internal disks.

Given that your individual files are much larger than ours, and your SAN may also be required to handle the WORKs, I'd go for 500 MB/s. If somebody asks, just tell them that data warehousing IS an I/O intensive task; where DB systems need to be quick in handling small data chunks randomly, DWH's usually go through _all_ the data in a more or less sequential way.

BillJones
Calcite | Level 5

Kurt,

Thank you for your thoughts!  At least now I'll know what to request.  I'll definitely shoot for 500 MB/s.  We'll see what IT says.

Regards,

Bill

suga badge.PNGThe SAS Users Group for Administrators (SUGA) is open to all SAS administrators and architects who install, update, manage or maintain a SAS deployment. 

Join SUGA 

Get Started with SAS Information Catalog in SAS Viya

SAS technical trainer Erin Winters shows you how to explore assets, create new data discovery agents, schedule data discovery agents, and much more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 4647 views
  • 6 likes
  • 3 in conversation