jaredp
Quartz | Level 8

Hi everyone, I have a new computer on the way and I am hoping for some insight on installing SAS to it.


The main purpose of the computer is for ETL and Enterprise Miner / Text Miner work.  Other software that will be installed includes:

  • SAS Base of course
  • Enterprise Guide
  • SAS/ESRI Bridge
  • MS Visual Studio
  • MySQL
  • ArcGIS


As you can see from the list of software, the computer will have a variety of use cases, but the most important one right now is ETL and Data/Text Mining.


The Computer specs are:


Windows 7 Ultimate 64bit
Dual Processors:  Two Intel Xeon E5-2680 2.70 20MB 1600 8C
NVIDIA Quadro K4000 3GB DL-DVI(I)+DP+DP 1st
128GB DDR3-1600 (8x8GB+8x8GB) 2CPU Registered RAM
512GB SATA 1st Solid State Drive
600GB 15k RPM SAS 2nd Hard Drive
1TB 10K RPM SATA 3rd Hard Drive

The computer has 3 disks. I’m planning on doing the following:

SSD - OS and software
600GB 15k – Scratch space (….is this where SASWORK goes?)
1TB 10K – General storage / backup


I have 7 questions:


  1. Any recommendations or tips for how and where to install the SAS software given these 3 disks to choose from? 
  2. Are there any 64bit SAS software tweaks I should be aware of?
  3. Would it be better to get another SSD in place of the 600GB 15k drive?  (My concern is SSD lifespan.  The SSDs are consumer entry-level ones (i.e. multi-level cell drives?).  If this is my “scratch disk” for temporary stuff, wouldn’t that decrease the lifespan of the SSD?)
  4. Perhaps I have enough RAM that one doesn’t need a dedicated scratch disk?  The largest data sets I work with are maybe 100,000 observations (some of them can have up to 200 variables).  However, when I do Text Mining, it often results in much larger temporary data sets. Thoughts?
  5. I’ve read that “SAS usually does large, blocked I/O, especially when doing analytical tasks” and I’ve also read that ETL processes require good I/O throughput.  These are my primary uses of the computer (Data and Text Mining and ETL).  How does this affect how I use these 3 disks?
  6. I've also read I need to turn on "Read-Ahead and Write-Behinds/Write-Through and enable dynamic multi-pathing to spread I/O over multiple fiber channels" but I have no idea how one does this on a Windows machine  (I am used to activating TRIM for SSD and tweaking my drives on my personal Linux computer at home).  Can anyone shed insight on this?
  7. Lastly, can I take advantage of the new text mining and data mining HPA procs?  Or are these new HPA procs only usable in particular ‘server’ products, as opposed to the desktop products that I am using?

Thank you.

3 REPLIES
yoanbolduc
Calcite | Level 5

Hi Jared,

     Looked at the setup you are proposing. I am a little puzzled by your hardware decisions. I find this to be a pretty "big" machine to still use a client version of Windows or even Windows at all. I see you want to use Visual Studio so this is probably the reason why you opted for Windows. If you are flexible on the operating system choice, I would recommend installing Linux instead of Windows.

     For your disk system, have you considered using a RAID 5 made of maybe 4 or 5 of those 1 TB 10K RPM disks? You'll get data protection and increased IOPS. Keep in mind that SAS usually requires about 50-75 MB/s of I/O per CPU core to ensure all CPU cores are "busy". As for your other disks, having 2 SSDs (one for OS/Apps and one for work) would be best. If you don't want to purchase 2 SSDs, I would put the OS and Apps on a HD and SASWORK on the SSD. Yes, your SSD's lifespan will be much shorter, but the performance gains you'll observe are definitely worth it. Also, as you seem to understand, SAS does mostly sequential I/O, so this helps "save" your SSD.

     Read-Ahead and Write-Behind are useful. But multi-pathing only applies if you are using a SAN, which does not seem to be your case.

     For the procs, I'm not sure what the answer is. I rarely see people using the desktop version of SAS anymore, so I can hardly tell. The best thing would be to check with your sales rep.

I hope this helps a little.

Yoan

jaredp
Quartz | Level 8

Hi Yoan.  Thanks for the feedback.

The Windows OS is a result of both IT (they don't support Linux) and the fact that a server-based approach has little return on investment for us at this point in time.

After doing more reading, I am considering RAID: the OS and Apps would be on an SSD, and the scratch disk (purely temporary space) would be RAID 0 using at least two 1TB 10,000 RPM drives.  (Although I could use an SSD instead.  Will the SSD be much faster than a RAID 0 of two 10K RPM drives?)

I was also thinking of putting another two 10K RPM drives in RAID 1 for file storage, but perhaps this introduces a bottleneck when my project files are located there?  Perhaps this is why you suggested RAID 5 with a number of drives?


yoanbolduc
Calcite | Level 5

Hi Jared,

     I understand the situation with your IT dept. Keep in mind that Linux is not only for server applications; it can be used as a desktop operating system too, and a very good one actually.

     OS and Apps don't require that much I/O, so this is not where I would invest the most. A good 512 GB SSD will outperform any 2-drive RAID 0 setup in any application. You need at least 4 HDs in RAID 0 to match a good 512 GB SSD. A good rule of thumb is: SSD for speed, HD for volume.

     Going RAID 1 for storage is very good for data security. But you will get poor write performance and very low capacity compared to a comparable RAID 5 setup. The only issue with RAID 5 is that you will need AT LEAST 3 disks. Also, RAID 5 does not handle random I/O as well as RAID 1 or RAID 0, but with SAS you'll get mostly sequential I/O, so you don't have to worry about this.
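
To make the capacity comparison concrete, here is a minimal Python sketch of usable capacity for the RAID levels discussed in this thread. The function name `usable_capacity_gb` is made up for illustration, and it assumes identical disks and ignores controller and filesystem overhead.

```python
def usable_capacity_gb(level: int, disks: int, disk_gb: int) -> int:
    """Usable capacity in GB for a RAID array of identical disks (overhead ignored)."""
    if level == 0:
        # Striping only: every disk contributes, no redundancy.
        if disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return disks * disk_gb
    if level == 1:
        # Mirroring: every disk holds the same data, so capacity is one disk.
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_gb
    if level == 5:
        # Striping with parity: one disk's worth of space goes to parity.
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb
    raise ValueError(f"RAID level {level} is not covered by this sketch")


# Two 1 TB disks mirrored versus three or five 1 TB disks in RAID 5.
for level, disks in [(1, 2), (5, 3), (5, 5), (0, 2)]:
    print(f"RAID {level} with {disks} x 1000 GB: "
          f"{usable_capacity_gb(level, disks, 1000)} GB usable")
```

With 1 TB disks, a two-disk RAID 1 yields 1 TB of usable space, while a three-disk RAID 5 yields 2 TB and a five-disk RAID 5 yields 4 TB.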

     Here is something to think about (please keep in mind that this is oversimplified): you have 16 cores, so at 50 MB/s per core you should aim for at least 800 MB/s of I/O throughput. This is a lot, but your SSD should give you close to 500 MB/s and it is safe to expect 170-200 MB/s from a 3-disk RAID 5 with a good controller. So if you plan your workload correctly, you should be able to keep these CPUs busy!
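
The back-of-the-envelope budget above can be sketched in Python. This is only a rough illustration using the figures quoted in this thread (50-75 MB/s per core, about 500 MB/s for the SSD, 170-200 MB/s for a 3-disk RAID 5). It assumes both devices can be driven at full speed at the same time, which a real workload may not achieve. The function names are made up for illustration.

```python
# Figures quoted in the thread; these are estimates, not measurements.
CORES = 16                  # two 8-core Xeon E5-2680 CPUs
MB_S_PER_CORE = (50, 75)    # rule-of-thumb I/O needed per busy core
SSD_MB_S = 500              # expected throughput of a good SSD
RAID5_MB_S = (170, 200)     # expected throughput of a 3-disk RAID 5


def required_throughput(cores, per_core_range):
    """Return the (low, high) MB/s needed to keep all cores busy."""
    low, high = per_core_range
    return cores * low, cores * high


def available_throughput(ssd, raid5_range):
    """Return the (low, high) MB/s if SSD and RAID 5 are used concurrently."""
    low, high = raid5_range
    return ssd + low, ssd + high


need = required_throughput(CORES, MB_S_PER_CORE)
have = available_throughput(SSD_MB_S, RAID5_MB_S)

print(f"needed:    {need[0]}-{need[1]} MB/s")
print(f"available: {have[0]}-{have[1]} MB/s")
```

Under these assumptions the combined estimate lands somewhat below the low end of the target, which is why spreading the workload across both devices matters.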

Yoan
