We recently migrated from SAS on Linux to SAS on Windows. We're currently working with a contractor to try to track down potential issues (we see large differences in Total CPU to User + System CPU, but nothing pointing to actual bottlenecks with storage).
Users are indicating that jobs that took 20-30 minutes on Linux take upwards of 45 minutes on Windows. That seems logical to me, considering we went from an extremely high-powered Linux server, where storage, jobs, Metadata, Compute and Web all ran on a single server, to SAS spread across 3 servers (2 are VMs) with SAN storage. The pros are the users' ability to work more easily with Windows features (actual paths, logging in with their Windows IDs, etc.). The cons are performance hits.
We are using SAS Enterprise Guide from Citrix servers and the users are reading from large Excel files. Their datasets are anywhere from 50GB to 200GB.
What expectations could we set for them logically in terms of moving from Linux to Windows? We've told them "it WILL be slower", but how much is too much?
"from large Excel files. Their datasets are anywhere from 50GB to 200GB." - You must be joking; 50GB to 200GB Excel files are your core data? Focus on that part: get a database, or store the data in a methodical, processable data store. This alone will be the biggest improvement you will ever make to your system.
As for system performance, you would need a systems analyst to go through it all. Any part of the process can affect performance. Which hardware or setup best suits your needs also depends on what you are doing with the data and the process.
So all I can say from this is bin the Excel, that's a definite.
Let me clarify - the Excel files they are reading in are not 50GB to 200GB, but they (our analysts) read from an Excel file and then merge that data, sort it, etc. into existing data sets that are 50-200GB.
On that note, I definitely agree that any reading from Excel isn't ideal. However, we have a very young SAS system and this is how it is being used.
You need to compare the Linux logs (if still available) with the Windows logs for the same jobs to identify program steps that are significantly slower. Then you may be able to identify the cause of the slowness.
Benchmarking performance is a usual practice when migrating from one SAS server installation to another. What are the specifications of your old server versus your new servers? Do the new ones have more cores and more memory? Is IO faster or slower?
Ok, I would then suggest splitting your process into two segments.
1) Loading of data - this would be the extract of the data from Excel and the loading into a data store (warehouse, storage, data sets, etc.). You will likely find it a whole lot easier to use a command-line converter (Apache Tika maybe) to convert all the Excel files to plain CSV text. Then DATA steps can be developed to read in the CSV data. This avoids one of the big drawbacks of Excel, its loose data storage, and avoids guessing procedures like PROC IMPORT, which can generate garbage (GIGO - garbage in, garbage out). It also avoids any unnecessary post-processing.
2) Processing of the data from your data store.
Other tips:
Avoid using SQL. SQL is known to become resource heavy at times, and because it creates its own internal plan you can't guarantee that two runs with different data will run the same (sorted/unsorted is one which jumps to mind).
Avoid creating lots of datasets (often seen when macro loops are used), as even a tiny network lag on each read/write is multiplied by the number of loop iterations.
Examine the log closely at each section to see resource intensive tasks and focus re-factoring efforts there as @SASKiwi has posted.
You can shrink your data using a variety of methods - RDBMS-style normalisation, using informats, etc. If you have lots of categories across large data, then create an informat of the categories and read in the data using that informat. You can then encode the data to be as small as possible - with large text data this can be a large saving.
There are plenty of other things you can look at as well, but the two biggest are going to be importing the data and removing any loops (macro ones).
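To get a feel for how much the category encoding might save, here is a small Python sketch of the arithmetic. It is a generic estimate, not a model of how SAS actually stores values: the function names are made up for the example, and it assumes the text column is stored at a fixed width and the codes take the smallest whole number of bytes that fits.

```python
def build_code_table(values):
    """Assign a small integer code to each distinct category, in first-seen order."""
    table = {}
    for value in values:
        if value not in table:
            table[value] = len(table) + 1
    return table


def bytes_needed(distinct_count):
    """Smallest whole number of bytes that can hold codes 1..distinct_count."""
    size = 1
    while distinct_count >= 256 ** size:
        size += 1
    return size


def estimated_saving(values, text_width):
    """Bytes saved by storing codes instead of fixed-width text of text_width bytes."""
    table = build_code_table(values)
    code_width = bytes_needed(len(table))
    return len(values) * (text_width - code_width)
```

For instance, `estimated_saving(column_values, 40)` gives the saving for one 40-byte text column; on data sets with hundreds of millions of rows, every byte shaved off a row also reduces what each sort and merge has to read and write.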
@webbm : your post reminded me of a conference talk I watched many years back. The speaker discusses his organisation's move from Linux to another system called FreeBSD. Many of the points made in the talk will resonate with you, I'm sure.
I'd say you need to try to outline the benefits to users. As a user of a system I would certainly be annoyed if a system was updated with no benefit outlined for me or the business and my productivity with that system dropped. If the throughput of a system dropped but I was clearly able to see benefits or be told about them then I might be less annoyed.
The topic of systems performance is a beast of its own. With Linux there are a large number of tracing facilities for both userspace and the kernel. With Windows there is not as much visibility. For mostly userspace workloads (statistics procs and the things that don't call into the OS much) you should see little to no difference in your SAS application performance across Windows and Linux on the same hardware. When it comes to systems performance I typically refer to resources from Brendan Gregg. His homepage is here and it is packed with interesting information with regards to system performance. His book is also my main reference when it comes to looking at performance issues.
Linux to BSD is "just" a switch from one UNIX flavour to another, and both will probably (I have no clue about FreeBSD) use very similar if not the same command-line tools for most tasks, and therefore "feel" the same to users.
The downgrade from Linux to Windows is a head-on jump into a pile of ****, IMHO.
Hi @webbm, one place to start is the "Performance Considerations under Windows" section of the SAS Companion for Windows. There are a number of host-specific options that can be taken advantage of here. For example, multi-gigabyte files can benefit from SGIO processing. If you have enough memory on your server, take a look at memory-based libraries.
Posting log differences here where you can identify specific cases of performance issues will help home in on possible solutions.