03-13-2017 08:07 AM
If you are working with large data, the end goal is to gain some value from it. That may be as simple as a query for last quarter's sales or as complex as creating a sales forecast. Either way, the task requires that you write a program to retrieve the data, shape and aggregate it, and then create the desired output. Efficiency, then, comes down to how well we can manipulate and move that data.
When seeking programming efficiencies, many new programmers and even administrators assume that performance depends only on the hardware. However, a powerful server is only a small part of the equation. If you install a powerful server without making any other changes, you only enable bloated databases and inefficient programs to run lightning quick!
Performance should be approached from several viewpoints, as shown in the following figure. By tuning each of these components of the environment, you can make sure you are getting the best possible performance.
Join Ben Murphy and Nick Welke at SAS Global Forum as they present Quickish Performance Techniques for Biggish Data. This paper presents some of the quickish techniques Zencos has applied when working with biggish data that made the difference.
03-13-2017 08:21 AM
The first thing I always look at when problems arise is what you call "Coding Efficiencies". There you can get the biggest bang for the buck, as writing better code is usually not cost-intensive. The time one spends on better code now comes back in the form of easier maintenance in the future.
SAS provides plenty of opportunities to bring down even the fastest server with just a few lines of ill-considered code.
03-13-2017 12:26 PM
I think the first thing you need to do is define 'Big Data' and 'Big Computation' for programmers working with de-identified data.
Here are my definitions for deciding between a 'server' and a 'workstation':
'Big Data' is when any temporary or permanent object exceeds 1 terabyte. Note that 1TB is now in the $25 range.
Greater than 1TB: use a server. 1TB thumb drives will arrive soon.
'Big Computation' is more than 32 cores and 8 hours of CPU time: use a server.
In my experience, a power workstation provides a better-performing platform for data under 1TB and jobs needing 32 cores or fewer.
The new, inexpensive AMD Ryzen processors will drive down the cost of 'power workstations'.
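The thresholds in the post above can be written down as a simple decision rule. The sketch below is a hypothetical Python helper: `Workload` and `choose_platform` are illustrative names, not part of SAS or any library. It assumes "1TB" means 10**12 bytes (the decimal terabyte on drive labels) and reads the 'Big Computation' rule as requiring both more than 32 cores and more than 8 hours of CPU time; the post does not say whether either condition alone should count.

```python
from dataclasses import dataclass

# Assumption: "1TB" is the decimal terabyte (10**12 bytes) used on drive labels.
TB = 10**12


@dataclass
class Workload:
    largest_object_bytes: int  # largest temporary or permanent object the job creates
    cores: int                 # number of cores the job would use
    cpu_hours: float           # total CPU time in hours


def choose_platform(w: Workload) -> str:
    """Return 'server' or 'workstation' using the thresholds from the post (hypothetical helper)."""
    # 'Big Data': any temporary or permanent object larger than 1TB.
    big_data = w.largest_object_bytes > 1 * TB
    # 'Big Computation': read here as BOTH more than 32 cores AND more than 8 CPU hours (assumption).
    big_computation = w.cores > 32 and w.cpu_hours > 8
    return "server" if big_data or big_computation else "workstation"


if __name__ == "__main__":
    print(choose_platform(Workload(500 * 10**9, 16, 2.0)))   # workstation
    print(choose_platform(Workload(2 * TB, 8, 1.0)))         # server (big data)
    print(choose_platform(Workload(100 * 10**9, 64, 12.0)))  # server (big computation)
```

If you read the rule as "either condition", change the `and` in `big_computation` to `or`; with that reading the many-cores-but-short job in the tests would go to a server instead.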
03-14-2017 08:55 AM
Yep ... these are some of our planned discussion points.
Hope to see you there.
03-13-2017 03:19 PM
In addition to @KurtBremser's valuable insights, I would also point out that coding efficiency should not be the sole focus when developing applications. Understandability, reliability and maintainability are other factors to keep in mind.