TriciaAanderud
Lapis Lazuli | Level 10

If you are working with large data, the end goal is to gain some value from it. It may be as simple as a query for sales last quarter or as complex as creating a sales forecast. Either way, the task still requires that you write a program to retrieve the data, shape and aggregate it, and then create the desired output. The entire focus for efficiency is on how well we can manipulate and move the data around.

When seeking programming efficiencies, many new programmers and even administrators might think that performance is related only to the hardware. However, powerful servers are only a small part of the equation. If you install a powerful server without making other changes, then you only enable bloated databases and inefficient programs to run lightning quick!

 

Performance should be approached from several viewpoints, as shown in the following figure. By working with each of these environment components, you can ensure you are getting the most performance.

 

Join Ben Murphy and Nick Welke at SAS Global Forum as they present Quickish Performance Techniques for Biggish Data. This paper provides some quickish techniques Zencos has applied for working with biggish data that made the difference.


Tricia Aanderud

Twitter: @taanderud - Follow me!

5 REPLIES
Kurt_Bremser
Super User

The first thing I always look at when problems arise is what you call "Coding Efficiencies". There you can get the biggest bang for the buck, as writing better code is usually not cost-intensive. The time one spends on better code now comes back in the form of easier maintenance in the future.

SAS provides a lot of opportunities to bring down even the fastest server with just a few lines of little-thought-about code.

TriciaAanderud
Lapis Lazuli | Level 10
Exactly! Those are some things we discuss in the paper.


rogerjdeangelis
Barite | Level 11

I think the first thing you need to do is define 'Big Data' and 'Big Computation' for programmers and de-identified data.

 

Here are my definitions for separating 'server' and 'workstation'.

'Big Data' occurs when any temporary or permanent object exceeds 1 terabyte. Note that 1TB is now in the $25 range.

For anything greater than 1TB, use a server. Soon 1TB thumb drives will arrive.

'Big Computation' means more than 32 cores and 8 hours of CPU time; use a server.

My experience is that a power workstation provides a better-performing platform for data under 1TB and up to 32 cores.

The new inexpensive AMD Ryzen processors will drive down the cost of 'power workstations'.

TriciaAanderud
Lapis Lazuli | Level 10

Yep ... these are some of our planned discussion points.

Hope to see you there.



SASKiwi
PROC Star

In addition to @Kurt_Bremser's valuable insights, I would also make the point that coding efficiencies should not be the one and only focus when developing applications. Understandability, reliability and maintainability are other factors to keep in mind.
