How to reduce job running time from 1 hour to 45 mins (performance tuning)

Contributor
Posts: 61

Hi Experts,

I am working in DI Studio.

I want to reduce a job's running time from 1 hour to 45 minutes.

Could you please explain the process, or point me to documentation on it?

My questions are:

Are there any performance tuning techniques? If yes, please share any documentation on them.

How can I reduce a job's running time from 60 minutes to 45 minutes in general? (I have already created indexes.)

Regards,

Venkatesh.

Super User
Posts: 5,256

Sounds like a very specific tuning requirement you've got there :)

Well, there are almost as many techniques as there are jobs. If you can tune SAS programs, you can likely tune DI Studio jobs too; you just want to do it in an integrated way with the standard transformations.

Here are a couple of papers:

http://support.sas.com/resources/papers/proceedings11/135-2011.pdf

This one was written for 9.1.3, but most of it can be applied in later versions:

http://www2.sas.com/proceedings/forum2007/108-2007.pdf

I can just add that in 9.2 and later, it's possible to look at performance statistics for a job: each step is monitored and can be analyzed via tables/graphs. So it's easy to find your bottlenecks.

Good luck!

/Linus

Data never sleeps
SAS Employee
Posts: 51

Venkatesh,

As a first step, it would be good if you could identify which step and which type of processing is causing the slowdown, but I've provided some general advice below regardless.

The papers provided above are a good reference. As Linus mentioned, the first thing to determine is which step in your job is consuming the most runtime. The performance graphs that are available as of DI Studio 4.2 can be of great help here. If you're using an earlier version (3.4), let me know and I can provide other advice. These graphs can also help you determine whether your job is I/O bound (waiting on disk reads/writes) or CPU bound. A few other general points to consider:

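  • Make sure your indexes are actually being used.  You can check this by setting the following options, which write index-usage notes (msglevel=i) and, for tables accessed through SAS/ACCESS, DBMS trace information (sastrace) to the SAS log:  options msglevel=i sastrace=',,,d' sastraceloc=saslog;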
  • Reduce the number of steps, if possible.  Each step in the job is (potentially) another pass on the data, causing more runtime.
  • If your job contains large joins, see if you can replace one or more joins with the Lookup transformation.  This will load one or more tables into memory and perform a hash lookup, which can be faster than a disk-based join in some cases.  Depending on the size of the lookup tables, you may need to increase the available memory for the SAS session, which you can do by setting the "memsize" parameter in your sasv9.cfg file.
  • Do you have multiple cores/CPUs where the job is running?  Given today's hardware, the answer is almost certainly yes :).  By default, your SAS job is going to run in a single-threaded fashion, with the exception of certain PROCs.  To take advantage of multiple CPUs, you can split your data into several chunks and then use the Loop transformation to run your job in parallel on each chunk.  There are several ways of splitting your data (by some logical division like region, or randomly using the mod() function on a unique key).  To accomplish this, you'll need an "inner job" that is parameterized to select a subset of your data and produce unique output tables.  You then invoke this parameterized job using the Loop transformation in an "outer job".  The DI Studio help contains an example of using the Loop transformation in this fashion.  When you read the documentation, remember that the table that serves as input to the Loop transformation can be ANY table (or a work table that is the result of a transformation or user-written code step) and does not need to be the result of the "Library Contents" transformation as shown in the example.  Instead, you would create a table with as many rows as you want parallel jobs, with each row containing whatever parameters the job requires.  I've used this technique in the past to dramatically reduce wall clock time for jobs processing large tables.  If you need additional help with this technique, let me know.
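
As a rough illustration of the splitting idea in the last point, here is a minimal sketch in plain SAS code. The table work.big_table, the numeric key cust_id, the chunk count of 4, and the macro name process_chunk are all hypothetical and used only for illustration. The macro stands in for the parameterized inner job; in DI Studio, the control table would be the input to the Loop transformation, with its columns mapped to the inner job's parameters.

```sas
/* Hypothetical sample data so the sketch runs on its own. */
data work.big_table;
    do cust_id = 1 to 1000;
        amount = cust_id * 10;
        output;
    end;
run;

/* Assumed number of chunks (one per parallel run). */
%let n_chunks = 4;

/* Control table: one row per chunk, holding the parameters   */
/* that each run of the inner job needs.                      */
data work.loop_control;
    do chunk_no = 0 to &n_chunks - 1;
        n_chunks = &n_chunks;
        output;
    end;
run;

/* Stand-in for the parameterized inner job: select one chunk */
/* with mod() on the key and write a uniquely named output.   */
%macro process_chunk(chunk_no=, n_chunks=);
    data work.out_chunk_&chunk_no;
        set work.big_table(where=(mod(cust_id, &n_chunks) = &chunk_no));
        /* per-row processing for this chunk would go here */
    run;
%mend process_chunk;

/* Single call shown for illustration; this alone is sequential. */
%process_chunk(chunk_no=0, n_chunks=&n_chunks);
```

Calling the macro like this runs one chunk in the current session, so it only demonstrates the parameterization. The parallelism itself comes from having the Loop transformation launch the inner job once per row of the control table.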

Thanks,

Tim Stearn
