Sometimes a Data Management job does not run in the expected time, or you need to understand where the time in a job is spent. Performance profiling exists for exactly this purpose.
There are two modes of operation: (1) profiling by node type, which aggregates timings across all nodes of the same type, and (2) profiling by node instance, which reports timings for each individual node in the job.
In both modes the output is comma delimited. For mode 1, the columns are action type, node type, milliseconds, entries. Example output:
Profiler dump node types
NX,CASE,49698,1000001
NX,CLUSTER,63895,910332
NX,DEST_DELIMTEXT,131,6871
NX,EXPRESSION,11860,917203
NX,MATCHCODE,6105360,1000001
NX,MATCHPARSED,530428,3000003
NX,SORT,10851,910332
NX,SOURCE_DELIMTEXT,30970,1000001
NX,STANDARDIZE,1332685,1000001
PR,CASE,0,1
PR,CLUSTER,0,1
PR,DEST_DELIMTEXT,0,1
PR,EXPRESSION,52,2
PR,MATCHCODE,3883,1
PR,MATCHPARSED,0,3
PR,SORT,0,1
PR,SOURCE_DELIMTEXT,27,1
PR,STANDARDIZE,0,1
PX,CASE,0,1
PX,CLUSTER,253,1
PX,DEST_DELIMTEXT,17,1
PX,EXPRESSION,10,2
PX,MATCHCODE,0,1
PX,MATCHPARSED,0,3
PX,SORT,1,1
PX,SOURCE_DELIMTEXT,0,1
PX,STANDARDIZE,20,1
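If you just want a quick ranking without opening Excel, a few lines of script can aggregate a dump. Below is a minimal Python sketch; the file name is hypothetical, and the field layout is the one described above (action type, node type, milliseconds, entries):

import csv
from collections import defaultdict

# Total profiler time per node type from a "dump node types" log.
# profiler_by_type.log is a hypothetical file holding the lines above.
totals = defaultdict(int)
with open("profiler_by_type.log", newline="") as f:
    for row in csv.reader(f):
        if len(row) != 4:
            continue  # skip the "Profiler dump node types" header line
        action, node_type, ms, entries = row
        totals[node_type] += int(ms)

# Most expensive node types first; MATCHCODE and STANDARDIZE dominate here.
for node_type, ms in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{node_type:<20}{ms:>10} ms")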
For mode 2, the columns are action type, job name, node instance, milliseconds, entries. Example output:
Profiler dump node instances
NX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,15,1331014,1000001
NX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,17,207743,1000001
NX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,18,155413,1000001
NX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,19,171002,1000001
NX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,33,131,6871
NX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,34,10789,910332
NX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,35,4103,6871
NX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,36,8063,910332
NX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,38,30698,1000001
NX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,4,6102637,1000001
NX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,5,62961,910332
NX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,9,51321,1000001
PR,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,15,0,1
PR,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,17,0,1
PR,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,18,0,1
PR,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,19,0,1
PR,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,33,0,1
PR,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,34,0,1
PR,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,35,3,1
PR,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,36,48,1
PR,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,38,26,1
PR,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,4,3883,1
PR,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,5,0,1
PR,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,9,0,1
PX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,15,20,1
PX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,17,0,1
PX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,18,0,1
PX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,19,0,1
PX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,33,17,1
PX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,34,1,1
PX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,35,10,1
PX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,36,0,1
PX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,38,0,1
PX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,4,0,1
PX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,5,252,1
PX,HDP_Duplicates_1.7_ServerTestProd27_1M.ddf,9,0,1
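The by-instance dump is the one to reach for when a single node dominates the runtime. A similar sketch, under the same assumptions as the previous one, ranks instances by time spent:

import csv

# Rank node instances by time spent, from a "dump node instances" log.
# profiler_by_node.log is a hypothetical file holding the lines above.
rows = []
with open("profiler_by_node.log", newline="") as f:
    for row in csv.reader(f):
        if len(row) != 5:
            continue  # skip the "Profiler dump node instances" header line
        action, job, instance, ms, entries = row
        rows.append((int(ms), action, instance))

# Top ten; in the example above, instance 4 (6102637 ms in NX) comes first.
for ms, action, instance in sorted(rows, reverse=True)[:10]:
    print(f"node {instance:>3} ({action}): {ms} ms")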
To turn the profiler on, go into your app.cfg and add one or both of the following lines, depending on which dump you want to enable:
STEPENG/PROFILEBYNODE=true
STEPENG/PROFILEBYTYPE=true
On Windows, and only in versions later than 2.4, also set the following:
BASE/HIGH_PRECISION_TIMER=1
If this is not set, the timings may be less accurate. Be aware, however, that when it IS set, all threads run on CPU core 1, which reduces the program's performance if it uses multiple threads.
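Putting it together, the profiling-related section of app.cfg might look like this; enable only the lines you need, and remember that the timer setting applies to Windows only:

STEPENG/PROFILEBYNODE=true
STEPENG/PROFILEBYTYPE=true
BASE/HIGH_PRECISION_TIMER=1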
Then you need to make sure the profiler output is actually written somewhere. In your logging XML file, add something like this:
<!-- Write the raw profiler messages to a dedicated rolling log file -->
<appender class="RollingFileAppender" name="ProfFile">
  <param name="Append" value="false"/>
  <param name="ImmediateFlush" value="true"/>
  <rollingPolicy class="TimeBasedRollingPolicy">
    <param name="fileNamePattern"
           value="%S{OSENV.DIS_LOG_DIR}/profiler_service_%d_%S{pid}.log"/>
  </rollingPolicy>
  <layout>
    <!-- %m emits the message only, so the file stays clean CSV -->
    <param name="ConversionPattern" value="%m"/>
  </layout>
</appender>
<!-- Route the profiler's logger to the appender above -->
<logger name="DF.RTProfiler">
  <level value="Trace"/>
  <appender-ref ref="ProfFile"/>
</logger>
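With this configuration in place, each run produces a profiler_service_*.log file under the directory DIS_LOG_DIR points to, containing the dumps shown above.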
The logs are low volume, so they should not noticeably affect performance or disk space.
Since the output is comma delimited, you can pull it into Excel. To analyse it with an Excel pivot table, you first need to split every line into its fields:
| Input | Action Type | Node Type | Time (ms) | Entries |
| --- | --- | --- | --- | --- |
| NX,CASE,49698,1000001 | NX | CASE | 49698 | 1000001 |
| NX,CLUSTER,63895,910332 | NX | CLUSTER | 63895 | 910332 |
| NX,DEST_DELIMTEXT,131,6871 | NX | DEST_DELIMTEXT | 131 | 6871 |
| NX,EXPRESSION,11860,917203 | NX | EXPRESSION | 11860 | 917203 |
| NX,MATCHCODE,6105360,1000001 | NX | MATCHCODE | 6105360 | 1000001 |
| NX,MATCHPARSED,530428,3000003 | NX | MATCHPARSED | 530428 | 3000003 |
| NX,SORT,10851,910332 | NX | SORT | 10851 | 910332 |
| NX,SOURCE_DELIMTEXT,30970,1000001 | NX | SOURCE_DELIMTEXT | 30970 | 1000001 |
| NX,STANDARDIZE,1332685,1000001 | NX | STANDARDIZE | 1332685 | 1000001 |
| PR,CASE,0,1 | PR | CASE | 0 | 1 |
| PR,CLUSTER,0,1 | PR | CLUSTER | 0 | 1 |
| PR,DEST_DELIMTEXT,0,1 | PR | DEST_DELIMTEXT | 0 | 1 |
| PR,EXPRESSION,52,2 | PR | EXPRESSION | 52 | 2 |
| PR,MATCHCODE,3883,1 | PR | MATCHCODE | 3883 | 1 |
| PR,MATCHPARSED,0,3 | PR | MATCHPARSED | 0 | 3 |
| PR,SORT,0,1 | PR | SORT | 0 | 1 |
| PR,SOURCE_DELIMTEXT,27,1 | PR | SOURCE_DELIMTEXT | 27 | 1 |
| PR,STANDARDIZE,0,1 | PR | STANDARDIZE | 0 | 1 |
| PX,CASE,0,1 | PX | CASE | 0 | 1 |
| PX,CLUSTER,253,1 | PX | CLUSTER | 253 | 1 |
| PX,DEST_DELIMTEXT,17,1 | PX | DEST_DELIMTEXT | 17 | 1 |
| PX,EXPRESSION,10,2 | PX | EXPRESSION | 10 | 2 |
| PX,MATCHCODE,0,1 | PX | MATCHCODE | 0 | 1 |
| PX,MATCHPARSED,0,3 | PX | MATCHPARSED | 0 | 3 |
| PX,SORT,1,1 | PX | SORT | 1 | 1 |
| PX,SOURCE_DELIMTEXT,0,1 | PX | SOURCE_DELIMTEXT | 0 | 1 |
| PX,STANDARDIZE,20,1 | PX | STANDARDIZE | 20 | 1 |
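In Excel, Data > Text to Columns with a comma delimiter gives you this split in one step.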
You can then pull out information of this type:
- Total time spent on DEV and PROD, and the difference, by node type
- Percentage of the job's time spent on DEV and PROD, by node type
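If you prefer scripting to Excel, the same comparison is easy with pandas. Below is a minimal sketch, assuming you saved one by-type dump per environment under the hypothetical names shown, with the dump header lines removed:

import pandas as pd

# Compare total time per node type between two environments.
# profiler_dev.csv and profiler_prod.csv are hypothetical files containing
# only the data rows (action type, node type, milliseconds, entries).
cols = ["action", "node_type", "ms", "entries"]
dev = pd.read_csv("profiler_dev.csv", names=cols)
prod = pd.read_csv("profiler_prod.csv", names=cols)

# Total milliseconds per node type, one column per environment.
summary = pd.DataFrame({
    "dev_ms": dev.groupby("node_type")["ms"].sum(),
    "prod_ms": prod.groupby("node_type")["ms"].sum(),
})
summary["diff_ms"] = summary["prod_ms"] - summary["dev_ms"]
summary["prod_pct"] = 100 * summary["prod_ms"] / summary["prod_ms"].sum()
print(summary.sort_values("prod_ms", ascending=False))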