SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

DataFlux Cluster Generation Issue

Contributor sht
Posts: 22

Hi Team,

I am having problems with DataFlux jobs that were developed for cluster ID generation and are scheduled through the Windows scheduler.

If I execute those jobs manually, they run fine (around hrs), but when run from the Windows scheduler they take a long time (around 8-9 hrs).

Please suggest what the issue could be.

OS: Windows 8

DF Studio: 8.1

Data size: more than 40 lakh records (over 4 million)

Thank you

SAS Super FREQ
Posts: 102

Re: DataFlux Cluster Generation Issue

Hi - are you calling the dmpexec command from your scheduler? There are various options you can set when invoking the command that may affect performance, such as choosing to write a log or selecting options that override configuration file settings. There are also logging options that might help you pinpoint the issue.

Another thought - I have seen cases where dmpexec was executed by a user account (different from the one used with Data Management Studio) that had different environment settings (such as the location of the "temp" directory), and this affected performance.
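One way to check for that kind of difference is to record the environment once from an interactive session and once from the scheduled task, then compare the two. The following is only a minimal sketch in Python: the function names and the snapshot file names in the usage note are hypothetical, and it inspects generic operating system settings rather than anything specific to DataFlux.

```python
import json
import os
import tempfile


def snapshot_environment():
    """Collect settings that often differ between an interactive and a scheduled run."""
    return {
        # USERNAME is set on Windows, USER on most Unix-like systems.
        "user": os.environ.get("USERNAME") or os.environ.get("USER", ""),
        "temp_dir": tempfile.gettempdir(),
        "working_dir": os.getcwd(),
        "env": dict(os.environ),
    }


def save_snapshot(path):
    """Write the current snapshot to a JSON file (the path is chosen by the caller)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(snapshot_environment(), handle, indent=2, sort_keys=True)


def flatten(snapshot):
    """Turn a snapshot into a flat dict so every setting can be compared by key."""
    flat = {"env." + name: value for name, value in snapshot.get("env", {}).items()}
    flat.update({key: value for key, value in snapshot.items() if key != "env"})
    return flat


def diff_snapshots(first, second):
    """Return {setting: (first_value, second_value)} for every setting that differs."""
    a, b = flatten(first), flatten(second)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

To use it, call `save_snapshot("interactive.json")` from a normal session and `save_snapshot("scheduled.json")` as the first step of the scheduled task, load both files with `json.load`, and pass them to `diff_snapshots`. A differing `temp_dir` or `user` entry would point to the situation described above.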


Ron

Contributor sht
Posts: 22

Re: DataFlux Cluster Generation Issue

Posted in reply to RonAgresta

Hi Ron,

Thanks for your reply.

I don't execute any dmpexec command.

These scheduled jobs used to run smoothly, but now they take much longer than usual.

Previously they finished in 3 hrs; now they take 7-9 hrs.

The number of rows has increased by barely 2-3 lakh (200,000-300,000).

Thank you.

 

SAS Super FREQ
Posts: 102

Re: DataFlux Cluster Generation Issue

Review the product documentation. There are topics that specifically cover how you should call your DMP jobs if you are not running them interactively.

Regardless, check a few things:

  • Make sure that each call by your scheduler to run the job does not spawn a new process that keeps taking up system resources.
  • If your job accesses data, make sure that your queries are running as you would expect. You can turn on more verbose logging levels to see database interaction, or you can monitor that in your database log.
  • There are logging options you can set that will generate what are called node profile metrics. These logs tell you how long each node spends processing, rather than only the total for the entire job. This is also covered in the documentation.
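For the first check, a rough way to see whether earlier runs left processes behind is to list the running processes on the Windows machine and count the ones that belong to the job engine. The sketch below, in Python, parses the output of the standard Windows command `tasklist /FO CSV /NH`. The image name `dmpexec.exe` in the usage note is an assumption; substitute whatever process your scheduler actually launches.

```python
import csv
import io
import subprocess


def parse_tasklist(csv_text):
    """Parse `tasklist /FO CSV /NH` output into (image_name, pid) pairs."""
    rows = csv.reader(io.StringIO(csv_text))
    # Skip blank or informational lines that do not carry a numeric PID.
    return [(row[0], int(row[1])) for row in rows if len(row) >= 2 and row[1].isdigit()]


def find_instances(csv_text, image_name):
    """Return the PIDs of all processes whose image name matches, ignoring case."""
    wanted = image_name.lower()
    return [pid for name, pid in parse_tasklist(csv_text) if name.lower() == wanted]


def current_tasklist():
    """Run tasklist (Windows only) and return its CSV output without a header row."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `find_instances(current_tasklist(), "dmpexec.exe")` before the scheduled run starts should normally return an empty list. Several PIDs left over from earlier runs would suggest that old processes are still holding memory or temp space.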

Ron
