BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Shakti_Sourav
Quartz | Level 8

Dear Team,

I have created a Data job having 6 crores record in first data source transformation which is taking more than 24 hours to complete in SAS Data Management Studio.

I want to reduce the execution time, Please suggest how to increase the execution time?

 

 

I have attached one screenshot for your references.

Shakti_Sourav_0-1653373503478.png

 

Let me know, If required any further details.

Thank You

Shakti 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

Where is the error happening? In the final Data Insert job? How long is it running for before you get the error? If the Oracle connection in DataFlux Studio is working OK, but times out when you run the same job on the DQ server then there must be something different about the database connection. I'm guessing there might be a firewall timeout rule on the server blocking the database connection. Talk to your SAS administrator to check this out and get it changed.  

View solution in original post

7 REPLIES 7
ErinW
SAS Employee

Hey Shakti!

 

I have two suggestions for you. First of all, you don't need the Branch node in your job flow. I don't know how much it will speed things up to remove it, but it can't hurt. The Branch node simply copies its input to one or more outputs. Since you're only using the Branch node to connect to one output, you can get the same result and skip a node if you connect the Surviving Record Identification Node to Data Validation 2 directly.

 

The other suggestion: do you have a DataFlux Data Management Server? Remember that DataFlux Data Management Studio is a client application, and is not really meant to process huge amounts of data. Ask your admin if you have a server, and if you do, here are some links on how to get access to it and deploy your jobs to it:

Connecting to Data Management Servers (sas.com)

Deploying Jobs to a DataFlux Data Management Server (sas.com)

 

A real server should run this much faster. Pay attention to the pre-requisites page: if you deploy a job to the server, you have to make sure it has everything it needs to run. The server might need copies of your business rules, macro variables, and QKB, among other things. 

 

Let me know if that helps. 

ErinW
SAS Employee

One more suggestion looking at the job again-you might not need the Standardization node. When you generate match codes, a standardization is automatically applied. So if you're using the Standardization to clean up values before they are used to generate match codes, that's unnecessary. (Unless you want a cleaned version AND a match code.)

Shakti_Sourav
Quartz | Level 8

Dear ErinW,

Thanks for your response !!!

 

as per your suggestion, i have executed it. but i got another error ( ORACLE CONNECTION DEAD ).

 

Please suggest, How to resolve oracle connection dead ?

 

Thank You 

Shakti

SASKiwi
PROC Star

Where is the error happening? In the final Data Insert job? How long is it running for before you get the error? If the Oracle connection in DataFlux Studio is working OK, but times out when you run the same job on the DQ server then there must be something different about the database connection. I'm guessing there might be a firewall timeout rule on the server blocking the database connection. Talk to your SAS administrator to check this out and get it changed.  

SASKiwi
PROC Star

In my experience match-coding is the most resource intensive process in DataFlux. 60 million rows is a lot of data and I suspect it is going to be slow regardless of whether you run it on your DataFlux server or not.

 

What I do is split my DataFlux data into chunks and run the match-coding in parallel. For example, if you split the 60 million rows into 4 15 million row tables, then run 4 match-coding jobs at the same time it will run in one quarter of the time. 

 

 

Shakti_Sourav
Quartz | Level 8

I have tried run the jobs in parallel. but it's taking 18 hours to execute.

 

I have attached one document file which is contains steps of parallel processing.

please find it and let me know if changes required.

Shakti_Sourav
Quartz | Level 8

as per your suggestion, i have tried it. I got same time ton execute. please suggest me how to reduce execution time and let me know you want any details, i will update it.

 

Many times i got Network timed out, and Network team confirmed there is no firewall. please confirmed for this.

 

Thank you

shakti

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1292 views
  • 1 like
  • 3 in conversation