SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

DI Studio - How to Run Independent Queries Simultaneously?

Accepted Solution Solved
Reply
Contributor
Posts: 37
Accepted Solution

DI Studio - How to Run Independent Queries Simultaneously?

Suppose a simple DI Studio workflow where you extract subsets of information from disparate data sources (using simple joins with one input).

 

SAS always runs first one join transformation, while the second join, independent of the first, sits waiting.

 

Is there a setting in there anywhere that will allow indedendent operations to run simulatneously?

 


Accepted Solutions
Solution
‎09-05-2017 02:23 PM
SAS Employee
Posts: 6

Re: DI Studio - How to Run Independent Queries Simultaneously?

Just FYI

 

SAS Data Integration Studio 4.901 introduced the Fork transformations for parallelizing flows within a job:

http://support.sas.com/documentation/cdl/en/etlug/69395/HTML/default/viewer.htm#n0gxygc7jonw4mn1ucsi...

 

SAS Data Integration Studio provides a set of macros that are enabled via an option in the job properties:

http://support.sas.com/documentation/cdl/en/etlug/69395/HTML/default/viewer.htm#n1tbcw0rjm4bf1n1ozdy...

These macros are used by the Loop and the Fork transformations. They are available to the user.

View solution in original post


All Replies
Valued Guide
Posts: 947

Re: DI Studio - How to Run Independent Queries Simultaneously?

If all these joins essentially use the same "parent" datasets, then you might be able to use a DATA step to simultaneously create them all.  For instance

 

data boys girls;

   set sashelp.class;

   if sex='M' then output boys;
   else output girls;

run;

 

The savings derives from the fact that the parent dataset (sashelp.class) is read only once.

Contributor
Posts: 37

Re: DI Studio - How to Run Independent Queries Simultaneously?

[ Edited ]

 

Hi @mkeintz

Thanks for the feedback.  Unfortunately the data is pulled from totally disparate data sources.  Sort of like:

 

(1) extract some claims data data

 

(2) extract some policy data

 

(3) extract some customer data

 

(4) join (1) and (2) and (3)

 

It seems like (1), (2), and (3) could take place simultaneously, but DI Studio runs them one at a time.

 

Respected Advisor
Posts: 4,132

Re: DI Studio - How to Run Independent Queries Simultaneously?

[ Edited ]

@JohnJPS

DI Studio just generates SAS code (data steps and Procs) which get executed sequentially. SAS DIS doesn't provide process orchestration OOTB.

As @LinusH suggests either implement the extracts as separate jobs and implement parallelization via scheduling or use the DIS Loop transformation.

 

With the DIS Loop Transformation: You would have to wrap your inner job extract nodes into macros to only execute one extract per iteration of the inner job. You could use a Conditional Start/End Transformation and check for the value of a macro variable which you pass in via control table to the loop transformation.

 

I personally would go for multiple jobs and implement parallelization via scheduling. Having multiple simple jobs makes implementation, testing and operation simpler. If something falls over then it's easy to investigate, fix and re-run. 

Contributor
Posts: 37

Re: DI Studio - How to Run Independent Queries Simultaneously?

Thanks; those two options are on the table, but I was hoping maybe I just missed something in DI Studio. I figure we'll go the road of multiple simultaneous jobs via our enterprise scheduler if necessary.
Respected Advisor
Posts: 4,132

Re: DI Studio - How to Run Independent Queries Simultaneously?

[ Edited ]

@JohnJPS

I believe if you implement the target of your extracts as views and then use these views in the join then things would run simultaneously (as actually the extract logic gets only executed as part of the join).

A SAS SQL can execute multi-threaded so this should allow for some sort of parallelization. In my experience the SAS SQL optimizer doesn't always do a great job with complex joins so if such an approach would increase performance will depend on the actual extract and join logic as well as on the available disk I/O.

 

My design approach is to keep things in separate jobs. I rather go for multiple simple jobs with each job having a single target and only doing one thing than for something "elaborate". I've got always operations in mind so I try to keep dependencies in my jobs to a minimum. If a source table is not available or corrupted then have a simple job fall over and not a complex one as this will make debugging and re-running things much much easier.

Super User
Posts: 5,383

Re: DI Studio - How to Run Independent Queries Simultaneously?

If you feel that the extract jobs takes considerably amount of time you could either have them in separate jobs (and atieing the intermediate extracts to a permanent table), or use the built on parallelixation features of DI Studio (uses MP CONNECT or an LSF Grid depending on your licence).
Data never sleeps
Solution
‎09-05-2017 02:23 PM
SAS Employee
Posts: 6

Re: DI Studio - How to Run Independent Queries Simultaneously?

Just FYI

 

SAS Data Integration Studio 4.901 introduced the Fork transformations for parallelizing flows within a job:

http://support.sas.com/documentation/cdl/en/etlug/69395/HTML/default/viewer.htm#n0gxygc7jonw4mn1ucsi...

 

SAS Data Integration Studio provides a set of macros that are enabled via an option in the job properties:

http://support.sas.com/documentation/cdl/en/etlug/69395/HTML/default/viewer.htm#n1tbcw0rjm4bf1n1ozdy...

These macros are used by the Loop and the Fork transformations. They are available to the user.

Contributor
Posts: 37

Re: DI Studio - How to Run Independent Queries Simultaneously?

Thanks Robert - I was able to use "Fork" effectively on my problem.
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 322 views
  • 5 likes
  • 5 in conversation