I have a DI Studio job laid out as follows:
sourcetableA -> transform1 -> temptable1 -> transform2 -> temptable2 -> transform3 -> temptable3 -> tableloader -> targettable
Let's say sourcetableA has 10,000 records, and assume transform1 has processed the first 1,000 of them. I want those 1,000 to move on to transform2 right away rather than wait for all 10,000 to finish transform1 and land in temptable1.
How can this be achieved? Right now, when I check the log, I see that each stage waits for all 10,000 records to arrive before the next stage starts.
To my knowledge there is no direct support for pipeline parallelism in DI Studio. I guess you would have to code it yourself, which I think will be a bit tricky if you want to use it together with the standard DI Studio transformations.
An easier way to get almost the same effect is to use views wherever possible.
What you want is normally achieved by making the temporary tables of type "view" rather than "data". How much pipelining you get depends on the complexity of the transformation steps. For example, if one transformation is a transpose, the step will probably have to read the whole table before it starts writing out data. Similarly, a PROC SORT won't write its out= data until it has read all the input. On the other hand, steps that look really complex, like joining two or more tables, will sometimes "pipeline" really well.
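To make the difference concrete, here is a rough analogy in Python rather than SAS code. It is only a sketch: the function names (source, transform1, transform2, blocking_sort) are made up for illustration and are not DI Studio or SAS APIs. A list stands in for a temporary table of type "data", a generator stands in for a "view", and the sort step shows why something like PROC SORT cannot stream.

```python
from typing import Iterable, Iterator, List, Tuple

log: List[str] = []  # records the order in which rows are touched


def source(n: int) -> Iterator[int]:
    """Hypothetical stand-in for sourcetableA: yields n rows one at a time."""
    for i in range(n):
        log.append(f"read {i}")
        yield i


def transform1(rows: Iterable[int]) -> Iterator[int]:
    """Row-by-row step: can pass each row on as soon as it is processed."""
    for r in rows:
        log.append(f"t1 {r}")
        yield r * 2


def transform2(rows: Iterable[int]) -> Iterator[int]:
    """Another row-by-row step."""
    for r in rows:
        log.append(f"t2 {r}")
        yield r + 1


def blocking_sort(rows: Iterable[int]) -> Iterator[int]:
    """Sort-like step: must consume every input row before emitting any."""
    data = sorted(rows, reverse=True)
    log.append("sort done")
    yield from data


def run_materialized(n: int) -> Tuple[List[int], List[str]]:
    """Temp tables of type 'data': each stage finishes before the next starts."""
    log.clear()
    temp1 = list(transform1(source(n)))  # temptable1 fully written here
    temp2 = list(transform2(temp1))      # only now does transform2 begin
    return temp2, list(log)


def run_streamed(n: int) -> Tuple[List[int], List[str]]:
    """Temp tables of type 'view': rows flow through both steps one at a time."""
    log.clear()
    out = list(transform2(transform1(source(n))))
    return out, list(log)


def run_with_sort(n: int) -> Tuple[List[int], List[str]]:
    """A blocking step in the middle stalls the pipeline until it has all rows."""
    log.clear()
    out = list(transform2(blocking_sort(transform1(source(n)))))
    return out, list(log)
```

Comparing the logs returned by run_materialized(2) and run_streamed(2) shows the same output rows but a different order of work: in the streamed version the first row reaches transform2 before the second row has even been read, whereas run_with_sort(2) reads everything before transform2 sees a single row.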