ChrisC, I don't know if you're still looking for an answer to this, but the previous posts were on target. To answer in a bit more detail:

The first step in creating jobs that use the Loop transformation is to create an "Inner Job". I see that you've done that (I assume that's your first screen shot). After you've gotten your inner job running and tested with some sample data, you need to add parameters to the job. You do this on the Parameters tab of the job Properties. Each parameter you create becomes a macro variable that Loop sets before the job runs, and you need to reference those macro variables in your inner job. In your case, you want to process a group of tables, which tells me you need to parameterize at least one location, perhaps two:

- The name of the input table must be parameterized. To do this, define a parameter (possibly called "inTbl"), open the properties of the input table, go to the "Physical Storage" tab, and replace the value for "Physical name" with a reference to the parameter's macro variable: &inTbl. This lets the inner job process a different input table each time it runs, with Loop passing in the value for the table name.
- You may need to do something similar for any output tables - it depends on what you're trying to accomplish. If you're attempting to create a unique output table for each input table, you'd want to parameterize it as described above. If you're trying to create a single table from all inputs, you wouldn't parameterize, but would instead set the "Load Style" of the Table Loader to "Append". If this is what you're doing, there are implications for how you set options on the Loop transformation, depending on the target type of your table.

Once you've done the steps above, you're ready to create your "Outer Job", which will use Loop. From what I can see in the screen shot, you've done this correctly at a macro level.
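To make the macro-variable piece concrete, here's a minimal sketch of what the inner job effectively does at run time. All names (mylib, SALES_2019, work.staged) are hypothetical; in a real run, Loop sets &inTbl for you based on the parameter mapping, and the %let below only simulates that for illustration:

```
/* Loop sets &inTbl before each iteration; this %let only
   simulates that so the sketch can stand alone. */
%let inTbl = SALES_2019;

/* Because the input table's "Physical name" is &inTbl,
   the generated step resolves like this: */
data work.staged;
   set mylib.&inTbl;   /* reads mylib.SALES_2019 on this iteration */
run;
```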
The main thing about the steps that precede Loop is that they need to produce a table containing all of the parameter values that must be passed to the "Inner" job. In your case, the name of each input table would certainly be one parameter, but perhaps you also need others. Every column in the table that is input to Loop is a potential parameter that can be passed to the inner job, and every row results in one execution of the inner job by the Loop transformation.

Next, you need to configure two tabs in the Loop transformation:

- Parameter Mapping: map columns in Loop's input table to parameters defined in the inner job. In your case, at least the "inTbl" mapping would be required.
- Loop Options: the main decision to make here is whether and how to use parallel processing. By default, all iterations will execute sequentially. You can choose, however, to execute some or all of the iterations in parallel. If you want to execute in parallel, you have to make sure there will be no locking issues in your inner job. Primarily, this means you can't write to the same physical table in two concurrent iterations unless the table is stored in a database that allows parallel writes. As I mentioned above, if you're trying to create a different output table for each input, you could parameterize the output table name, which would have each parallel iteration writing to a different table, eliminating any write contention. If, however, you're trying to append to the same table with each iteration, you'd have to be writing to a relational database like Oracle, which allows parallel writes to the same table. If you're writing to a SAS data set, you'd need to run all iterations sequentially: without SAS/SHARE, a process writing to a table locks the whole table, and all other processes would fail. There are patterns for dealing with this even with SAS tables if speed is important.
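As a hedged sketch, the control table that drives Loop could be built with a simple DATA step ahead of the transformation. The table and value names here are hypothetical; the key point is that the column name must match the parameter you map on the Parameter Mapping tab:

```
/* One row per inner-job execution; the inTbl column is mapped
   to the inner job's inTbl parameter on the Parameter Mapping tab. */
data work.loop_control;
   length inTbl $ 32;
   input inTbl $;
   datalines;
SALES_2019
SALES_2020
SALES_2021
;
run;
```

With three rows in this table, Loop would run the inner job three times, once per table name.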
The write-contention considerations above are the most important to get correct. In addition to avoiding write contention:

- You have to decide how many jobs to run in parallel. The option "One process for each available CPU node" is a good one when just starting with this transformation. If you're using Platform Scheduler or another advanced scheduler, other options can be considered - let me know if you'd like to learn more.
- You need to provide a directory where each parallel job will write its logs - this is the "Location on host for log and output files" setting. Loop automatically gives each log file a unique (though not intuitive) name like L37.log. You can use PROC PRINTTO in your inner job to redirect to a friendlier log name if necessary.

That's probably a lot to take in, so feel free to post follow-up questions. Loop is one of the more useful and versatile transformations in DI Studio, so while there is a bit of a learning curve, the payoff is worth it in the number of ways you'll find to use this pattern.

Thanks,
Tim Stearn
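P.S. Here's a minimal sketch of the PROC PRINTTO redirection mentioned above. The log path is hypothetical, and &inTbl is the job parameter that Loop passes in (note the double period: the first dot ends the macro variable reference, the second is the literal dot in ".log"):

```
/* Redirect this iteration's log to a friendlier, per-table name. */
proc printto log="/logs/inner_&inTbl..log" new;
run;

/* ... inner job steps run here ... */

/* Restore the default log destination. */
proc printto;
run;
```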