I have an application that currently makes 6 calls to the same macro, changing the filter criteria on each call and creating data sets that are used later in the application. Each call is independent of the previous call. This works fine, but it takes a long time to complete all 6 calls. (There is also another section that replicates a process 23x - all independent.)
I would like to speed this process up by running these six calls in parallel. (Running the application 6x will not work, as there are other parts of the application that will eventually use the data created.) We do not have SAS/CONNECT licensed. I know I've seen papers and discussions on the best way to do this (without using SAS/CONNECT), but I had no use for it at the time; now I do and can't find what I'm looking for.
Could someone point me in the right direction?
We are using SAS 9.2. All apps are currently submitted as a macro call in Enterprise Guide. We eventually want to make it a stored procedure of some sort.
My immediate reaction was to use separate jobs and job controller software.
My second thought was that parallel processing may not gain anything if the process is I/O bound and you are using data on a common disk drive (and thus, the bottleneck is not the processor). In your SAS log, is there a large difference between the CPU time and the total time? That is usually an indication that you are I/O bound. If that is the case, you might get a better performance increase by re-writing the macro to process the criteria on one pass of the input data.
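To illustrate the one-pass idea: instead of calling the macro six times and reading the input data six times, a single DATA step can read the input once and route each observation to the appropriate output data set. This is only a sketch under assumed names - the data set (work.master), the filter variable (region), and its values are placeholders standing in for your actual filter criteria.

```sas
/* Hypothetical sketch: replace six filtered passes over the input
   with one DATA step that reads it once and writes six subsets.
   WORK.MASTER and REGION are placeholder names, not from the
   original application. */
data east west north south central other;
   set work.master;
   select (region);
      when ('EAST')    output east;
      when ('WEST')    output west;
      when ('NORTH')   output north;
      when ('SOUTH')   output south;
      when ('CENTRAL') output central;
      otherwise        output other;
   end;
run;
```

If the six criteria overlap (one row can satisfy more than one filter), a series of IF conditions, each with its own OUTPUT statement, would replace the SELECT block.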
I think I need to elaborate a bit more... or perhaps parallel processing is the wrong name for what I want to do.
The application actually has 3 main components: build historical data,
forecast, and build forecasted data (which we write to Oracle and Teradata databases).
The Build step could actually be thought of as 6 separate jobs (which build 23 data sets) and the forecast step could actually be broken into 23 jobs (23 Forecast Studio Projects which are updated monthly with new data). The last step writes the forecasts out to Oracle/Teradata and is only run after we validate the Forecasts.
We are running on a server with 4 cores with hyperthreading. Theoretically, we could have 8 jobs running simultaneously. We have done testing and we don't see a lot of performance issues. Currently the build step takes 2-3 hours. By running simultaneously, we can cut it to a little over an hour (the longest any one build takes). Similarly, we can cut a substantial amount out of the forecasting step by running several steps simultaneously (probably at least an hour).
This application has been running for 3 years as a single job - with the primary complaint that it takes too long. What I want to do is allow the user to continue to submit the job (as a single job) and have the code take care of running the different parts simultaneously. In essence, this is the same as using separate jobs.
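One way to get that "single submitted job that runs its parts simultaneously" without SAS/CONNECT is the SYSTASK statement, which launches operating-system commands (such as batch SAS sessions) asynchronously, followed by WAITFOR to block until they finish. The sketch below assumes each of the six builds has been split into its own .sas program; the paths, program names, and the XCMD setting being enabled on the server are all assumptions, not facts from this thread. Under a typical Enterprise Guide server configuration, NOXCMD may be in effect and would have to be relaxed by the administrator for this to work.

```sas
/* Hypothetical sketch: launch the six builds as background batch
   SAS sessions and wait for all of them before continuing.
   Paths and program names are placeholders; requires the XCMD
   system option to be in effect. */
%macro run_builds;
   %do i = 1 %to 6;
      systask command
         "sas -sysin /myapp/build&i..sas -log /myapp/build&i..log"
         taskname=build&i;
   %end;
   /* Block until every build task has completed */
   waitfor _all_ build1 build2 build3 build4 build5 build6;
%mend run_builds;
%run_builds
```

The parent session submits all six tasks up front and only resumes after WAITFOR, so the user still submits one job while the builds run concurrently; the same pattern would extend to the 23 forecast steps.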