
Data Driven Parallel CAS Action Execution

Started ‎08-06-2021 by
Modified ‎08-06-2021 by

In my last article, I demonstrated basic parallel CAS action execution. In this post, we'll take it a step further and drive the parallel execution from a dynamic list, such as the rows of a table or a directory listing.

 

We'll essentially recreate this example from the SAS doc, with a few additions. For SAS programmers, this is very similar to data-driven macro programming. The process is shown in the image below: a loop applies a CAS action in parallel to each value in an input list.

 

sf_1_dataDrivenParallelCASL-2.png

Data Driven Parallel CAS Action Execution


 

The runCASL Action

To use some of the required language elements, the entire CASL code block must be stored as a text CASL variable and executed using the runCASL CAS action. Use the Source statement to mark the beginning of the parallel CASL code, and the endSource statement to mark the end of the code.

 

For SAS macro programmers, the CASL Source, endSource, and runCASL statements function like %macro, %mend, and the macro call itself, respectively. Building the CASL code in the text variable is much like generating code with SAS macros.

sf_2_runCASL.png

runCASL Example
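As a minimal sketch of the pattern described above (the variable name pgm and the print statement are only illustrative), the Source, endSource, and runCASL pieces fit together roughly like this:

```sas
proc cas;
  /* Store the CASL program as text in the variable pgm */
  source pgm;
    print "Hello from runCasl";
  endsource;

  /* Execute the stored program with the runCasl action */
  sccasl.runCasl / code=pgm;
run;
```

Nothing between source and endsource runs until the runCasl action is called, which is what makes the %macro/%mend comparison apt.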

 


 

The CASL Dictionary Driver List

Within the CASL code block, we'll iterate over a list that drives the parallel CAS actions. We'll call this the "driver list."

 

The SAS doc example uses a simple static list,

 

tables = ${iris cars};

Here tables is a CASL dictionary that contains two members, iris and cars. The list feeds CASL logic that is applied to both values: once for iris and once for cars, both in parallel of course.

 

CASL dictionaries can also be created dynamically using CAS Action Result variables. Below we see a dynamic list of source files in a CASLib DataSource. We'll actually use this list to feed parallel loadTable CAS actions later.

 

sf_3_dynamicList-1.png

 Dynamic List of Source Files in a CASlib
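A rough sketch of building such a dynamic list, assuming a caslib named "public" that contains source files (the same fileInfo call appears in the complete code at the end of this article):

```sas
proc cas;
  /* Capture the file listing of the caslib in the result variable fI */
  table.fileInfo result=fI / caslib="public";

  /* fI.fileInfo is a result table; walk its "name" column */
  do i = 1 to dim(fI.fileInfo[,"name"]);
    print fI.fileInfo[i,"name"];
  end;
run;
```

Each printed value is one item of the driver list, so the list always reflects whatever files are currently in the caslib.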

 

DO Loop Spawning Parallel CAS Actions

A DO loop then iterates over the driver list, submitting one CAS action for every item. For the tables list in the SAS doc example, the loop looks like this, with the dim() function giving the number of items in the list:

 

do i = 1 to dim(tables);

List items are referenced with array-style indexing: the list name followed by the iterator, i, in brackets:

 

fname = tables[i] || '.sashdat';
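Put together, the static list, the loop, and the indexed reference can be tried on their own before any parallel sessions are involved. This is only a sketch that prints each derived file name; it does not load anything:

```sas
proc cas;
  /* Static driver list from the SAS doc example */
  tables = ${iris cars};

  /* One pass per list item; dim() returns the number of items */
  do i = 1 to dim(tables);
    fname = tables[i] || '.sashdat';
    print fname;
  end;
run;
```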


create_parallel_session

The final piece of the puzzle is the create_parallel_session() function, which creates CAS sessions dynamically. As you may remember from the last article, each parallel CAS action needs its own CAS session. Assign the value returned by create_parallel_session() to a variable that will hold the new session, as shown below. Here the variable, sc_sess[i], includes the iterator, i, so the spawned sessions are stored in sc_sess[1], sc_sess[2], and so on.

 

sc_sess[i] = create_parallel_session();


Putting the Spawning Pieces Together

Combining all of the components above, a complete data-driven, parallel CASLib DataSource Loader looks like this:

 

sf_4_dataDrivenParallelLoad-2.png

 

Data Driven Parallel CASLib Load

 

Note that the loop iterates over the items in the list, which in this case are the files in the CASLib DataSource location, and runs a loadTable action for each. The ASYNC and SESSION options on the loadTable action ensure that it runs asynchronously. For more information on this, see my last post.

 

Getting Execution Status

Our parallel CAS actions are now off and running, and we hope they work, but to know for sure we still have a bit of work to do. As we saw in the previous article, we need the wait_for_next_action CASL function to get the status of each asynchronous action. Doing that in an automated, data-driven fashion requires more looping. Luckily, wait_for_next_action does a lot of the work for us. When given a parameter of 0, it waits for any asynchronous action and returns the results of the first one that completes. We can therefore simply loop on the function until it has returned the results of all asynchronous actions.

 

 

sf_5_waitForAction-300x97.png

 

Looping over the asynchronous action results

 

The job variable receives the execution results/statistics of each asynchronous action which we print to the log:

sf_^_waitForActionLog-1024x147.png

 

Wait_for_Next_Action Printout
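Instead of printing each result in full, the loop can inspect it. The sketch below assumes that the value returned by wait_for_next_action is a dictionary with a job member and a status member containing severity and status, and that a severity of 0 means the action completed normally. Check these names against your own printout before relying on them. The counter nFailed is a name introduced here for illustration.

```sas
proc cas;
  source pgm;
    /* Submit one asynchronous load per file, as in the rest of this article */
    table.fileInfo result=fI / caslib="public";
    do i = 1 to dim(fI.fileInfo[,"name"]);
      cfile = fI.fileInfo[i,"name"];
      ctable = scan(cfile,1,".");
      sc_sess[i] = create_parallel_session();
      table.dropTable / table=ctable caslib="public";
      table.loadTable session=sc_sess[i] async=cfile /
        path=cfile caslib="public"
        casOut={name=ctable caslib="public" promote=True};
    end;

    /* Collect results one at a time and count the problems */
    nFailed = 0;
    job = wait_for_next_action(0);
    do while(job);
      /* Assumption: severity 0 means the action completed normally */
      if job.status.severity > 0 then do;
        nFailed = nFailed + 1;
        print "FAILED: " job.job;
        print job.status.status;
      end;
      else print "OK: " job.job;
      job = wait_for_next_action(0);
    end;
    print "Actions that reported a problem: " nFailed;
  endsource;

  sccasl.runCasl / code=pgm;
run;
```

Because the async= value is the file name, the job member identifies which load each status belongs to.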

 

The Complete Code

Fully complete, our code looks like the following. Hopefully this is useful to you.

 

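proc cas;
  /* Store the CASL program as text so it can be executed with runCasl */
  source pgm;
    /* Driver list: the source files in the Public caslib */
    table.fileinfo result=fI / caslib="public";
    do i = 1 to dim(fI.fileInfo[,"name"]);
      cfile =  fI.fileInfo[i,"name"];
      /* Table name = file name up to the first period */
      ctable = scan(cfile,1,".");
      /* One CAS session per parallel action */
      sc_sess[i] = create_parallel_session();
      table.dropTable / table=ctable caslib="public";
      /* Submit the load asynchronously in its own session; the file name labels the job */
      table.loadtable session=sc_sess[i] async=cfile  /
        path=cfile caslib='public'
        casOut={name=ctable caslib='public' promote=True};
    end;

    /* Collect results until no asynchronous actions remain */
    job = wait_for_next_action(0);
    do while(job);
          print job;
          job = wait_for_next_action(0);
    end;

  endsource;

  /* Run the stored program */
  sccasl.runCasl / code=pgm;
run;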

 


Comments

That looks promising.

One table at a time, I can access the Oracle files successfully: I load from Oracle and write to Oracle without problems.

 

Now my challenge is to load all available files in the Public caslib. 

It takes a long time to do so. 

I thought your code might help me out, but it throws out an error.

I don't know why it changes the session. Probably that's the reason why it complains about the caslib. 

 

And the next question would be:

How can I load only 1000 rows from oracle to caslib Public?

Can I use the where clause with _rowid_?

 

NOTE: Active Session now MYSESSION.
NOTE: Active Session now server.
{session=session_0,job=AUD_BIWESTIM,status={severity=2,reason=0,status=The caslib 'ORACASLIB' does not exist in this 
session.,statusCode=2710120},,logs={ERROR: The caslib 'ORACASLIB' does not exist in this session.,ERROR: The action stopped due to 
errors.},loglevels={5,5},,}

 

%if not %sysfunc(clibexist(mySession,oracaslib)) %then %do;
  caslib oracaslib datasource=(                                          
              srctype="oracle",
              uid="SAS",
              pwd="xxxxxxxxxx",
              path="xxxxxxx.vwfsag.de" 
);
%end;
proc cas;
    table.fileinfo result=fI / caslib="ORACASLIB";
print fi;
run;

proc cas;
  source pgm;
    table.fileinfo result=fI / caslib="oracaslib";
    do i = 1 to dim(fI.fileInfo[,"name"]);
      if fI.fileInfo[i,"Schema"]="SAS" and fI.fileInfo[i,"Type"]="SYNONYM" then do j=1 to 2;
        cfile =  fI.fileInfo[i,"name"];
        ctable = scan(cfile,1,".");
        sc_sess[i] = create_parallel_session();
        table.dropTable / table=ctable caslib="public" quiet=true;
        table.loadtable session=sc_sess[i] async=cfile  /            
          path=cfile caslib='ORACASLIB'
          casOut={name=ctable caslib='public' replace=True};
      end;
    end;

    job = wait_for_next_action(0);
    do while(job); 
          print job;
          job = wait_for_next_action(0);
    end;

  endsource;

  sccasl.runCasl / code=pgm;
run;

Last update: 08-06-2021 02:06 PM
