
CASL: Parallel CAS Action Execution

Started 07-08-2021

Do you like running independent processes in parallel? I know I do. Say you need to load a star schema with 10 dimension tables. Why not load all 10 at the same time? Say you need to run 15 reports. Why not run them all at the same time? Say you need to summarize 5 tables as input to a complex DATA step. Why not summarize them all at the same time? We can do this in Base SAS with MP CONNECT, we can do it in DI Studio, and we can do it in CASL too!

 

We need a few puzzle pieces to make parallel CAS action execution a reality. Let's look at those here and build up to parallel execution.

 

Puzzle Piece 1: Multiple CAS Sessions

The first thing we need for parallel CAS actions is multiple CAS sessions.

 

To run CAS actions in parallel, each must run in its own CAS session. If we want to run 8 CAS actions in parallel, we'll need 8 CAS sessions. 9 CAS actions? Then 9 sessions. 10 CAS actions? Then 10 sessions, and so on.

 

How do we create multiple CAS sessions? With multiple CAS statements, of course. Below, we see an example that creates two sessions, casSession1 and casSession2, meant for two parallel CAS actions.

 

sf_1_asyncSession-300x33.png

 

Starting two CAS Sessions
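
In text form, the two session-creation statements, taken from the complete program later in this post, look like this. The METRICS session option is not required for parallel execution; it is presumably set so that timing information appears in the log.

```sas
/* One CAS statement per session; each parallel action gets its own session */
cas casSession1 sessopts=(metrics=true);
cas casSession2 sessopts=(metrics=true);
```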


 

Puzzle Piece 2: ASYNC Option

The next thing we need is a mechanism to run CAS actions asynchronously. For example, when we issue a table.loadTable action, we need the program to move to the next step instead of waiting for that load to finish. Just submitting our CAS actions to different sessions will not, by itself, cause them to run asynchronously. We need to specify the ASYNC CAS Action option on any action we want to run asynchronously. Without the option, the program will wait for the CAS action to finish and not move on until it does.

 

Below we see some CASL code that calls two CAS actions asynchronously. Each action runs in its own session and each has the ASYNC option specified. Note that the ASYNC option requires a result name ("sess1" and "sess2" below). We'll see how this is utilized in the next section.

 

sf_2_async-1.png

 

Two CAS Actions Coded to Run in Parallel
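
As text, the two asynchronous calls from the complete program are reproduced here. This sketch assumes the two sessions from Puzzle Piece 1 already exist and that the table visual.mg20 has been loaded. Each action names its own session and its own ASYNC name, and each writes to a different output table.

```sas
proc cas;

   /* Submitted to casSession1; control returns without waiting for it */
   simple.groupByInfo session="casSession1" result=r status=s async="sess1"/
      table={caslib="visual",name="mg20",
         groupBy={"dayofweek","facility","product","unit"}}
      casOut={caslib="visual",name="mc_gbinfo1",replace=true,replication=0}
      includeDuplicates=false
      groupbylimit=20000
      details=true;
   run;

   /* Submitted to casSession2 while the first action is still running */
   simple.groupByInfo session="casSession2" result=r status=s async="sess2"/
      table={caslib="visual",name="mg20",
         groupBy={"dayofweek","facility","product","unit"}}
      casOut={caslib="visual",name="mc_gbinfo2",replace=true,replication=0}
      includeDuplicates=false
      groupbylimit=20000
      details=true;
   run;

quit;
```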

 

Puzzle Piece 3: Wait for It!

The final thing we need is a mechanism to make the program wait for the asynchronous actions to complete and to return their results to the CASL program. For example, we'd need to know when our (parallel) table loads finished, and whether they finished successfully, before we could build our star schema view on top of them.

 

The WAIT_FOR_NEXT_ACTION CASL function gives us this capability. It both waits for the referenced CAS action to finish and provides a mechanism to retrieve the CAS action status and other information.

 

The asynchronous CAS actions are referenced by the result name ("sess1" and "sess2") on the ASYNC option. The status and other information from the asynchronous CAS action execution is retrieved by referencing the name ("job1" and "job2") assigned to each of the WAIT_FOR_NEXT_ACTION calls. 

 

sf_3_waitForJob-300x128.png

 

Using the WAIT_FOR_NEXT_ACTION function
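
As text, the waiting step from the complete program looks like the fragment below. It is not a standalone program: it belongs inside the same PROC CAS step that submitted the actions with async="sess1" and async="sess2".

```sas
   /* Fragment: runs inside the PROC CAS step that submitted the actions */
   job1=wait_for_next_action("sess1");
   job2=wait_for_next_action("sess2");

   /* Inspect what came back for the first action */
   print job1;
   print job1.session;
   print job1.status.severity;
```

Because both actions are submitted before either wait call runs, the wait calls only collect the outcomes; they do not serialize the actions themselves.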

 

Putting It All Together

Below is the complete program whose pieces are shown above. It includes some data manipulation I did to increase the size of the input table so that the example runs longer, making parallel execution easier to see.

 


/* Start one CAS session per action we want to run in parallel */
cas casSession1 sessopts=(metrics=true);
cas casSession2 sessopts=(metrics=true);

/* Assign librefs for all available caslibs */
caslib _all_ assign;

/* Stack 20 copies of mega_corp so the actions run long enough to observe */
data visual.mg20 (promote=yes);
set visual.mega_corp visual.mega_corp visual.mega_corp visual.mega_corp visual.mega_corp
    visual.mega_corp visual.mega_corp visual.mega_corp visual.mega_corp visual.mega_corp
    visual.mega_corp visual.mega_corp visual.mega_corp visual.mega_corp visual.mega_corp
    visual.mega_corp visual.mega_corp visual.mega_corp visual.mega_corp visual.mega_corp;
run;

proc cas ;

   /* First action: runs in casSession1 under the ASYNC name "sess1" */
   simple.groupByInfo session="casSession1" result=r status=s async="sess1"/
      table={caslib="visual",name="mg20",
         groupBy={"dayofweek","facility","product","unit"}}
      casOut={caslib="visual",name="mc_gbinfo1",replace=true,replication=0}
      includeDuplicates=false
      groupbylimit=20000
      details=true ;
   run ;

   /* Second action: runs in casSession2 under the ASYNC name "sess2" */
   simple.groupByInfo session="casSession2" result=r status=s async="sess2"/
      table={caslib="visual",name="mg20",
         groupBy={"dayofweek","facility","product","unit"}}
      casOut={caslib="visual",name="mc_gbinfo2",replace=true,replication=0}
      includeDuplicates=false
      groupbylimit=20000
      details=true ;
   run;

   /* Wait for each asynchronous action and capture what it returns */
   job1=wait_for_next_action("sess1");
   job2=wait_for_next_action("sess2");

   /* Print the returned information, then selected fields of job1 */
   print job1;
   print job2;
   print job1.session;
   print job1.status.severity;

quit ;

/* Shut down both sessions */
cas casSession1 terminate;
cas casSession2 terminate;


Examining the log, we can see how to retrieve the execution information (e.g., print job1, print job1.status.severity, etc.) as well as proof that the CAS actions ran in parallel, which we get by comparing the real time values.

 

sf_4_asyncLog.png

 

Log Highlights

 

More to Come

In future posts, we'll dive into iteratively running parallel CAS actions as well as running more complex CASL programs with parallel execution.

Find more articles from SAS Global Enablement and Learning here.
