In my last article, I demonstrated basic parallel CAS action execution. In this post, we'll take it a step further and drive the parallel execution from a dynamic list, e.g., a table read or a directory listing.
We'll essentially recreate this example from the SAS doc and add a bit to it. For SAS programmers, this is all very similar to data-driven macro programming. The process follows the image below: a loop applies a CAS action, in parallel, to each value in an input list.
Data Driven Parallel CAS Action Execution
To use some of the required language elements, the entire CASL code block must be stored as a text CASL variable and executed using the runCASL CAS action. Use the Source statement to mark the beginning of the parallel CASL code, and the endSource statement to mark the end of the code.
For SAS macro programmers, the CASL Source, endSource, and runCASL statements function like the %macro, %mend, and the actual macro call itself respectively, while creating the CASL code within the text variable is very similar to generating code using SAS macros.
runCASL Example
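As a minimal sketch of that pattern, the block below stores a single statement in a text variable and runs it with runCASL. The variable name pgm matches the full program later in this post; the print statement is only a placeholder for real parallel logic, and an active CAS session is assumed.

```sas
proc cas;
  /* Capture everything up to endsource as text in the CASL variable pgm */
  source pgm;
    print "Hello from runCasl";
  endsource;

  /* Execute the stored code, much like calling a macro after defining it */
  sccasl.runCasl / code=pgm;
run;
```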
Within the CASL code block, we'll iterate over a list that drives the parallel CAS actions. We'll call this the "driver list."
The SAS doc example uses a simple static list:
tables = ${iris cars};
Here tables is a CASL list with two members, iris and cars. The list feeds some CASL logic that is applied to both values: one pass for iris and another for cars, both in parallel of course.
Driver lists can also be created dynamically from CAS action result variables. Below we see a dynamic list of source files in a CASLib DataSource. We'll use this list to feed parallel loadTable CAS actions later.
Dynamic List of Source Files in a CASlib
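A list like the one pictured can be produced with the fileInfo action, the same call the full program at the end of this post uses. This sketch assumes an active CAS session and a caslib named public that contains source files; it only prints each file name next to the table name derived from it.

```sas
proc cas;
  /* The result variable fI holds a table named fileInfo */
  table.fileInfo result=fI / caslib="public";

  /* Walk the "name" column, one row per source file */
  do i = 1 to dim(fI.fileInfo[,"name"]);
    cfile  = fI.fileInfo[i,"name"];
    ctable = scan(cfile, 1, ".");   /* file name without its extension */
    print cfile " -> " ctable;
  end;
run;
```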
A DO loop then iterates over the driver list, submitting one CAS action for every item. For the tables list in the SAS doc example, the loop looks like this, with the dim() function returning the number of items in the list:
do i = 1 to dim(tables);
List items are referenced with simple array notation, using the list name followed by the iterator, i, in brackets:
fname = tables[i] || '.sashdat';
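Put together, the static list, the loop, and the item reference can be tried on their own before any parallel execution is involved. This is a small sketch that only builds and prints each file name; it assumes nothing beyond an active CAS session.

```sas
proc cas;
  /* Static driver list with two members */
  tables = ${iris cars};

  /* dim() gives the number of items, tables[i] the current item */
  do i = 1 to dim(tables);
    fname = tables[i] || '.sashdat';
    print fname;
  end;
run;
```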
The final piece of the puzzle is the create_parallel_session() function, which allows CAS sessions to be created dynamically. As you may remember from the last article, each parallel CAS action needs its own CAS session. Assign the value returned by create_parallel_session() to a variable that identifies the new session, as shown below. Here the variable, sc_sess[i], includes the iterator, i, so the spawned sessions are held in sc_sess[1], sc_sess[2], and so on.
sc_sess[i] = create_parallel_session();
Combining all of the components above, a complete data-driven, parallel CASLib DataSource Loader looks like this:
Data Driven Parallel CASLib Load
Note that the loop iterates over the items in the list, which in this case are the files in the CASLib DataSource location, running a loadTable action for each. The ASYNC and SESSION options on the loadTable action ensure that it runs asynchronously. For more information on this, see my last post.
Our parallel CAS actions are now off and running, and we hope they work, but to know for sure we still have a bit of work to do. As we saw in the previous article, we need the wait_for_next_action CASL function to get the status of each asynchronous action. Doing that in an automated, data-driven fashion requires more looping. Luckily, wait_for_next_action does a lot of the work for us. When given a parameter of 0, it waits for any asynchronous action and returns the results of the first one that completes. We can therefore simply loop on the function until it has returned the results of all asynchronous actions.
Looping over the asynchronous action results
The job variable receives the execution results/statistics of each asynchronous action which we print to the log:
Wait_for_Next_Action Printout
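Rather than printing the whole result, the loop can also flag failures. The sketch below is an assumption-laden variation: the field names job.job, job.status.severity, and job.status.status are taken from the result printed in the reader comment further down, and treating severity 2 as an error is inferred from that same log. Successful results may be laid out differently, so check a printed result before relying on it.

```sas
proc cas;
  source pgm;
    /* Launch the asynchronous actions here, as in the loader loop */

    job = wait_for_next_action(0);
    do while(job);
      /* Assumption: severity 2 marks an error, as in the reader's log */
      if job.status.severity = 2 then
        print "FAILED: " job.job " (" job.status.status ")";
      else
        print "Finished: " job.job;
      job = wait_for_next_action(0);
    end;
  endsource;
  sccasl.runCasl / code=pgm;
run;
```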
Fully complete, our code looks like the following. Hopefully this is useful to you.
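proc cas;
  source pgm;
    /* Build the driver list: one row per source file in the caslib */
    table.fileinfo result=fI / caslib="public";
    do i = 1 to dim(fI.fileInfo[,"name"]);
      cfile = fI.fileInfo[i,"name"];
      /* Table name = file name without its extension */
      ctable = scan(cfile,1,".");
      /* Each parallel action needs its own session */
      sc_sess[i] = create_parallel_session();
      table.dropTable / table=ctable caslib="public";
      /* SESSION= and ASYNC= submit the load without waiting for it */
      table.loadtable session=sc_sess[i] async=cfile /
        path=cfile caslib='public'
        casOut={name=ctable caslib='public' promote=True};
    end;
    /* Collect results until no asynchronous actions remain */
    job = wait_for_next_action(0);
    do while(job);
      print job;
      job = wait_for_next_action(0);
    end;
  endsource;
  sccasl.runCasl / code=pgm;
run;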
That looks promising.
One-on-one, I can successfully access the Oracle files; I load from Oracle and write to Oracle without problems.
Now my challenge is to load all available files into the Public caslib.
It takes a long time to do so.
I thought your code might help me out, but it throws an error.
I don't know why it changes the session. That is probably why it complains about the caslib.
And the next question would be:
How can I load only 1000 rows from Oracle into the Public caslib?
Can I use a where clause with _rowid_?
NOTE: Active Session now MYSESSION.
NOTE: Active Session now server.
{session=session_0,job=AUD_BIWESTIM,status={severity=2,reason=0,status=The caslib 'ORACASLIB' does not exist in this session.,statusCode=2710120},,logs={ERROR: The caslib 'ORACASLIB' does not exist in this session.,ERROR: The action stopped due to errors.},loglevels={5,5},,}
%if not %sysfunc(clibexist(mySession,oracaslib)) %then %do;
  caslib oracaslib datasource=(
    srctype="oracle",
    uid="SAS",
    pwd="xxxxxxxxxx",
    path="xxxxxxx.vwfsag.de"
  );
%end;
proc cas;
  table.fileinfo result=fI / caslib="ORACASLIB";
  print fi;
run;
proc cas;
  source pgm;
    table.fileinfo result=fI / caslib="oracaslib";
    do i = 1 to dim(fI.fileInfo[,"name"]);
      if fI.fileInfo[i,"Schema"]="SAS" and fI.fileInfo[i,"Type"]="SYNONYM" then do j=1 to 2;
        cfile = fI.fileInfo[i,"name"];
        ctable = scan(cfile,1,".");
        sc_sess[i] = create_parallel_session();
        table.dropTable / table=ctable caslib="public" quiet=true;
        table.loadtable session=sc_sess[i] async=cfile /
          path=cfile caslib='ORACASLIB'
          casOut={name=ctable caslib='public' replace=True};
      end;
    end;
    job = wait_for_next_action(0);
    do while(job);
      print job;
      job = wait_for_next_action(0);
    end;
  endsource;
  sccasl.runCasl / code=pgm;
run;