SAS_1001
Quartz | Level 8

I'm looking for sample code that I can run from SAS EG to exercise checkpoint and restart as part of high-availability testing. Does anyone have an example?

 

Thanks in advance !

4 REPLIES 4
doug_sas
SAS Employee

Checkpoint/restart of SAS code via the grid is only supported through batch processing via SASGSUB. SASGSUB's GRIDRESTARTOK (for data-step checkpoint restart) and GRIDLRESTARTOK (for label checkpoint restart) cause the batch job to put its checkpoint/restart information onto a shared directory so when the job restarts on another machine, it can find the checkpoint information.
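For illustration, here is a minimal sketch of a batch program that could be used for this kind of test. The program name, the shared path, and the data set names are assumptions, and the SASGSUB call in the comment assumes the remaining connection settings come from the SASGSUB configuration file. Each step writes to a permanent library on shared storage, so a step that already completed does not need to be rerun after a restart on another host.

```sas
/* nightly_load.sas: hypothetical program name.
   Submitted in batch, for example:
       sasgsub -gridsubmitpgm nightly_load.sas -gridrestartok
   Metadata and grid connection options are assumed to be set in the
   SASGSUB configuration file. */

/* Assumed directory that every grid host can reach. */
libname stage "/shared/stage";

/* Step 1: if this step finished before the failure, a restarted job
   is expected to skip it. */
data stage.step1;
    set sashelp.class;
    bmi = (weight / (height * height)) * 703;
run;

/* Step 2: kill the job or the host while this step is running to see
   whether the restart resumes here. */
proc means data=stage.step1 noprint;
    var bmi;
    output out=stage.step2 mean=mean_bmi;
run;
```

Comparing the SAS logs of the original run and the restarted run should show which steps were actually re-executed.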

 

EG creates an interactive connection to a grid-launched workspace server. The EG user may create SAS code that uses grid-enabled SIGNONs to take advantage of grid hosts for parallel processing, but those connections are also interactive. When code is submitted via EG, it runs on the workspace server; no new grid job is started to process the SAS job. Interactive grid jobs like grid-launched workspace servers and grid-launched CONNECT servers (a.k.a. grid servers) cannot use checkpoint restart because once the connection is lost, so is all of the data.

There is also the Distribute macro, which can farm individual tasks out to grid servers via CONNECT SIGNON/RSUBMITs. It manages the distribution of tasks to available grid servers and will restart a task if the host it is currently running on dies.

 

JackHamilton
Lapis Lazuli | Level 10

What about gridded jobs generated by PROC SCAPROC?  Does SCAPROC handle failed sessions correctly, or does only the DISTRIBUTE macro do that?  There's no documentation for the startup, taskwait, getsession, and shutdown statements in SCAPROC, so it's hard to tell exactly how errors might be handled.

doug_sas
SAS Employee

PROC SCAPROC will generate code that tries to use SAS/CONNECT grid-enabled SIGNONs. When it runs, it will do a good job of sending the next task to an available grid server, but if that grid server dies, the RSUBMIT'd code is lost and no recovery is attempted.

 

It should be noted that PROC SCAPROC and the Distribute macro have different purposes:


The Distribute macro (found in the Grid Toolbox) is meant to manage the SIGNON/RSUBMIT of independent 'tasks' to one or more CONNECT servers (usually grid servers, but it will spawn servers locally too). It requires code to be written in a specific form (called a 'task') that may be difficult to accomplish for some processing. It is best where the task being executed is the same on multiple servers with just data or parameters being different from one task to the next. Since everything is an independent task, it has the ability to restart a task if the host dies and the task processing fails.

 

PROC SCAPROC is meant to run SAS programs, analyze their data access and data processing patterns, and try to rewrite the code so that it runs in parallel. It will insert macros that use SAS/CONNECT's grid-enabled SIGNON/RSUBMIT wherever data access and data processing can be done in parallel. It will get code most of the way to parallel processing, but it can be fooled by heavy use of macros that hide data access or data processing. Because the original SAS program is sequential, the parallel pieces in the output of PROC SCAPROC must be executed in a specific order, which may not lend itself to retrying.
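As a rough sketch of how PROC SCAPROC is typically wrapped around an existing program: the file paths and the SASApp resource name below are assumptions for illustration, and the two steps in the middle merely stand in for the original sequential job.

```sas
/* Start recording. Both file paths are hypothetical. */
proc scaproc;
    record '/shared/sca/myjob_record.txt'
           grid '/shared/sca/myjob_grid.sas'
           resource "SASApp";
run;

/* The original sequential program runs here. */
data work.heavy;
    set sashelp.cars;
    where msrp > 30000;
run;

proc sort data=work.heavy out=work.heavy_sorted;
    by make;
run;

/* Stop recording and write the record file and the generated grid job. */
proc scaproc;
    write;
run;
```

The generated file named on the GRID argument is the one that contains the grid-enabled SIGNON/RSUBMIT macros. It is worth reading through before running it, since that code does not retry a task whose grid server dies.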

JackHamilton
Lapis Lazuli | Level 10

What I would like is a set of supported macros with documentation and working examples that combines the features of the waitForAvailableSession macro taught in class with the more robust error detection of the gridDistribute macro.  

 

Something like:

 

/* Setup stuff of some kind */

%gridRunNextCode()
rsubmit &OPTIONS_SET_BY_RUNCODE_MACRO.;
    /* A piece of code goes here */
endrsubmit;

%gridRunNextCode()
rsubmit &OPTIONS_SET_BY_RUNCODE_MACRO.;
    /* A different piece of code goes here */
endrsubmit;

/* Shutdown code goes here */

 

The Distribute macro examples that I have seen all run the same piece of code over and over, but what I usually want to do is to run independent different pieces of code in parallel.  
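Until something like that exists, a minimal sketch using plain SAS/CONNECT statements can run two different pieces of code in parallel. This is not the Distribute macro or any supported toolkit; the session names, the SASApp server name, and the status macro variables are assumptions. It detects a failed RSUBMIT through CMACVAR but makes no attempt to restart it.

```sas
/* Ask SAS/CONNECT to launch all sessions through the grid.
   SASApp is an assumed logical server name. */
%let rc = %sysfunc(grdsvc_enable(_all_, server=SASApp));

/* First independent piece of code. */
signon task1;
rsubmit task1 wait=no cmacvar=task1_status;
    proc means data=sashelp.class;
        var height weight;
    run;
endrsubmit;

/* A different, unrelated piece of code. */
signon task2;
rsubmit task2 wait=no cmacvar=task2_status;
    proc freq data=sashelp.cars;
        tables make;
    run;
endrsubmit;

/* Block until both asynchronous tasks have finished. */
waitfor _all_ task1 task2;

/* CMACVAR values: 0 = completed, 1 = failed to execute, 2 = still running. */
%put NOTE: task1_status=&task1_status task2_status=&task2_status;

/* Pull the remote logs and output back, then shut the sessions down. */
rget task1;
rget task2;
signoff _all_;
```

A wrapper macro could loop over a list of such blocks and resubmit any task whose status is not 0, which is roughly the error handling asked for above.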

 
