dgower
Obsidian | Level 7

We have our GRID nodes running on Windows (I know), so they need to be rebooted periodically. What is the best way to handle running jobs when we need to restart the servers?

6 replies
ChrisHemedinger
Community Manager

Moved this to since I don't think it's specific to ITRM, and there are more folks watching that group that can answer.

Chris

jakarman
Barite | Level 11

There is no need to reboot the Windows machines. You can keep them running for long periods.

Problems are often within the "applications" like SAS. The cause can be memory problems or synchronisation issues. The real solution is a question for the developers, in this case SAS Technical Support and the developers of the SAS system.

If you need to work around issues in SAS, you could plan to restart all SAS servers (the better word is services), e.g. the metadata server. In that case your batch processes will not be affected.

If you need a planned outage of the OS, you can schedule it so that cancelling running jobs is an expected event.

---->-- ja karman --<-----
dgower
Obsidian | Level 7

Both: we have scheduled batch jobs and users running jobs interactively. So I'm wondering if there's a way, or a best practice, to stop processing new jobs, say, 30 minutes before bouncing the servers, and a way to "gracefully" stop existing jobs immediately before restarting the servers. Thanks for your reply.
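The drain window asked about here can be expressed as a simple three-phase rule. The following is a minimal sketch only, assuming a wrapper script or scheduler hook can ask a function whether new jobs may start. The 30-minute drain comes from the question; the function name `maintenance_phase` and the 5-minute grace period are hypothetical, and nothing here is a SAS Grid or scheduler API.

```python
from datetime import datetime, timedelta

def maintenance_phase(now: datetime, reboot_at: datetime,
                      drain: timedelta = timedelta(minutes=30),
                      grace: timedelta = timedelta(minutes=5)) -> str:
    """Classify `now` relative to a planned reboot at `reboot_at`.

    "open"     -> accept and run jobs normally
    "draining" -> refuse new jobs, let running jobs finish
    "stopping" -> ask the remaining jobs to stop so the server can restart
    (hypothetical helper; grace period is an assumed value)
    """
    if not timedelta(0) <= grace < drain:
        raise ValueError("need 0 <= grace < drain")
    if now < reboot_at - drain:
        return "open"
    if now < reboot_at - grace:
        return "draining"
    return "stopping"

reboot = datetime(2024, 1, 7, 2, 0)  # example reboot time
for minutes_before in (45, 20, 3):
    print(minutes_before, maintenance_phase(reboot - timedelta(minutes=minutes_before), reboot))
# 45 open
# 20 draining
# 3 stopping
```

Keeping the rule in one pure function makes it easy to test and to reuse both in the job-submission path (refuse when not "open") and in the shutdown script (stop what is left once "stopping" is reached).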

Kurt_Bremser
Super User

Batch jobs should always be written in a way that allows them to crash or be stopped unexpectedly, and be rerun without causing damage to data. For example, new observations added to a table should "know" which run added them, so that a repeat of that run can filter them out before repeating the table update.
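The run-tagging idea can be illustrated outside SAS. This is a minimal sketch in Python, using an in-memory SQLite table as a stand-in for a SAS data set; the table `sales`, its columns, and the run ID format are all hypothetical. Each row carries the ID of the run that added it, and a rerun first deletes its own earlier rows, all inside one transaction.

```python
import sqlite3

def load_run(conn: sqlite3.Connection, run_id: str, rows: list[tuple[str, float]]) -> None:
    """Append `rows` to the (hypothetical) sales table, tagged with `run_id`.

    Rows left behind by an earlier, interrupted attempt of the same run are
    deleted first, so rerunning the job never duplicates data.
    """
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM sales WHERE run_id = ?", (run_id,))
        conn.executemany(
            "INSERT INTO sales (run_id, item, amount) VALUES (?, ?, ?)",
            [(run_id, item, amount) for item, amount in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (run_id TEXT, item TEXT, amount REAL)")

batch = [("apples", 3.0), ("pears", 2.5)]
load_run(conn, "2024-01-06", batch)
load_run(conn, "2024-01-06", batch)  # simulated rerun after a crash
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 2
```

The transaction is an extra safeguard on top of the run tag: even if the job dies between the delete and the insert, the table is left as it was before that attempt.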

With interactive sessions you can't really know what timespan is right. Some jobs take seconds, some literally days.

In that context I'd like to see a tool that allows a SAS administrator to send messages to metadata-driven clients like EG.

Right now, one has to develop methods to do that outside of SAS or with the use of external commands (like running ps on UNIX to find the workspace servers, deducing the user IDs, finding the email addresses of those users and sending an email saying that the server will be going down).
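The external-command route described above could look roughly like the following Python sketch. It is illustrative only: it assumes the input is the output of `ps -eo user,args`, and that workspace server processes can be recognised by a marker in their command line (`-objectserver` is an assumption here; check what your own deployment shows). The user-to-address mapping is also hypothetical and would in practice come from metadata or a directory service.

```python
from email.message import EmailMessage

# Assumed marker for workspace server processes; verify against your own ps output.
WORKSPACE_MARKER = "-objectserver"

def workspace_users(ps_output: str, marker: str = WORKSPACE_MARKER) -> set[str]:
    """Return the owners of processes whose command line contains `marker`.

    Expects `ps -eo user,args` output: a header line, then one process per
    line with the user name in the first column.
    """
    users = set()
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 1)
        if len(parts) == 2 and marker in parts[1]:
            users.add(parts[0])
    return users

def shutdown_notices(users, addresses, when, sender="sasadmin@example.com"):
    """Build one EmailMessage per user with a known address (hypothetical sender)."""
    messages = []
    for user in sorted(users):
        if user not in addresses:
            continue  # no address on file; handle these users separately
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = addresses[user]
        msg["Subject"] = f"SAS server going down at {when}"
        msg.set_content(f"Please save your work: the SAS server restarts at {when}.")
        messages.append(msg)
    return messages

sample = """USER     COMMAND
alice    /opt/sas/sas -objectserver -noterminal
bob      /usr/bin/bash
carol    /opt/sas/sas -objectserver -noterminal
"""
users = workspace_users(sample)
notices = shutdown_notices(users, {"alice": "alice@example.com"}, "22:00")
print(sorted(users), [m["To"] for m in notices])  # ['alice', 'carol'] ['alice@example.com']
```

The messages are only built, not sent; sending could be done with the standard library's `smtplib.SMTP(...).send_message(msg)` against your mail relay.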

jakarman
Barite | Level 11

Kurt, Enterprise Guide with grid and parallel code submission is no longer an interactive-only approach. It is closer to batch work: that is the flow processing that Enterprise Guide and SAS Studio offer, batch processing by self-service.

There is an advanced topic for dgower to think about: checkpoint restart in SAS. With it you should be able to cancel long-running jobs and restart them later. The topic is an advanced one with a lot of prerequisites. The only place I have seen checkpoint restart used is mainframe job scheduling, with jobs that run for several days.
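SAS's own checkpoint/restart support is configured through SAS itself and is not shown here. As a language-neutral illustration of the general idea only, here is a minimal Python sketch: each completed step is recorded in a checkpoint file, and a rerun after a cancellation skips the steps already done. The function `run_with_checkpoints`, the step names, and the JSON checkpoint format are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

def run_with_checkpoints(steps, checkpoint: Path) -> None:
    """Run (name, function) steps in order, skipping any already recorded as done.

    After each step succeeds its name is saved to `checkpoint`, so a job
    cancelled (e.g. for a reboot) resumes at the first unfinished step.
    Note: write_text is not atomic; a sturdier version would write a temp
    file and rename it.
    """
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier attempt
        func()
        done.append(name)
        checkpoint.write_text(json.dumps(done))
    checkpoint.unlink()  # whole job finished: clear the checkpoint

# Demo: the "transform" step fails once, simulating a cancelled job.
calls = []
fail_once = {"transform": True}

def step(name):
    def run():
        if fail_once.get(name):
            fail_once[name] = False
            raise RuntimeError(f"{name} interrupted")
        calls.append(name)
    return run

steps = [(n, step(n)) for n in ("extract", "transform", "load")]
ckpt = Path(tempfile.mkdtemp()) / "job.ckpt"
try:
    run_with_checkpoints(steps, ckpt)
except RuntimeError:
    pass  # first attempt dies in "transform"
run_with_checkpoints(steps, ckpt)  # rerun resumes at "transform"
print(calls)  # ['extract', 'transform', 'load']
```

Note that "extract" runs only once across both attempts; this only pays off if each step is itself safe to repeat when it is interrupted partway through.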

