Go
Quartz | Level 8

Hi All,

 

Do you think I can schedule a simple Unix script to send me an alert when the LASR server crashes?

12 REPLIES
JuanS_OCS
Amethyst | Level 16

Hello @Go,

 

Excellent question. The quick answer is yes, of course, you can do almost anything you need!

The detailed answer depends on what "crash" means in this particular case. By this I mean:

 

- If you want to monitor that the LASR server is providing service, you would need to monitor the TCP port where LASR is listening and/or the system process that started the sas process (LASR) with the command line as expected.

 

- If you want to monitor the LASR tables that are loaded, you can create a script of your own containing code as explained in this link http://support.sas.com/documentation/cdl/en/inmsref/67629/HTML/default/viewer.htm#p050lknh5xepngn1s2... and then customize it to your requirements, for example to check whether your important tables are loaded and in what state.
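
For the first option above, a minimal sketch of a port and process check might look like the following. The host name, port number, and process pattern in the usage comments are placeholders, not values from this thread. The sketch assumes bash (for its /dev/tcp redirection), the coreutils timeout command, and pgrep are available on the machine running the check.

```shell
#!/bin/bash
# Sketch only: checks whether a TCP port accepts connections and whether
# a matching process exists. It cannot tell whether LASR is actually
# responding to requests.

# Return 0 if a TCP connection to host $1, port $2 opens within 5 seconds.
port_is_open() {
    timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Return 0 if at least one process command line matches pattern $1.
process_is_running() {
    pgrep -f "$1" >/dev/null 2>&1
}

# Example usage (hypothetical host, port, and pattern):
#   port_is_open example.sas.com 10010 || echo "ALERT: LASR port is closed"
#   process_is_running "lasr"          || echo "ALERT: no LASR process found"
```

The echo commands in the usage comments could be replaced with a mail command once the check behaves as expected.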

 

If there is something else you will need to monitor, please let us know.

 

Hope it helps.

Kind regards,

Juan

swetawasthisas
Obsidian | Level 7

Hi Juan,

 

- If you want to monitor that the LASR server is providing service, you would need to monitor the TCP port where LASR is listening and/or the system process that started the sas process (LASR) with the command line as expected.

 

Could you let me know if you have a sample script that monitors the LASR port and detects when it is down?

 

Regards,

Sweta

JuanS_OCS
Amethyst | Level 16

Hi @swetawasthisas ,

 

I am sorry, but I won't. Not because I don't want to, or out of laziness 😉, but because you can find the answer that best fits your environment just by searching for something like "script monitor TCP port". Alternatively, you could use SAS Environment Manager or your favorite monitoring tool to monitor a specific port number and detect when it is up or down.

 

In addition to this, you can:

 

- Use this to read the LASR JVM  http://support.sas.com/kb/58/922.html

- Or write a SAS program to see memory usage of LASR http://support.sas.com/documentation/cdl/en/inmsref/67629/HTML/default/viewer.htm#p050lknh5xepngn1s2...

swetawasthisas
Obsidian | Level 7
Thanks Juan for your input. I will go through the steps you suggested.
alexal
SAS Employee

@Go,

 

From my point of view, it is better to find out why your LASR server is crashing. The LASR server might crash in a few circumstances:

 

  • A human error, for example someone killed the process
  • An internal exception within the LASR server
  • A third-party application killing the LASR server, such as the Linux Out-Of-Memory killer
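
To check the last possibility, one option is to search the kernel log for OOM-killer messages. This is only a sketch: the function name is made up for illustration, and the patterns are the usual Linux kernel phrasings ("invoked oom-killer", "Out of memory: Killed process ..."), which can vary between kernel versions.

```shell
#!/bin/sh
# Print kernel log lines that look like OOM-killer activity.
# Reads log text on stdin, so it works with dmesg, journalctl -k,
# or a saved log file.
find_oom_events() {
    grep -Ei 'out of memory|oom-killer'
}

# Example usage (reading the kernel log may require root):
#   dmesg -T | find_oom_events
#   journalctl -k | find_oom_events
```

If a matching line names the LASR process around the time of the crash, memory pressure on that node is a likely cause.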

First off, is this a distributed or non-distributed LASR server? Depending on the answer, I will provide a different set of instructions.

 

@JuanS_OCS,

If you want to monitor that the LASR server is providing service, you would need to monitor the TCP

That's not enough, because the TCP port might be in the listening state while the LASR server is unresponsive. A better way to verify that the LASR server is responding is to try to assign a SASIOLA library.

Go
Quartz | Level 8

@JuanS_OCS and @alexal ..

 

Thanks for the detailed replies. Ours is a distributed environment. We are still trying to find the cause and a solution.

alexal
SAS Employee

@Go,

 

Check that pstack is installed, by running 'pstack' at the command line.  If it is, create a file called pstack.sh, which contains:
 

#! /bin/sh
 
pstack $2 > /tmp/sastb.out

 
Make the script executable.
 
Set variables like the following in the /TKGrid/tkmpirsh.sh file on all TKGrid nodes:
 

export TKMPI_DEBUGGER=/opt/sas/pstack.sh
export TKMPI_DEBUGONEXCEPTION=1

 
Restart the LASR server. After the next LASR crash, check /tmp on all the machines. One or more may have a traceback in /tmp/sastb.out (only if the LASR server throws an exception). Send this file from every machine where you find it.

Go
Quartz | Level 8

Hi alexal,

 

Please see attached files...!

alexal
SAS Employee

@Go,

 

It appears that you have a network scanner that is running in aggressive mode. There are two ways to avoid the problem you are experiencing:

 

  • Do not run a network scanner on the LASR server
  • Start the LASR server using a specific TCP port range and exclude that range from scanning

P.S.: Why didn't you tell me that you have a track open? My co-worker and I just did double the work 🙂

Go
Quartz | Level 8

@alexal,

 

I am sorry, that's my colleague, and I'm his brand-new teammate. I didn't realize you work for SAS directly, so I wanted to see if the SAS community here had other ideas. Thanks for your support; we will go with the SAS track from here.

swetawasthisas
Obsidian | Level 7

Hi Alexal,

 

If you want to monitor that the LASR server is providing service, you would need to monitor the TCP-

 

Sweta: I am now looking for ways to monitor the port and receive an alert when it is down.

 

That's not enough because the TCP port might be in the listening state, but the LASR server unresponsive. The better way to verify if the LASR server is responding is to try to assign SASIOLA library

 

Sweta: Can we use a LIBNAME statement like this for monitoring, and get an email alert if the assignment fails for some reason?

alexal
SAS Employee

@swetawasthisas ,

 

Here is an example that one of our customers has already implemented:

 

1. Create a SAS program

LIBNAME TEST SASIOLA  TAG=TEST  PORT=1001 SIGNER="https://MIDDLE_TIER_URL/SASLASRAuthorization"  HOST="example.sas.com" ;
LIBNAME TEST clear;

2. Create a Bash script

#!/bin/bash

timestmp=$(date +'%d%m%Y_%H%M%S')
LogFile="check_lasr-$timestmp.log"
HeadNode="example.sas.com"
sts=""

# Script name without its directory; used to name the failure indicator file
PROG=${0##*/}
AFIF="/tmp/afif.${PROG}.txt"

# Relative paths below assume the script runs from the directory that holds checklasr.sas
mkdir -p logs

if [ ! -f "${AFIF}" ]
then
    /sas/home/SASFoundation/9.4/sas -sysin checklasr.sas -log "logs/$LogFile" &
    BGPID=$!
    sleep 30
    #### Checking for successful libname assignment to LASR (libref TEST, as in the SAS program above)
    grep "Libref TEST was successfully assigned" "logs/$LogFile" > /dev/null 2>&1
    sts=$?
    if [[ "$sts" != 0 ]]
    then
        # START: KILL BACKGROUND JOB (IF STILL RUNNING) AND CREATE AFIF
        kill -9 "${BGPID}" 2>/dev/null
        echo "Please ensure that you have removed file [${AFIF}] after you have restarted the LASR server" > "${AFIF}"
        # STOP : KILL BACKGROUND JOB AND CREATE AFIF
        echo "Unable to set LIBNAME to LASR."
        ## Email the indicator file to the SAS admin.
        mailx -s "ALERT: LASR Unreachable on [SASApp]" -a "$AFIF" -r "sas-administrator-example@$(hostname -f)" sas-administrator-example@sas.com < /dev/null
        exit 1
    else
        echo "LIBNAME to LASR successful. Exiting.."
        # REMOVE LOGFILE SINCE THE COMMAND WAS SUCCESSFUL
        rm -f "logs/$LogFile"
        exit 0
    fi
else
    echo "ACTIVE FAILURE INDICATOR FILE exists."
    echo "Please remove file [$AFIF] to resume running this script."
fi

This is just an example, provided as is and without any warranties. You have to change the LIBNAME statement for your LASR server and adjust the first "grep" command in the Bash script so that it matches your libref.
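
To come back to the original question about scheduling, one way to run a script like this periodically is cron. The sketch below only prints a crontab entry. The directory, script name, and 10-minute interval are assumptions for illustration (the script itself sleeps for 30 seconds, so the interval should be comfortably longer than that). Because the script uses relative paths (checklasr.sas and logs/), the entry changes into the script's directory first.

```shell
#!/bin/sh
# Print a crontab line that runs the check every 10 minutes.
#   $1 = directory holding the check script and checklasr.sas (hypothetical)
#   $2 = file name of the check script (hypothetical)
cron_entry() {
    printf '*/10 * * * * cd %s && ./%s >> %s/logs/cron.log 2>&1\n' "$1" "$2" "$1"
}

# Example: append the entry to the current user's crontab.
#   ( crontab -l 2>/dev/null; cron_entry /opt/sas/scripts check_lasr.sh ) | crontab -
```

Since the script refuses to run while the failure indicator file exists, a failed check produces one email rather than one every 10 minutes.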
