five
Obsidian | Level 7

Our NOC uses various ITSM tools, including SolarWinds and OpenDuty, to send out alerts. I asked support whether there was a best-practice way to monitor the SAS services and applications we run, and the reply was "we have no recommendations regarding 3rd party tools." We currently monitor at the OS level, but we are fighting a problem with our disk filling up very rapidly; it has happened 3 times so far. The first time, generic curl commands would probably have caught it, because Apache did not load at all and was returning a 404 error. The other times, the Ansible status command showed the services as mostly up, but they returned a 500 Internal Server Error instead. Now that I know about the health check, I will run it the next time this happens.

I would really like to be alerted if this happens again. Is there a way for our NOC to use SolarWinds to monitor specific Viya applications? I have read about the sas-admin healthcheck and have thought about using it to create breadcrumb-type files that the NOC could poll. Is there a more direct or better way to do this type of monitoring? I have attached the output of a health check I ran a few minutes ago; it includes the various services and applications in our environment.

 

If it helps, we have a four-server environment:

web (web, rabbitmq, ansible)
app (cas)
worker1 (cas worker)
worker2 (cas worker)
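The breadcrumb idea can be sketched as a small wrapper that runs any check command (a curl request, for instance) and records OK or FAIL with a timestamp in a file that a monitoring tool could poll. This is a minimal sketch under stated assumptions, not a SolarWinds integration: the script name, the function name `write_status`, and the file format are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical breadcrumb writer: run a check command and record the result in
# a status file that an external monitoring tool could poll.
# Usage: write_status.sh <status-file> <command> [args...]

write_status() {
  local status_file="$1" tmp rc now
  shift
  # Create the temp file in the same directory so the final mv is a rename.
  tmp=$(mktemp "${status_file}.XXXXXX") || return 2
  chmod 644 "$tmp"   # mktemp creates 0600 files; let a poller read it
  "$@" >/dev/null 2>&1
  rc=$?
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  if [ "$rc" -eq 0 ]; then
    echo "OK $now" > "$tmp"
  else
    echo "FAIL $now exit=$rc" > "$tmp"
  fi
  # Rename into place so a poller never reads a half-written file.
  mv -f "$tmp" "$status_file"
  return "$rc"
}

if [ "$#" -ge 2 ]; then
  write_status "$@"
fi
```

For example, a cron job could run `write_status.sh /var/tmp/saslogon.status curl -fsS -o /dev/null http://localhost/SASLogon/` (the status path is a placeholder); `curl -f` exits non-zero on HTTP errors such as 404 or 500, so the file would flip to FAIL.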


6 Replies
gwootton
SAS Super FREQ
I had a similar issue with file system consumption, caused by forgetting to turn off debug-level logging or by running a program that built data sets too large for the server. I addressed it by:
1. Wrote a script to check file system usage against a predefined threshold and email me if usage exceeded it.
2. Wrote a script to check each service endpoint defined in /etc/httpd/conf.d/proxy.conf for a 200 response via curl, and to build and email a report of any that did not respond with a 200 (the health check performs similar checks).
3. Wrote a script to check for any log levels set to trace or debug and send an email if any were.

I scheduled these scripts to run periodically throughout the day.

Finally:
4. Set up disk quotas so users could not consume all the disk space.
--
Greg Wootton | Principal Systems Technical Support Engineer
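Step 1 of the list above could look something like the following minimal sketch. The thread does not show the actual script, so the script name, function names, and the use of a mailx-style `mail` command for the alert are assumptions.

```shell
#!/usr/bin/env bash
# Sketch of a file-system usage check (step 1 above).
# Usage: check_disk.sh <mount-point> <percent-threshold>
# Optional: set ALERT_EMAIL to also send the alert with a mailx-style `mail`.

# Print the used percentage (as an integer) of the file system holding $1.
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Print an alert and return 1 when usage is at or above the threshold.
check_disk() {
  local mount="$1" threshold="$2" used msg
  used=$(disk_usage_pct "$mount")
  [ -n "$used" ] || return 2
  if [ "$used" -ge "$threshold" ]; then
    msg="${HOSTNAME:-unknown}: $mount is ${used}% full (threshold ${threshold}%)"
    echo "$msg"
    # Only send mail if a recipient is set and a `mail` command exists.
    if [ -n "${ALERT_EMAIL:-}" ] && command -v mail >/dev/null 2>&1; then
      echo "$msg" | mail -s "Disk usage alert: $mount" "$ALERT_EMAIL"
    fi
    return 1
  fi
  return 0
}

if [ "$#" -ge 2 ]; then
  check_disk "$1" "$2"
fi
```

Scheduled from cron as described in the post, an entry such as `*/15 * * * * ALERT_EMAIL=admin@example.com /usr/local/bin/check_disk.sh /opt/sas 90` would check every 15 minutes; the path, address, and threshold are placeholders.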
five
Obsidian | Level 7

I get a 401 Unauthorized response when using curl, even against the endpoints listed in proxy.conf. How do you get a 200 status without having to log in?

I am already doing temporary logging to determine what is filling up the drive; I don't know why I didn't think of having it send me an email as well. That is a short-term fix, but I'd like to get the rest set up properly too.
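For the temporary logging of what is filling the drive, one option is to append a snapshot of the largest directories to a log file each time a scheduled check runs. This minimal sketch assumes GNU coreutils or BusyBox `du` (both accept `-d 1`); the script name, function name, and example log path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the largest immediate subdirectories of a
# directory (sizes in KiB), staying on one file system (-x).
# Usage: top_dirs.sh <dir> [count] >> /var/tmp/disk-growth.log

top_dirs() {
  local dir="$1" count="${2:-10}"
  echo "=== $(date -u +%Y-%m-%dT%H:%M:%SZ) $dir ==="
  # -d 1 limits output to the directory and its direct children; after the
  # numeric sort, the first line is the directory's own total.
  du -xk -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$((count + 1))"
}

if [ "$#" -ge 1 ]; then
  top_dirs "$@"
fi
```

Comparing successive snapshots in the log shows which directory is growing between runs.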

gwootton
SAS Super FREQ (accepted solution)
I use the /<service>/apiMeta endpoint, which returns a 200 without logging in for every service except /SASLogon and /cas-shared-default-http. For those two, I request the endpoint itself:

# $ep holds a service path from proxy.conf, e.g. /SASLogon; $resp receives the HTTP status code
if [ "$ep" = "/cas-shared-default-http" ] || [ "$ep" = "/SASLogon" ]
then resp=$(curl --location -o /dev/null -s -w "%{http_code}" "http://localhost${ep}/")
else resp=$(curl --location -o /dev/null -s -w "%{http_code}" "http://localhost${ep}/apiMeta")
fi
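Combining this per-endpoint check with step 2 of the earlier reply (check every endpoint in proxy.conf and report any that are not 200), a full loop might look like the minimal sketch below. It assumes each proxied service appears in proxy.conf on a line of the form `ProxyPass /<service> <backend>`, which you should verify against your own file; the script and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check each web-service path found in proxy.conf and report any that
# do not return HTTP 200. Assumes lines like "ProxyPass /<service> <backend>".
# Usage: check_endpoints.sh <proxy.conf> [base-url]

# Print unique service paths from ProxyPass lines, minus trailing slashes.
list_endpoints() {
  awk '$1 == "ProxyPass" && $2 ~ /^\/./ { sub(/\/+$/, "", $2); print $2 }' "$1" |
    LC_ALL=C sort -u
}

# /SASLogon and /cas-shared-default-http are requested at their root;
# every other service is requested at its /apiMeta endpoint.
endpoint_url() {
  if [ "$2" = "/cas-shared-default-http" ] || [ "$2" = "/SASLogon" ]; then
    echo "$1$2/"
  else
    echo "$1$2/apiMeta"
  fi
}

check_endpoints() {
  local conf="${1:-/etc/httpd/conf.d/proxy.conf}" base="${2:-http://localhost}"
  local eps ep code failed=0
  eps=$(list_endpoints "$conf")
  # Service paths contain no spaces, so plain word splitting is safe here.
  for ep in $eps; do
    # curl prints 000 as the status when it cannot connect at all.
    code=$(curl --location -o /dev/null -s --max-time 10 -w "%{http_code}" "$(endpoint_url "$base" "$ep")")
    if [ "$code" != "200" ]; then
      echo "$ep $code"
      failed=1
    fi
  done
  return "$failed"
}

if [ "$#" -ge 1 ]; then
  check_endpoints "$@"
fi
```

Run from cron, any output means at least one service needs attention, and that output can be emailed or written to a status file for the NOC.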
five
Obsidian | Level 7

That worked a treat. Thank you.

five
Obsidian | Level 7

I was comparing proxy.conf against the healthcheck and there are a handful of Infrastructure Applications that were not in the proxy list:

cachelocator-listener-v1 (there are also cachelocator and cacheserver, so it might be redundant?)
cas-shared-default (cas-shared-default-http was in proxy.conf, but not this one)

SAS Infrastructure Data Server
SAS Message Broker (I believe this is sasrabbitmq, so it seems like this one would be super critical to monitor.)

Any ideas for monitoring these? Everything else was in both places.

gwootton
SAS Super FREQ
proxy.conf lists the services that are accessible through the web server, so anything not in it cannot be checked with curl. The health check does check these, so you could use that, though I suspect the web services would start failing whenever these are down. You could also use netcat to check that their service ports are accepting connections.
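A sketch of the suggested port check, for services that are not behind the web server, such as the SAS Message Broker or the SAS Infrastructure Data Server. It uses bash's built-in /dev/tcp redirection instead of netcat, so it also works on hosts without nc installed; on netcat builds that support it, `nc -z host port` does the same job. The script and function names are hypothetical, and the host:port pairs must come from your own deployment (for example, from `ss -ltn` on each server).

```shell
#!/usr/bin/env bash
# Hypothetical TCP port check for services that Apache does not proxy.
# Usage: check_ports.sh host:port [host:port ...]

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# bash -c receives the host as $0 and the port as $1.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Print a DOWN line for each unreachable target; return 1 if any failed.
check_ports() {
  local target host port failed=0
  for target in "$@"; do
    host="${target%:*}"
    port="${target##*:}"
    if ! port_open "$host" "$port"; then
      echo "DOWN $host:$port"
      failed=1
    fi
  done
  return "$failed"
}

if [ "$#" -ge 1 ]; then
  check_ports "$@"
fi
```

For example, `check_ports.sh web:5672 app:5570` checks RabbitMQ's default AMQP port on the web server and what is usually the CAS controller port on the app server; confirm both ports in your environment before relying on them.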
