Solved: 3rd Party Monitoring

five · Posted 11-29-2021 02:01 PM

Our NOC uses various ITSM tools, including SolarWinds and OpenDuty, to send out alerts. I asked support whether there was a best-practice way to monitor the SAS services and applications that we run, and the reply was "we have no recommendations regarding 3rd party tools." Currently, we are monitoring at the OS level, but we are fighting a problem with our disk filling up very rapidly. It has happened 3 times so far. The first time, generic curl commands would probably have caught it, because apache did not load at all and was returning a 404 error. The other times, though, the ansible status command showed the services as mostly up, and we got a 500 internal server error instead. Now that I know about healthcheck, I will run it the next time this happens.

I would really like to get alerted if that happens again. Is there a way for my NOC department to use SolarWinds to monitor specific Viya applications? I have read about the sas-admin healthcheck and have thought about using it to create some breadcrumb-type files that the NOC could poll. Is there a more direct or better way to do this type of monitoring? I have attached a healthcheck that I ran a few minutes ago that includes the various services and applications that we have in our environment.

If it's helpful to know, we have a 4-server environment:

web (web, rabbitmq, ansible)
app (cas)
worker1 (cas worker)
worker2 (cas worker)

gwootton · Posted 11-29-2021 03:36 PM

I had a similar issue with file system consumption as a result of forgetting to turn off debug-level logging or running a program that built data sets too large for the server. I addressed this by:
1. Writing a script that checks file system usage against a predefined threshold and sends me an email if usage is beyond that threshold.
2. Writing a script that checks each service endpoint defined in /etc/httpd/conf.d/proxy.conf to confirm I can get a 200 response via curl, then builds and emails a report of any that do not respond with a 200 (healthcheck does similar checks).
3. Writing a script that checks for any log levels set to trace or debug and sends an email if any are.

I scheduled these scripts to run periodically throughout the day.

Finally:
4. Setting up disk quotas so users could not consume all the disk space.
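
As an illustration of the first item in the list above, here is a minimal sketch of a threshold check. It is not the script described in the post. The 85% default, the THRESHOLD and MAILTO variables, the function name, and the use of mailx are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch: report file systems whose usage is at or above a threshold.

# usage_over_threshold THRESHOLD
# Reads `df -P` style output on stdin and prints "mount percent" for each
# file system at or above THRESHOLD percent.
# (Mount points containing spaces are not handled in this sketch.)
usage_over_threshold() {
    local threshold=$1
    awk -v limit="$threshold" '
        NR > 1 {                      # skip the df header line
            pct = $5
            sub(/%$/, "", pct)        # "87%" -> "87"
            if (pct + 0 >= limit) print $6, pct "%"
        }'
}

# Hypothetical defaults: adjust the threshold and recipient for your site.
THRESHOLD=${THRESHOLD:-85}
MAILTO=${MAILTO:-}

report=$(df -P | usage_over_threshold "$THRESHOLD")
if [ -n "$report" ]; then
    echo "File systems at or above ${THRESHOLD}%:"
    echo "$report"
    # Mail the report only if a recipient was configured and mailx exists.
    if [ -n "$MAILTO" ] && command -v mailx >/dev/null 2>&1; then
        echo "$report" | mailx -s "Disk usage warning on $(hostname)" "$MAILTO"
    fi
fi
```

A script like this could be run from cron at whatever interval suits the rate at which the disk fills. Because it also prints the report to stdout, a monitoring tool could capture the output instead of relying on email.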

--
Greg Wootton | Principal Systems Technical Support Engineer

five · Posted 11-29-2021 03:49 PM

I get a 401 unauthorized when using curl, even when I use an endpoint from proxy.conf. How do you get around having to log in, in order to pull the 200 status?

I am already doing temporary logging to determine what is filling up the drive. I don't know why I didn't think about having it send me an email as well. So that is a short-term fix, but I'd like to get the rest set up properly as well.

gwootton · Posted 11-29-2021 03:52 PM

I use the /<service>/apiMeta endpoint, which returns a 200 for everything but /SASLogon and /cas-shared-default-http; for those two I just use the endpoint itself:

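# $ep holds one endpoint path from proxy.conf, e.g. /SASLogon
if [ "$ep" = "/cas-shared-default-http" ] || [ "$ep" = "/SASLogon" ]; then
    # These two are requested directly rather than through apiMeta.
    resp=$(curl --location -o /dev/null -s -w "%{http_code}" "http://localhost${ep}/")
else
    resp=$(curl --location -o /dev/null -s -w "%{http_code}" "http://localhost${ep}/apiMeta")
fi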
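
To show how the snippet above might fit into a full check, here is a minimal sketch that reads the endpoint list from a proxy.conf-style file and prints every endpoint that did not answer 200. It assumes the endpoints appear as "ProxyPass /<service> ..." lines, which may not match your file exactly. The function names and the 10-second timeout are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: probe each proxied endpoint and list those not answering 200.

# ASSUMPTION: endpoints appear in proxy.conf as "ProxyPass /<service> ..."
# lines; adjust the pattern if your file is laid out differently.
list_endpoints() {
    awk '$1 == "ProxyPass" && $2 ~ /^\// { print $2 }' "$1" | LC_ALL=C sort -u
}

# Pick the URL to request, following the rule from the post above:
# /SASLogon and /cas-shared-default-http are requested directly,
# everything else through its apiMeta endpoint.
probe_url() {
    case "$1" in
        /cas-shared-default-http|/SASLogon) echo "http://localhost$1/" ;;
        *) echo "http://localhost$1/apiMeta" ;;
    esac
}

# Print "endpoint http_code" for every endpoint that did not return 200.
# curl reports 000 when no HTTP response was received at all.
check_endpoints() {
    local ep code
    while read -r ep; do
        code=$(curl --location -o /dev/null -s --max-time 10 \
                    -w "%{http_code}" "$(probe_url "$ep")")
        [ "$code" = "200" ] || echo "$ep $code"
    done < <(list_endpoints "$1")
}

# Example (run on the web server):
#   check_endpoints /etc/httpd/conf.d/proxy.conf
```

Empty output means every endpoint returned 200. Redirecting the output to a file would give the NOC something to poll, along the lines of the breadcrumb idea in the original question.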

five · Posted 11-29-2021 04:00 PM

That worked a treat. Thank you.

five · Posted 11-30-2021 10:14 AM

I was comparing proxy.conf against the healthcheck and there are a handful of Infrastructure Applications that were not in the proxy list:

cachelocator-listener-v1 (there are also cachelocator and cacheserver, so this one might be redundant?)
cas-shared-default (cas-shared-default-http was in proxy.conf, but not this one)
SAS Infrastructure Data Server
SAS Message Broker (I believe this is sasrabbitmq, so it seems like this one would be super critical to monitor.)

Any ideas for monitoring these? Everything else was in both places.

gwootton · Posted 11-30-2021 10:17 AM

The proxy.conf file is for services that are accessible via the web server, so anything not listed there cannot be reached with curl. The health check does cover these, so you could use that, though I suspect the web services would start failing when those components are down. You could also check that their service ports are accessible using netcat.
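
A port check along those lines could look like the sketch below. It assumes the installed netcat supports the -z and -w options, which some older ncat builds do not. The function names are invented for the example, and the hosts and ports in the usage comment are placeholders; confirm the real values for your deployment.

```shell
#!/usr/bin/env bash
# Sketch: test whether TCP service ports accept connections.

# port_open HOST PORT: succeeds if a TCP connection can be opened.
# ASSUMPTION: the installed netcat supports -z (connect without sending
# data) and -w (timeout in seconds).
port_open() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# check_ports reads "name host port" lines on stdin and prints a DOWN line
# for each service that could not be reached.
check_ports() {
    local name host port
    while read -r name host port; do
        [ -n "$name" ] || continue      # skip blank lines
        port_open "$host" "$port" </dev/null || echo "DOWN $name $host:$port"
    done
}

# Example with placeholder hosts and ports (replace with your own):
#   check_ports <<'EOF'
#   message-broker localhost 5672
#   data-server    localhost 5432
#   EOF
```

As with the endpoint check, empty output means everything answered, so the result is easy to turn into an alert or a status file.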
