Thanks for the summon @JuanS_OCS 🙂
ESM is useful for this kind of thing, but running out of allocated open file pointers can be a tough one to figure out - especially as the kernel doesn't really keep track of it in any useful way.
I've got a very minimal script I use sometimes to diagnose this stuff. You can do this:
1. Save the following script as a file on your server - maybe under your sas user's home directory. Call it something like handleMonitor.sh
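#!/bin/bash
# Usage: handleMonitor.sh LOGFILE INTERVAL_SECONDS [LSOF_OPTIONS]
while true; do
  echo "$(date -u) =========================" >> "$1"
  # Count lsof entries per command/pid/user, highest first.
  # $3 is deliberately unquoted so optional lsof options split into words.
  lsof $3 | tr -s ' ' | cut -d " " -f1-3 | sort | uniq -c | sort -rn >> "$1"
  # Blank line between snapshots
  echo >> "$1"
  sleep "$2"
done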
2. Make it executable:
[nik@edge ~]$ chmod +x handleMonitor.sh
3. Let it run throughout the day, like this:
[nik@edge ~]$ nohup ./handleMonitor.sh myOpenFileLog.log 300 &
In this example, myOpenFileLog.log is your target logfile and 300 is your logging interval in seconds (so every 5 mins). Make sure the logfile is in a location that your executing user is able to write to; if you're in your user's home dir that should be fine.
4. Have a look at myOpenFileLog.log. You'll get output like this, repeating at the interval set above:
Mon Nov 12 13:14:14 UTC 2018 =========================
41 postgres 22012 esmuser
40 postgres 22018 esmuser
40 postgres 22017 esmuser
40 postgres 22015 esmuser
39 postgres 22016 esmuser
36 postgres 22019 esmuser
36 postgres 22013 esmuser
24 tmux 5718 esmuser
19 tmux 13376 esmuser
17 lsof 17544 esmuser
14 bash 5740 nik
14 bash 16866 esmuser
14 bash 13255 nik
14 bash 13230 esmuser
13 ta 13370 esmuser
13 handleMon 17522 esmuser
12 lsof 17550 esmuser
11 sort 17549 esmuser
11 sort 17547 esmuser
10 cut 17546 esmuser
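If the log gets long, a rough per-user total for the most recent snapshot can be easier to read than the raw rows. Here's a minimal sketch, assuming the log has exactly the layout shown above (a header line containing a run of = signs, then rows of count, command, pid and user). The function name latest_totals_by_user is just something I made up for illustration; it isn't part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: sum the counts per user for the LAST snapshot in a
# log written by handleMonitor.sh. Assumes rows look like:
#   <count> <command> <pid> <user>
latest_totals_by_user() {
  awk '
    # A header line starts a new snapshot, so forget the previous totals.
    /=====/ { split("", totals); next }
    # Only accept rows with a numeric count and a numeric pid; this also
    # skips the "COMMAND PID USER" header row that lsof prints.
    NF == 4 && $1 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { totals[$4] += $1 }
    END { for (u in totals) print totals[u], u }
  ' "$1" | sort -rn
}
```

You would call it as latest_totals_by_user myOpenFileLog.log, and it prints one line per user with the biggest total first.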
This will tell you the count of file handles open per process in the first column, and then give you the process's command, pid and user in the other three. If there's a rogue pid somewhere eating all your file handles, it should be obvious here as a standalone pid with a high number in the first column. Otherwise, if there are simply too many concurrent processes running under the same user and hitting their limit, that should also show up, as multiple processes each with a moderate count. It is an extension to what @nhvdwalt suggests - I think he may be on the right track.
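Once a suspicious pid turns up, it can help to compare its open descriptors against its own limit. The following is a sketch only, and it assumes a Linux box with /proc mounted and that you have permission to read that process's /proc entries (your own processes, or run it as root). The function name fd_usage is my own invention. Bear in mind that lsof also lists things like memory-mapped libraries, so the lsof counts in the log can come out higher than the descriptor count shown here.

```shell
#!/bin/bash
# Hypothetical helper: show open descriptors vs. the soft "open files"
# limit for one pid. Linux only; relies on /proc/<pid>/fd and
# /proc/<pid>/limits being readable by the calling user.
fd_usage() {
  local pid=$1
  local open soft
  # One entry per open file descriptor
  open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
  # In the "Max open files" row, field 4 is the soft limit, field 5 the hard limit
  soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  echo "pid=$pid open=$open soft_limit=$soft"
}
```

For example, fd_usage 22012 would report on the first postgres process in the sample output above, provided you run it as a user allowed to inspect that process.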
I hope this helps you with your immediate problem. It does feel like you are running into challenges around managing the user generated workload on your SAS environment, especially if the number of EG users on your environment is growing. This is where ESM, the product Juan describes in his post, can help a lot. If you're interested in seeing how, please drop me a line and we can arrange a demo.
Nik