draghun9
Fluorite | Level 6

Hello,

I need a suggestion on handling the orphaned SAS processes. Here is the situation:

Once in a while, users' processes run too slowly, WRS/Portal reports do not open and keep processing, or refreshing data throws time-out errors.

Every time this happens, we stop the SAS services and kill the hanging SAS processes, but that leaves <defunct> processes behind, so we finally have to request a reboot of the Linux server, which requires a lot of approvals.

So now we would like to find a way to prevent these defunct processes from lingering. FYI, processes related to both the workspace server and the pooled workspace server are left hanging.

Please advise how we can prevent these defunct processes from being created.

Thanks,

Raghu

1 ACCEPTED SOLUTION

Kurt_Bremser
Super User

If you have already killed everything including the spawner, then the parent process of those <defunct> processes should be 1 (the init process). Usually, a

kill -1 1

issued by root should cause it to read all termination messages from the queue, enabling the kernel to clear the process list. If that does not work, there's a problem with the system (we had a similar issue with AIX 4.3 some 15 years ago, but it has been fixed for a long time).


8 REPLIES
Kurt_Bremser
Super User

<defunct> means that the process has already exited, but the system has to keep its entry in the process list because the parent process has not yet collected the termination status (it has not called wait() yet).

With workspace servers, this means that the spawner is not acknowledging the termination messages, so you should go look there; it might be that your real problem is a stuck object spawner.
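To see which parent is failing to collect its children, something along these lines may help. It is only a sketch: the function name `list_defunct` is made up for illustration, and it assumes `ps` output with the columns PID, PPID, STAT, USER, COMMAND (as printed by `ps -eo pid,ppid,stat,user,comm` on Linux), where zombies carry a state starting with `Z`.

```shell
#!/bin/sh
# list_defunct (hypothetical helper): read ps output on stdin and keep
# only defunct (zombie) processes.
# Assumed column order: PID PPID STAT USER COMMAND, header on line 1.
list_defunct() {
    # State codes for zombies start with "Z" (for example "Z" or "Z+").
    # Prints: PID, parent PID, owner, command name.
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4, $5 }'
}

# Typical use on a Linux host:
#   ps -eo pid,ppid,stat,user,comm | list_defunct
```

The second column of the result is the parent PID. If it belongs to the object spawner, that is the process to look at first.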

draghun9
Fluorite | Level 6

This time, when the issue happened, Enterprise Guide was running fine, so I assume the workspace server and the Object Spawner are fine (please correct me if I am wrong).

We had issues only with web report access, and most of the hanging jobs I killed were related to the pooled workspace server running under the sassrv ID.

I suspect that, because of network or database performance issues, WRS reports sometimes take a long time to open and users abruptly close their sessions or cancel the process, which causes these hanging processes. Is there a way to handle these sessions?

Kurt_Bremser
Super User

Consider a regular restart of the Web Application Server. That is by nature (see my previous post regarding Java) the weakest link in the chain. Our SAS Stored Process Servers have been up for months now, one of them since the start of the Object Spawner (April 30).

SASKiwi
PROC Star

@draghun9  - We ensure that all of our SAS servers are on a monthly maintenance reboot cycle that includes OS patching. This removes any orphan processes regardless of how they started. In my experience, monthly reboots are common practice for IT support so this is one approach you should consider.

Kurt_Bremser
Super User

It seems that my systems guy hasn't found anything worth a reboot for some time:

(root)/:> uptime
  07:46AM   up 921 days,  14:28,  6 users,  load average: 3.53, 3.51, 3.37

😉

Given the fact that AIX has a highly modular build (the boot kernel is just 35 MB!) and modules can be unloaded/loaded dynamically, it rarely needs a complete reboot for updates.

Even our Object Spawner has been up since April 30.

But we do restart Tomcat (the Web Application Server) daily. Java runtimes have a nasty habit of overindulging on RAM and going catatonic.

draghun9
Fluorite | Level 6

Weird, but our system admin is not ready to reboot, and this is also happening on the dev/UAT servers, where a monthly reboot may not be supported at our site.

Also, I was eager to know whether SAS has a way to handle these zombie processes (they become defunct only after I stop the SAS services and kill all the hanging processes).

Kurt_Bremser
Super User

If you have already killed everything including the spawner, then the parent process of those <defunct> processes should be 1 (the init process). Usually, a

kill -1 1

issued by root should cause it to read all termination messages from the queue, enabling the kernel to clear the process list. If that does not work, there's a problem with the system (we had a similar issue with AIX 4.3 some 15 years ago, but it has been fixed for a long time).
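Before signalling init, it may be worth confirming that the defunct processes really have parent 1. A rough sketch, with a made-up function name `count_defunct_by_parent`, assuming `ps` output whose first three columns are PID, PPID and STAT (for example from `ps -eo pid,ppid,stat` on Linux):

```shell
#!/bin/sh
# count_defunct_by_parent (hypothetical helper): read ps output on stdin
# and print "<parent PID> <number of defunct children>" per parent.
# Assumed column order: PID PPID STAT, header on line 1.
count_defunct_by_parent() {
    awk 'NR > 1 && $3 ~ /^Z/ { n[$2]++ }
         END { for (p in n) print p, n[p] }' | sort -n
}

# Typical use on a Linux host:
#   ps -eo pid,ppid,stat | count_defunct_by_parent
```

If every line starts with 1, the zombies have been handed over to init, which is the case described above. Any other parent PID points at a process that is still alive and not collecting its children.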

draghun9
Fluorite | Level 6

Noted. Yes, the parent PID was 1. I will ask my system admin to run kill -1 1 next time and see if that helps.

Thanks for your response!!
