MarkESmith
Obsidian | Level 7

Hi all,

 

Recently, our LASR server went down unexpectedly. I was able to start it without issue and load tables into it. Unfortunately, I didn't think to look at the 'Last Action Log' before I manually started the LASR server.

 

Is there any log that would possibly indicate why the LASR server went down? If so, where would it be located?

 

(We are running SAS on Linux)

 

Thanks!

 

alexal
SAS Employee

@MarkESmith ,


Is the LASR server running in distributed or non-distributed mode?

MarkESmith
Obsidian | Level 7

@alexal,

 

Non-distributed mode

alexal
SAS Employee

@MarkESmith ,


How did you start the LASR server? I'm guessing from the VA Administration Console? Did you restart the object spawner before the LASR crash? If not, did you have high memory utilization on the compute tier that day?

MarkESmith
Obsidian | Level 7

Yes, I started it in the VA Administration Console. I did not restart anything before the crash, and I'm the only one who would, or could, do something like that. I have no hard data to back this up, but I see no reason why memory usage would've been greater than normal yesterday when it happened.

 

Are there no logs that would give some sort of positive indication as to what happened in a case like this?

alexal
SAS Employee

@MarkESmith ,

 

In your case, /var/log/messages, but only if the Linux kernel has killed the LASR server process. Also, I'm wondering: were there any errors in the object spawner log yesterday?
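
A quick way to check, assuming syslog writes to the usual location on your distribution:

# Search the current syslog for OOM killer activity
grep -Ei "oom-killer|out of memory|killed process" /var/log/messages
# The kernel ring buffer may still have it too, if the box hasn't been rebooted
dmesg | grep -iE "oom|out of memory"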

MarkESmith
Obsidian | Level 7

@alexal,

 

Thanks for your input! I sifted through /var/log/messages and found the following curiosity:

 

Jul 10 00:11:13 kernel: java invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Jul 10 00:11:13 kernel: [<ffffffff81188ab6>] out_of_memory+0x4b6/0x4f0
Jul 10 00:11:13 kernel: Out of memory: Kill process 12411 (sas) score 889 or sacrifice child
Jul 10 00:11:13 kernel: java invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Jul 10 00:11:13 kernel: [<ffffffff81188ab6>] out_of_memory+0x4b6/0x4f0
Jul 10 00:11:13 kernel: Out of memory: Kill process 12411 (sas) score 889 or sacrifice child

 

It looks like this could have been caused by the out-of-memory killer. I grep'd the other archived 'messages' log files and this doesn't occur anywhere else. If the kernel killed a child process of java, that definitely could have made the LASR server unstable and caused the crash. There doesn't seem to be any viable way to determine exactly which process the PID in question belonged to (other than the fact that it was a 'sas' process).
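
For reference, my sweep over the rotated logs was roughly this (not the exact command; zgrep handles both the compressed and plain files):

zgrep -Ei "oom-killer|out of memory|killed process" /var/log/messages*
# /proc/<pid>/oom_score is only useful while a process is still alive, so too late for 12411:
cat /proc/12411/oom_score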

 

I haven't looked through the objectSpawner log yet, but I will.

 

Edit: I have a somewhat shallow understanding of how out-of-memory kills work, but as I read these messages, the process that invokes the oom-killer is simply the one whose memory request couldn't be satisfied, and the kernel then picks a victim by its badness score. If that's right, Java triggered the kill, but the 'sas' process with PID 12411 (score 889) is what the kernel chose to sacrifice.

SASKiwi
PROC Star

How often do you reboot your SAS VA servers? We have ours on a monthly reboot schedule and that has certainly helped maintain good reliability and performance. If there are any orphan processes chewing up resources then regular reboots will fix these.
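
In cron terms it doesn't need to be anything fancier than something like this (purely illustrative; the timing and command will depend on your own maintenance window):

# Reboot the VA host at 02:00 on the 1st of every month (illustrative only)
0 2 1 * * /sbin/shutdown -r now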

MarkESmith
Obsidian | Level 7

As of now, we've only been rebooting them when we've experienced problems or have had to restart the physical server. Did you experience problems like this before rebooting the VA servers monthly?

Is it possible that a user-initiated process, like a query, could turn rogue and end up eating up all the memory?

PaulS_
Fluorite | Level 6

Over the years, I've found that some SAS processes exhibit signs of memory leaks or other "stability" issues, so restarting them periodically seems to be A Good Thing (tm). So every two weeks we stop all SAS processes, then start them all (in the proper order). Taking advantage of the restart, while all processes are stopped we also perform a "cold" backup of all relevant config, data, etc. One can never have too many backups.
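
Roughly the shape of our fortnightly script, with placeholder paths (your Lev1 config directory and backup target will differ):

#!/bin/sh
SASCONFIG=/opt/sas/config/Lev1          # placeholder for your Lev1 directory

# 1. Stop all SAS servers; sas.servers handles the dependency order.
$SASCONFIG/sas.servers stop

# 2. Cold backup of the configuration while nothing is running.
tar czf /backups/sas_config_$(date +%Y%m%d).tar.gz "$SASCONFIG"

# 3. Start everything back up in the proper order.
$SASCONFIG/sas.servers start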

MarkESmith
Obsidian | Level 7

I might have to take your recommendation. This problem just occurred again, and this time I'm going to reboot the whole machine.

 

After looking at /var/log/messages, it definitely appears to be some sort of memory leak, because I just received another 'out-of-memory kill process' error.
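
In the meantime I'm sketching a crude memory watch so the next OOM kill leaves a trail to correlate against (the interval and log path are just what I picked):

# crontab entry: snapshot overall memory and the top RSS consumers every 15 minutes
*/15 * * * * (date; free -m; ps -eo pid,user,rss,comm --sort=-rss | head -15) >> /var/log/sas_mem_watch.log 2>&1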

SASKiwi
PROC Star

@MarkESmith - If you monitor SAS VA web server memory usage in SAS Environment Manager, you will see it gradually increase over time, especially spread over several weeks. A regular reboot will drop that back to a starting minimum. It's good server administration as well; we apply OS patches at the same time.

MarkESmith
Obsidian | Level 7

Thanks for the input. I'm hoping that this reboot will mitigate problems for a while and in the meantime, I can work on a scripted way to reboot the machine and start up all the necessary servers.

 

It appears that the 'LASR Analytic Server' is automatically started either upon reboot or upon execution of the 'sas.servers' script. I assume there must be a way that the 'Public LASR Analytic Server' can be scripted to start as well, correct?
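
In case it's useful to anyone later, the direction I'm experimenting with is running a small batch SAS job right after sas.servers finishes; the program name and every path below are placeholders for whatever your site uses, not anything from the docs:

# Start the core servers, then kick off a batch job that brings up / reloads
# the Public LASR server (start_public_lasr.sas is a made-up name here).
/opt/sas/config/Lev1/sas.servers start
/opt/sas/sashome/SASFoundation/9.4/sas \
    -sysin /opt/sas/admin/start_public_lasr.sas \
    -log   /opt/sas/admin/logs/start_public_lasr.log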

alexal
SAS Employee

@MarkESmith ,

 

Thanks. The LASR server isn't a Java application, but the non-distributed LASR server started from the VA Administration Console will depend on these components:

 

  • Object Spawner
  • Web Server

I'd like to see more details about the process with ID 12411 if you're able to find anything in the log files (maybe the SAS logs). Anyway, what you've found could potentially have killed the LASR server.
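
Something like this across the Lev1 log directories might turn up a reference (the config path is a guess at your layout; adjust it for your install):

# Recursively search every .log under the config tree for that PID
grep -rl --include="*.log" "12411" /opt/sas/config/Lev1/ 2>/dev/null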

MarkESmith
Obsidian | Level 7

Unfortunately, I was not able to find anything illuminating about that process ID in the log files, and nothing seemed suspect in the ObjectSpawner logs either. Have you heard of this out-of-memory problem happening on SAS installations before? I'm just wondering if user-initiated requests could snowball into a problem like this (specifically queries). Our machine has PLENTY of memory, so I can't imagine the constraint lies there.

 

The other worry is whether this problem is only a symptom indicative of something worse (that may recur) or if this is more likely a one-time thing that I could prevent (or mitigate) in the future by restarting the VA servers on a monthly basis.
