We have been running SAS VA 7.2 non-distributed on a Windows Server 2012 R2 box for several months now. Suddenly, all of the web applications (VA, Web Administration, Environment Manager, SAS Studio) running on this box return "Service Temporarily Unavailable". Rebooting the box buys about 10 minutes of uptime, then everything goes down again.
The Windows server itself shows all expected SAS services running, and Management Console connects just fine and shows all configuration settings. The issue seems to be confined to the web server (or web app server).
I have a support ticket submitted, but now we've been down for days and I'm recruiting all the help I can get.
What's in the LevN/Web/WebAppServer/SASServer12_1/logs/server.log (possibly SASServer1 or SASServer if you have a single managed server)? If you're seeing that error message, the web server is okay but the web app server is not: the reverse proxy can't communicate with the backend server. Typically that means additional information is available either in a particular application log in LevN/Web/Logs/SASServerN/ or in the aforementioned server.log. The likely culprit is the JVM exiting for some reason, or thrashing in garbage collection to the point that it becomes unresponsive.
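If the server.log is large, a quick filter can help surface the relevant entries. The following is only a rough sketch in Python, not a SAS-provided tool: it assumes log lines start with a timestamp followed by the level (the shape of the excerpt quoted later in this thread), and the function name `find_problems` is made up for illustration. The log path is passed on the command line, since the actual LevN directory depends on your deployment.

```python
import re
import sys

# Assumed line shape, based on the excerpt quoted in this thread, e.g.
# "2016-07-06 11:19:45,694 ERROR (localhost-startStop-6) [...] message"
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (ERROR|FATAL)\s+(.*)$"
)


def find_problems(lines):
    """Return (timestamp, level, rest) tuples for ERROR/FATAL lines.

    Lines mentioning OutOfMemoryError are also kept, because stack trace
    lines carry no timestamp and would otherwise be skipped.
    """
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        match = LOG_LINE.match(line)
        if match:
            hits.append(match.groups())
        elif "OutOfMemoryError" in line:
            hits.append(("", "OOM", line.strip()))
    return hits


# Hypothetical usage: python scan_log.py <path to server.log>
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for ts, level, rest in find_problems(fh):
            print(ts, level, rest[:200])
```

Comparing the timestamp of the last hit before the outage with the time the "Service Temporarily Unavailable" page first appears is usually a reasonable starting point.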
Most recent error message, written shortly after a reboot: (internal hostname obfuscated by me)
2016-07-06 11:19:45,694 ERROR (localhost-startStop-6) [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/SASDeploymentBackup]] Exception sending context initialized event to listener instance of class org.springframework.web.context.ContextLoaderListener
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'identityCache': Invocation of init method failed; nested exception is com.gemstone.gemfire.distributed.DistributedSystemDisconnectedException: GemFire on [[INTERNAL HOSTNAME]]<v482>:30329/59464 started at Wed Jul 06 11:11:20 EDT 2016: Message distribution has terminated
[[followed by java stack trace]]
Also, SAS support helped us enable GC logging, and I see this behavior [abridged for clarity]:
{Heap before GC invocations=34 (full 0):
par new generation total 419392K, used 404067K
eden space 372800K, 100% used
from space 46592K, 67% used
to space 46592K, 0% used
2016-07-06T12:36:51 [Times: user=0.05 sys=0.02, real=0.02 secs]
Heap after GC invocations=35 (full 0):
par new generation total 419392K, used 26187K
eden space 372800K, 0% used
from space 46592K, 56% used
to space 46592K, 0% used
372 of our 512 GB memory goes from 0 to 100% in two seconds? This goes back and forth over and over several times, in very short intervals. I'm not fluent in log files, but this seems not good.
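Sounds like support has you on the right track. None of that seems especially bad, though: your eden space isn't the entire heap, and the sizes in that log are in kilobytes (the K suffix), so eden is about 364 MB, not 372 GB. Eden filling up and being emptied by a fast minor collection is normal behavior. See
http://www.oracle.com/technetwork/tutorials/tutorials-1876574.html#t2 for a decent summary.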
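To sanity-check figures like the ones in the GC log above, a small script can convert the sizes and pull out the pause times. This is a minimal sketch in Python that assumes the log lines look like the abridged excerpt in this thread; the function name `summarize` is invented for illustration, and real GC logs contain more fields than this handles.

```python
import re

# Matches lines such as "eden space 372800K, 100% used"
SPACE = re.compile(r"(eden|from|to)\s+space\s+(\d+)K,\s+(\d+)% used")
# Matches the wall-clock pause in "[Times: user=0.05 sys=0.02, real=0.02 secs]"
PAUSE = re.compile(r"real=([\d.]+) secs")


def summarize(gc_text):
    """Return ({space: [(size_mb, pct_used), ...]}, [pause_seconds, ...])."""
    spaces = {}
    for name, kb, pct in SPACE.findall(gc_text):
        # The K suffix is kilobytes, so divide by 1024 to get megabytes.
        spaces.setdefault(name, []).append((int(kb) / 1024, int(pct)))
    pauses = [float(p) for p in PAUSE.findall(gc_text)]
    return spaces, pauses


if __name__ == "__main__":
    excerpt = """\
par new generation total 419392K, used 404067K
eden space 372800K, 100% used
from space 46592K, 67% used
to space 46592K, 0% used
2016-07-06T12:36:51 [Times: user=0.05 sys=0.02, real=0.02 secs]
eden space 372800K, 0% used
"""
    spaces, pauses = summarize(excerpt)
    for name, samples in spaces.items():
        for size_mb, pct in samples:
            print(f"{name}: {size_mb:.1f} MB, {pct}% used")
    print("pauses (s):", pauses)
```

What tends to signal trouble is not eden cycling between 100% and 0%, but long or frequent full collections and an old generation that stays nearly full after each one.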