miki7
Obsidian | Level 7

Dear SAS Communities,

as a SAS partner, we have a client with a distributed Visual Analytics solution installed.

 

In this solution we have these servers:

 

1 compute node

2 middle-tiers

1 metadata

 

In the last few weeks, we have started to have problems with the GemFire locator failing, which brings the web applications down completely. The login screen then shows an error message ending with "Please contact your administrator for assistance".

 

We believe the problem lies in the communication between middle tier 1 and middle tier 2, where all the web applications, including VA and SAS Studio, run.

 

My main question is:

What do you think about increasing the timeout value for GemFire communication between the servers? Could 5 seconds be too short? Could increasing it to 1 minute or more cause problems?

 

Here is a short sample from gemfire.log on one of the middle tiers (this one is from mid_tier02, but mid_tier01 has nearly the same log, just with the server names swapped):

 

 

[info 2017/06/28 21:12:10.310 CEST <VERIFY_SUSPECT.TimerThread> tid=0x55] No suspect verification response received from %MID_TIER01%(54776)<v3>:37678 in 5989 milliseconds: I believe it is gone.

[info 2017/06/28 21:12:11.312 CEST <UDP ucast receiver> tid=0x1e] Member %MID_TIER01%(50790)<v1>:50358 is no longer suspect

[info 2017/06/28 21:12:11.312 CEST <UDP ucast receiver> tid=0x1e] failure detection received notification that %MID_TIER01%(50790)<v1>:50358 is no longer suspect

[info 2017/06/28 21:12:12.315 CEST <UDP ucast receiver> tid=0x1e] Member %MID_TIER01%(50790)<v1>:50358 is no longer suspect

[info 2017/06/28 21:12:12.315 CEST <UDP ucast receiver> tid=0x1e] failure detection received notification that %MID_TIER01%(50790)<v1>:50358 is no longer suspect

[info 2017/06/28 21:12:13.317 CEST <UDP ucast receiver> tid=0x1e] Member %MID_TIER01%(54776)<v3>:37678 is no longer suspect

[info 2017/06/28 21:12:13.317 CEST <UDP ucast receiver> tid=0x1e] failure detection received notification that %MID_TIER01%(54776)<v3>:37678 is no longer suspect

[info 2017/06/28 21:12:14.319 CEST <UDP ucast receiver> tid=0x1e] Member %MID_TIER01%(50790)<v1>:50358 is no longer suspect

[info 2017/06/28 21:12:14.319 CEST <UDP ucast receiver> tid=0x1e] failure detection received notification that %MID_TIER01%(50790)<v1>:50358 is no longer suspect

[info 2017/06/28 21:12:15.031 CEST <FD_SOCK Ping thread> tid=0x24] suspecting member %MID_TIER02%(63477)<v6>:40788

[info 2017/06/28 21:12:15.031 CEST <UDP Incoming Message Handler> tid=0x1d] Received Suspect notification for member(s) [%MID_TIER02%(60274)<v5>:21007] from %MID_TIER02%(59925)<v4>:7935.

[info 2017/06/28 21:12:15.180 CEST <ViewHandler> tid=0x44] Membership: sending new view [[%MID_TIER02%(59925)<v4>:7935|19] [%MID_TIER02%(59925)<v4>:7935/56543, %MID_TIER02%(63477)<v6>:40788/56113, %MID_TIER02%(64709)<v7>:55906/49673]] (3 mbrs)


[info 2017/06/28 21:12:15.321 CEST <UDP ucast receiver> tid=0x1e] Member %MID_TIER01%(50790)<v1>:50358 is no longer suspect

[info 2017/06/28 21:12:15.322 CEST <UDP ucast receiver> tid=0x1e] failure detection received notification that %MID_TIER01%(50790)<v1>:50358 is no longer suspect

[info 2017/06/28 21:12:15.322 CEST <UDP ucast receiver> tid=0x1e] Member %MID_TIER01%(54776)<v3>:37678 is no longer suspect

[info 2017/06/28 21:12:15.323 CEST <UDP ucast receiver> tid=0x1e] failure detection received notification that %MID_TIER01%(54776)<v3>:37678 is no longer suspect

[info 2017/06/28 21:12:15.323 CEST <UDP Incoming Message Handler> tid=0x1d] Received Suspect notification for member(s) [%MID_TIER02%(60274)<v5>:21007, %MID_TIER02%(63477)<v6>:40788] from %MID_TIER02%(59925)<v4>:7935.

[info 2017/06/28 21:12:15.324 CEST <UDP Incoming Message Handler> tid=0x1d] Membership: received new view [%MID_TIER02%(59925)<v4>:7935|19] [%MID_TIER02%(59925)<v4>:7935/56543, %MID_TIER02%(63477)<v6>:40788/56113, %MID_TIER02%(64709)<v7>:55906/49673]

 

Thank you very much and have a nice day! 

 

Michal

1 ACCEPTED SOLUTION
alexal
SAS Employee

@miki7,

 

Yes, you can increase the timeout value. Stop all the midtier services and the SAS Cache Locator, then edit /<SASConfig>/Lev<X>/Web/gemfire/instances/ins_41415/gemfire-start-locator-sas.sh and add the following JVM parameters to the JAVA_ARGS="" value:

 

-Dgemfire.conserve-sockets=false -Dgemfire.member-timeout=30000

Also, it is worth adjusting -Dgemfire.member-timeout in /<SASConfig>/Lev<X>/Web/WebAppServer/SASServer1_1/bin/setenv.sh (JVM_OPTS):

 

-Dgemfire.member-timeout=30000

Start all the midtier services after these changes.
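
As a rough illustration of the two edits above, here is a minimal shell sketch of how the resulting option strings might look. The append_opt helper is hypothetical (it is not part of the SAS or GemFire scripts), and the existing JVM_OPTS content (-Xmx4096m) is only a placeholder for whatever your setenv.sh already contains. The member-timeout value is given in milliseconds, so 30000 corresponds to 30 seconds.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SAS scripts): append a JVM option to a
# space-separated option string only if it is not already present.
append_opt() {
    current="$1"
    opt="$2"
    case " $current " in
        *" $opt "*) printf '%s' "$current" ;;          # already set, leave unchanged
        *) printf '%s' "${current:+$current }$opt" ;;  # append, with a space if non-empty
    esac
}

# Locator script (gemfire-start-locator-sas.sh): starts from the empty JAVA_ARGS="" value.
JAVA_ARGS=""
JAVA_ARGS=$(append_opt "$JAVA_ARGS" "-Dgemfire.conserve-sockets=false")
JAVA_ARGS=$(append_opt "$JAVA_ARGS" "-Dgemfire.member-timeout=30000")

# Web application server (setenv.sh): "-Xmx4096m" is an assumed placeholder
# standing in for the options already defined there.
JVM_OPTS="-Xmx4096m"
JVM_OPTS=$(append_opt "$JVM_OPTS" "-Dgemfire.member-timeout=30000")

echo "JAVA_ARGS=\"$JAVA_ARGS\""
echo "JVM_OPTS=\"$JVM_OPTS\""
```

In practice you would simply edit the two files by hand. The point of the sketch is that the new options are added to the existing values rather than replacing them, and that the same member-timeout is used on the locator and on the web application server.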

