jklaverstijn
Rhodochrosite | Level 12

Hi fellow admins,

 

We are in the process of rolling out multiple grid environments for a large population of data scientists. We use LSF for grid management. One of the key components is the grid-launched workspace server, and I am struggling to bring down the time it takes to start one. It currently takes at least 20 seconds. This is a big stumbling block for user acceptance, and I understand why: DI Studio starts workspace servers all over the place, and in EG one experiences an agonizing half minute of hourglass watching.

 

I have already tweaked a few parameters in lsb.params according to a blog post from Edoardo Riva but I am now out of ideas. That's why I turn to you.

 

Many thanks in advance,

- Jan.

 

lsb.params:

Begin Parameters
MAX_JOB_NUM=10000
NEWJOB_REFRESH=Y
DEFAULT_QUEUE=normal
ABS_RUNLIMIT=Y
MIN_SWITCH_PERIOD=3600
JOB_SCHEDULING_INTERVAL=1
JOB_ACCEPT_INTERVAL=1
JOB_DEP_LAST_SUB=1
ENABLE_EVENT_STREAM=n
MAX_CONCURRENT_QUERY=100
ENABLE_HOST_INTERSECTION=Y
MBD_REFRESH_TIME=10
#MBD_SLEEP_TIME=10
MBD_SLEEP_TIME=1
#SBD_SLEEP_TIME=5
SBD_SLEEP_TIME=1
End Parameters
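Since the tuning so far lives in lsb.params, it can help to check which scheduling-related settings are actually active and which older values were only commented out. The following Python sketch assumes the simple Begin Parameters ... End Parameters layout shown above, with one KEY=VALUE per line and '#' for comments. The list of keys it reports (LATENCY_KEYS, a hypothetical name) is just the scheduling- and polling-related entries from this particular file, not an official LSF grouping.

```python
"""Sketch: list active vs. commented-out settings in an LSF lsb.params file.

Assumption: one KEY=VALUE per line inside "Begin Parameters ... End Parameters",
with '#' marking commented-out lines.
"""

# Hypothetical selection: the scheduling/polling keys that appear in this thread.
LATENCY_KEYS = (
    "MBD_SLEEP_TIME",
    "SBD_SLEEP_TIME",
    "JOB_SCHEDULING_INTERVAL",
    "JOB_ACCEPT_INTERVAL",
    "MBD_REFRESH_TIME",
)


def parse_lsb_params(text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return (active, commented) KEY=VALUE settings from the Parameters section."""
    active: dict[str, str] = {}
    commented: dict[str, str] = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        normalized = " ".join(line.split()).lower()
        if normalized == "begin parameters":
            in_section = True
            continue
        if normalized == "end parameters":
            in_section = False
            continue
        if not in_section or "=" not in line:
            continue
        target = active
        if line.startswith("#"):
            # A commented-out setting such as "#MBD_SLEEP_TIME=10".
            target = commented
            line = line.lstrip("#").strip()
        key, _, value = line.partition("=")
        target[key.strip().upper()] = value.strip()
    return active, commented


def latency_report(text: str) -> str:
    """One line per key in LATENCY_KEYS: active value plus any commented-out value."""
    active, commented = parse_lsb_params(text)
    rows = []
    for key in LATENCY_KEYS:
        value = active.get(key, "(not set: LSF default applies)")
        note = f"  # commented-out value: {commented[key]}" if key in commented else ""
        rows.append(f"{key:<24} {value}{note}")
    return "\n".join(rows)
```

Reading the file rather than eyeballing it makes it easy to spot, after several rounds of edits, a key that is still at its default or a duplicate that was commented out.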
5 replies
PaulHomes
Rhodochrosite | Level 12

Hi Jan,

 

Which versions of SAS & LSF are you using and on which platform? When you look through the logs can you see where most of the delay occurs?

 

Have you seen the following note: SAS Problem Note 57577: You encounter delays when you start grid-launched workspace servers or when ... Does it apply to your situation?

 

Cheers

Paul

jklaverstijn
Rhodochrosite | Level 12

Hi Paul,

 

This is SAS 9.4M3 and LSF 9.1.3.

 

I have seen the note. It does not apply:

 

Job <1154>, Job Name <SAS Enterprise Guide_SASApp - Workspace Server node 01_F7
                     52E162-0AE0-0345-842F-EA85270DCC20>, User <klavj10>, Proje
                     ct <default>, Command </srv/SASConfig/Lev1/SASApp/Workspac
                     eServer/WorkspaceServer.sh -noterminal -netencryptalgorith
                     m AES -encryptfips -metaserver osasigmdl03.ont.belastingdi
                     enst.nl -metaport 8561 -metarepository Foundation -locale
                     en_US -objectserver -objectserverparms "delayconn sph=osas
                     igndl01.ont.belastingdienst.nl protocol=bridge spawned spp
                     =36720 cid=18 pb classfactory=440196D4-90F0-11D0-9F41-00A0
                     24BB830C server=OMSOBJ:SERVERCOMPONENT/A52BHKER.AY00000Q c
                     el=everything lb grid" -METAUSER '"klavj10@!*(generatedpas
                     sworddomain)*!"' -METAPASS 7720093ab3A185107f65931940859c7
                     1 >
Tue Apr 12 11:30:41: Submitted from host <osasigndl01.ont.belastingdienst.nl>,
                     to Queue <eguide>, CWD <$HOME>, Specified Hosts <osasigcll
                     01.ont.belastingdienst.nl>, <osasigndl01.ont.belastingdien
                     st.nl>;
Tue Apr 12 11:30:41: Dispatched 1 Task(s) on Host(s) <osasigndl01.ont.belasting
                     dienst.nl>, Allocated 1 Slot(s) on Host(s) <osasigndl01.on
                     t.belastingdienst.nl>, Effective RES_REQ <select[type == a
                     ny] order[r15s:pg] >;
Tue Apr 12 11:30:41: Starting (Pid 4890);
Tue Apr 12 11:30:42: Running with execution home </home/ONT/klavj10>, Execution
                      CWD </home/ONT/klavj10>, Execution Pid <4890>;

This shows what the note calls a "healthy grid", with a one-second delay. I will continue investigating log files to see where the delay happens. We have an additional app server for SAS EM that is not grid-launched; there a workspace server starts in approx. 5 seconds. So that's what we're aiming at, allowing of course for some grid overhead.

 

Cheers Jan.

ISedgwick
SAS Employee

Do you have many pre-assigned libraries on your grid app server, compared with your EM app server, that might be causing a delay?

shayes_ccllc
Fluorite | Level 6

Good idea to check the pre-assigned libraries. It could also be that the number of libraries is reasonable, but one in particular is slow to connect due to network latency or load on the external data source.
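One rough way to check the network-latency angle is to time a plain TCP connect from the grid node to each database host behind a pre-assigned library. This Python sketch measures only connection setup, not the full LIBNAME and authentication round trip, so it can rule out network latency but not a slow database login. The function names are hypothetical.

```python
"""Sketch: time a TCP connect to each data source behind a pre-assigned library."""

import socket
import time
from typing import Dict, List, Optional, Tuple


def time_connect(host: str, port: int, timeout: float = 5.0) -> Optional[float]:
    """Seconds needed to open (and immediately close) a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:  # covers refused connections, DNS errors and timeouts
        return None
    return time.perf_counter() - start


def rank_endpoints(
    endpoints: Dict[str, Tuple[str, int]], timeout: float = 5.0
) -> List[Tuple[str, Optional[float]]]:
    """(name, seconds) pairs, slowest first; failed connections (None) sort to the top."""
    results = [(name, time_connect(host, port, timeout)) for name, (host, port) in endpoints.items()]
    return sorted(
        results,
        key=lambda item: float("inf") if item[1] is None else item[1],
        reverse=True,
    )
```

For example, rank_endpoints({"ORALIB": ("oracle-db.example.com", 1521)}) would return [("ORALIB", seconds)], with None in place of seconds if the connection failed; ORALIB and the host name are placeholders for your own librefs and servers.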

 

In addition, check for any customizations to any of the autoexec files that could be doing unnecessary or inefficient work.
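For the autoexec check, a simple scan can list statements that do work every time a workspace server starts. The sketch below assumes the SAS 9.4 usermods file names (appserver_autoexec_usermods.sas under SASApp, autoexec_usermods.sas under SASApp/WorkspaceServer), relative to a Lev1 directory such as /srv/SASConfig/Lev1 from the job command above. The keyword list is a heuristic, and block comments spanning several lines are not handled.

```python
"""Sketch: list statements in SAS autoexec files that run at every session start."""

import re
from pathlib import Path

# Heuristic keyword list (assumption; extend for your site). Lines starting with
# '*' or '/*' never match, so simple comment lines are skipped.
STARTUP_WORK_RE = re.compile(
    r"^\s*(libname|filename|%include|%inc|proc|data|options)\b", re.IGNORECASE
)

# Assumed usermods locations relative to the Lev1 configuration directory.
CANDIDATES = (
    "SASApp/appserver_autoexec_usermods.sas",
    "SASApp/WorkspaceServer/autoexec_usermods.sas",
)


def startup_statements(sas_source: str) -> list[tuple[int, str, str]]:
    """(line number, keyword, statement text) for each matching line."""
    hits = []
    for number, line in enumerate(sas_source.splitlines(), start=1):
        match = STARTUP_WORK_RE.match(line)
        if match:
            hits.append((number, match.group(1).lower(), line.strip()))
    return hits


def scan_config(lev1_dir: str) -> dict[str, list[tuple[int, str, str]]]:
    """Scan each candidate autoexec file that exists under lev1_dir."""
    report = {}
    for rel in CANDIDATES:
        path = Path(lev1_dir) / rel
        if path.is_file():
            report[rel] = startup_statements(path.read_text(errors="replace"))
    return report
```

Comparing the output for the grid app server with that of the EM app server is a quick way to see which extra startup work only the grid sessions do.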

nhvdwalt
Barite | Level 11

Have a look at this blog. It specifically covers how to reduce the LSF sleep times for faster EG start-up.

 

https://blogs.sas.com/content/sgf/2014/07/16/one-grid-to-rule-them-all-tuning-your-environment-for-s...

 
