04-12-2016 05:23 AM
Hi fellow admins,
We are in the process of rolling out multiple grid environments for a large population of data scientists. We use LSF for grid management. One of the key components is the grid-launched workspace server. I am now struggling to bring down the time it takes to start a workspace server, which is currently at least 20 seconds. This is a big stumbling block for user acceptance, and I understand why. When using DI Studio, workspace servers are started all over the place. In EG one experiences an agonizing half minute of hourglass watching.
I have already tweaked a few parameters in lsb.params according to a blog post from Edoardo Riva but I am now out of ideas. That's why I turn to you.
Many thanks in advance,
Begin Parameters
MAX_JOB_NUM=10000
NEWJOB_REFRESH=Y
DEFAULT_QUEUE=normal
ABS_RUNLIMIT=Y
MIN_SWITCH_PERIOD=3600
JOB_SCHEDULING_INTERVAL=1
JOB_ACCEPT_INTERVAL=1
JOB_DEP_LAST_SUB=1
ENABLE_EVENT_STREAM=n
MAX_CONCURRENT_QUERY=100
ENABLE_HOST_INTERSECTION=Y
MBD_REFRESH_TIME=10
#MBD_SLEEP_TIME=10
MBD_SLEEP_TIME=1
#SBD_SLEEP_TIME=5
SBD_SLEEP_TIME=1
End Parameters
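For anyone comparing settings between grids, here is a minimal Python sketch that lists which parameters are actually active in a Parameters section like the one above, ignoring the commented-out lines. The function name `active_params` and the shortened sample text are my own illustration, not part of LSF, and the parsing assumes one `NAME=value` pair per line.

```python
def active_params(text):
    """Return the uncommented NAME=value pairs from a Parameters section."""
    params = {}
    inside = False
    for raw in text.splitlines():
        # Drop everything after '#', so commented-out settings are ignored.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.lower() == "begin parameters":
            inside = True
        elif line.lower() == "end parameters":
            inside = False
        elif inside and "=" in line:
            name, value = line.split("=", 1)
            params[name.strip()] = value.strip()
    return params


# Shortened sample taken from the settings posted above.
SAMPLE = """Begin Parameters
JOB_SCHEDULING_INTERVAL=1
JOB_ACCEPT_INTERVAL=1
MBD_REFRESH_TIME=10
#MBD_SLEEP_TIME=10
MBD_SLEEP_TIME=1
#SBD_SLEEP_TIME=5
SBD_SLEEP_TIME=1
End Parameters
"""

settings = active_params(SAMPLE)
for name in sorted(settings):
    print(name, "=", settings[name])
```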
04-12-2016 06:01 AM
Which versions of SAS & LSF are you using and on which platform? When you look through the logs can you see where most of the delay occurs?
Have you seen the following note?: SAS Problem Note 57577: You encounter delays when you start grid-launched workspace servers or when ... Does it apply to your situation?
04-12-2016 07:01 AM
This is SAS 9.4M3 and LSF 9.1.3.
I have seen the note. It does not apply:
Job <1154>, Job Name <SAS Enterprise Guide_SASApp - Workspace Server node 01_F752E162-0AE0-0345-842F-EA85270DCC20>, User <klavj10>, Project <default>, Command </srv/SASConfig/Lev1/SASApp/WorkspaceServer/WorkspaceServer.sh -noterminal -netencryptalgorithm AES -encryptfips -metaserver osasigmdl03.ont.belastingdienst.nl -metaport 8561 -metarepository Foundation -locale en_US -objectserver -objectserverparms "delayconn sph=osasigndl01.ont.belastingdienst.nl protocol=bridge spawned spp=36720 cid=18 pb classfactory=440196D4-90F0-11D0-9F41-00A024BB830C server=OMSOBJ:SERVERCOMPONENT/A52BHKER.AY00000Q cel=everything lb grid" -METAUSER '"klavj10@!*(generatedpassworddomain)*!"' -METAPASS 7720093ab3A185107f65931940859c71>
Tue Apr 12 11:30:41: Submitted from host <osasigndl01.ont.belastingdienst.nl>, to Queue <eguide>, CWD <$HOME>, Specified Hosts <osasigcll01.ont.belastingdienst.nl>, <osasigndl01.ont.belastingdienst.nl>;
Tue Apr 12 11:30:41: Dispatched 1 Task(s) on Host(s) <osasigndl01.ont.belastingdienst.nl>, Allocated 1 Slot(s) on Host(s) <osasigndl01.ont.belastingdienst.nl>, Effective RES_REQ <select[type == any] order[r15s:pg] >;
Tue Apr 12 11:30:41: Starting (Pid 4890);
Tue Apr 12 11:30:42: Running with execution home </home/ONT/klavj10>, Execution CWD </home/ONT/klavj10>, Execution Pid <4890>;
This shows what the note calls a "healthy grid" with a one-second delay. I will continue investigating the log files to see where the delay happens. We have an additional app server for SASEM that is not grid launched. There we see approx. 5 seconds. So that's what we're aiming at. Minus of course some overhead.
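To make the one-second figure reproducible for other jobs, here is a rough Python sketch that pulls the event timestamps out of job detail output like the listing above and computes the time between submission and running. The helper name `event_times` is hypothetical, the sample is abbreviated from the output above, and two assumptions apply: the output carries no year, so one has to be supplied, and the month abbreviations are assumed to be English.

```python
import re
from datetime import datetime

# Matches e.g. "Tue Apr 12 11:30:41: Submitted"; the weekday is skipped.
EVENT = re.compile(
    r"\w{3} (\w{3} +\d+ \d\d:\d\d:\d\d): (Submitted|Dispatched|Starting|Running)"
)


def event_times(text, year):
    """Map each event name to the timestamp of its first occurrence."""
    times = {}
    for stamp, event in EVENT.findall(text):
        stamp = " ".join(stamp.split())  # collapse padding before the day
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        times.setdefault(event, when)
    return times


# Abbreviated from the job detail output posted above.
SAMPLE = """Tue Apr 12 11:30:41: Submitted from host <osasigndl01.ont.belastingdienst.nl>, to Queue <eguide>;
Tue Apr 12 11:30:41: Dispatched 1 Task(s) on Host(s) <osasigndl01.ont.belastingdienst.nl>;
Tue Apr 12 11:30:41: Starting (Pid 4890);
Tue Apr 12 11:30:42: Running with execution home </home/ONT/klavj10>;
"""

times = event_times(SAMPLE, 2016)  # year assumed from the post date
delay = (times["Running"] - times["Submitted"]).total_seconds()
print(f"Submit to running: {delay:.0f} s")
```

If this number stays around one second while the client still waits 20 seconds, the remaining time is being spent outside LSF scheduling, for example in server startup itself.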
02-09-2018 10:45 AM
Do you have many more pre-assigned libraries on your grid app server than on your EM app server? That might be causing the delay.
02-09-2018 02:33 PM
Good idea to check the pre-assigned libraries. It could also be that the number of libraries is reasonable, but one in particular is slow to connect due to network latency or load on the external data source.
In addition, check for any customizations to any of the autoexec files that could be doing unnecessary or inefficient work.
02-12-2018 12:50 AM
Have a look at this blog. It specifically mentions how to reduce the LSF sleep times for faster EG start-up.