ErikLund_Jensen
Rhodochrosite | Level 12

Hi everyone

 

Unlike SAS EG or Display Manager, which both run in a single workspace server process, most actions performed in DI Studio trigger a separate process. In a traditional setup this doesn't cause a noticeable delay, but a SAS Grid setup is designed for efficient execution of long-running jobs, not for fast start-up of new processes. Depending on the actual configuration, the Grid Manager takes 10 seconds or more before a process begins execution, regardless of how long the process then runs.

 

This latency before anything happens is very frustrating for DI Studio users. If record count is enabled in DI Studio, which in most cases is very convenient, just clicking on a table to see its properties causes this delay, and much of the time spent in DI Studio goes to waiting for the Grid Manager to wake up, select a server and initiate the process. In theory, there is a similar latency problem in batch execution, where most of the several thousand jobs in the daily batch complete in a few seconds or less, but we have so much excess capacity that it's not a problem.
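To put rough numbers on why a fixed start-up delay hurts short actions most, here is a minimal Python sketch. It assumes a flat 10-second dispatch delay (the lower bound mentioned above); the function name and the sample run times are purely illustrative, not measurements.

```python
def startup_share(startup_s: float, run_s: float) -> float:
    """Fraction of total wall-clock time spent waiting for the process to start."""
    total = startup_s + run_s
    if total <= 0:
        raise ValueError("total time must be positive")
    return startup_s / total


if __name__ == "__main__":
    STARTUP_S = 10.0  # assumed fixed dispatch delay, not a measured value
    for run_s in (0.5, 2.0, 60.0, 3600.0):  # illustrative run times in seconds
        share = startup_share(STARTUP_S, run_s)
        print(f"run time {run_s:>7.1f} s: {share:6.1%} of wall time is start-up")
```

For an interactive click that needs well under a second of real work, nearly all of the waiting is start-up, whereas for an hour-long job the same delay is negligible.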

 

According to our SAS consultants, everything has been done to reduce the problem by configuring grid queues, but there seems to be an inherent mismatch between the working principles of DI Studio and SAS Grid. I write this hoping for a good idea, as we can't be the only installation with this problem. I have thought of circumventing the Grid Manager and reserving one server during the daytime to serve DI Studio users through a specially configured workspace server, but I don't know if it's a good idea or even feasible.

 

I am looking forward to answers from others having the same problem, and I hope they will share their experiences and solutions.

5 REPLIES
JuanS_OCS
Amethyst | Level 16

Hi @ErikLund_Jensen ,

 

Great topic. I have been working on this for a while, and I would like to add a couple of further questions and share what I know. I would love to hear what you and others have to say about it.

 

First of all, a few questions for you and anyone who answers here:

  • What version and maintenance level of SAS do you have?
  • Which SAS Grid Manager provider (Platform/LSF provider, Hadoop or SAS provider/M6)?
  • What OS and OS version?
  • How is authentication done at OS level, SAS level, and Grid level?

Now, my personal answers in my current Grid:

  • What version and maintenance level of SAS do you have? 9.4 M6
  • Which SAS Grid Manager provider (Platform/LSF provider, Hadoop or SAS provider/M6)? SAS provider / M6
  • What OS and OS version? RHEL 7.6
  • How is authentication done at OS level, SAS level, and Grid level? Host authentication with PAM/SSSD to AD at every level. No Kerberos (except in SSSD itself)

 

Now, here is what I know:

 

  • Display Manager is not much affected by this, but SAS EG and SAS Studio can be equally affected if each user opens several connections, connecting and disconnecting.
  • When this happens, it is important, and possible, to track down the root cause: errors, network, authentication, disk, etc. There are several ways to get more information: DEBUG level in the logs, adding further components to Log4J (in debug mode), analysing timing with sh -x, etc.
  • Normally we would have load-balanced and grid-launched Workspace Servers (and/or STP, PWKS, etc.), and those initial connections or spawned sessions/jobs then have this "latency".
    • Try this: if you temporarily disable the grid-launched part for a Workspace Server, I think you might see less "latency", with quicker connections and spawned sessions/jobs.
  • One very frequent reason for this "latency" is authentication, especially when it is an AD or LDAP user.
    • Imagine your authentication is a bit slower than usual, but still measured in milliseconds. SAS Grid performs several authentications at several levels (Object Spawner, Metadata, Workspace Server, Grid, etc.) and on several nodes in order to distribute the workload.
    • In that case, imagine how this "slightly" slow authentication (ms) can add up to several seconds per spawned session/job on the grid, because several authentications must happen.
    • This AD/LDAP slowness commonly comes from the network, the AD/LDAP cluster configuration, or every user belonging to a very large number of groups, so that queries to AD/LDAP take longer.
  • Additionally, there are a few settings you can play with. When I was on M2 (Platform/LSF provider) there was something in RTM (now Environment Manager) that could lower the waiting times, and hence that "latency", but I cannot recall or find the setting, since I don't have access to a Grid environment with the Platform provider at the moment.
  • For M6 with the SAS provider, SAS has just released (thanks to Tech Support and R&D, great job) a couple of hotfixes to help with this: E3Y005 and E3Q013.
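To make the authentication point a bit more tangible, here is a small Python sketch. The 150 ms and the count of 8 authentications are made-up assumptions for illustration only, not figures from any real grid, and both helper names are hypothetical. The live lookup relies on the standard Unix-only `pwd` module and `os.getgrouplist`, which resolve the user and groups through the host's name service (and so through SSSD where that is configured); cached answers may look much faster than a cold lookup.

```python
import os
import pwd  # Unix only; resolves users through the host's name service
import time
from typing import Tuple


def stacked_delay_s(per_auth_s: float, auths_per_session: int) -> float:
    """Total delay when authentications happen one after another."""
    return per_auth_s * auths_per_session


def time_group_lookup(username: str) -> Tuple[float, int]:
    """Time one user + group-membership lookup; return (seconds, number of groups)."""
    start = time.perf_counter()
    entry = pwd.getpwnam(username)  # raises KeyError for an unknown user
    groups = os.getgrouplist(username, entry.pw_gid)
    return time.perf_counter() - start, len(groups)


if __name__ == "__main__":
    ASSUMED_PER_AUTH_S = 0.150  # assumption: one "slightly slow" authentication
    ASSUMED_AUTHS = 8           # assumption: authentications per spawned session
    total = stacked_delay_s(ASSUMED_PER_AUTH_S, ASSUMED_AUTHS)
    print(f"assumed: {ASSUMED_AUTHS} x {ASSUMED_PER_AUTH_S * 1000:.0f} ms = {total:.1f} s")

    try:
        me = pwd.getpwuid(os.getuid()).pw_name
        elapsed, n_groups = time_group_lookup(me)
        print(f"live lookup for {me}: {n_groups} groups in {elapsed * 1000:.1f} ms")
    except (KeyError, OSError):
        print("could not resolve the current user; skipping the live lookup")
```

Running the live lookup on each grid node for a user with many groups would be one cheap way to see whether directory queries are slow before digging into debug logs.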

 

Does any of this help? Looking forward to your thoughts!

 

Best regards, Juan

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @JuanS_OCS 

 

Thanks for your answer. Our current setup is DI Studio 4.903 on the client and, server-side, RHEL 7.4 / SAS V940m5a / IBM Spectrum LSF 10.1.0. We have data and SAS work / utilloc areas shared between 5 physical grid servers on a very fast solid-state file system.

 

So you think the latency may be due to the authentication process. We never thought of that; so far we have only focused on the Grid Manager's handling of initial connections and spawned sessions. Unfortunately authentication is not my core competence, but the three-headed monster Cerberus is part of it. I will pass this idea on to our very competent SAS technical representative. He happens to be on the premises tomorrow, and I will let you know what happens next.

 

Best regards

Erik

JuanS_OCS
Amethyst | Level 16

Hi @ErikLund_Jensen ,

 

A quick follow-up: have you made any progress on this topic?

 

Regards,

Juan

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @JuanS_OCS 

 

Thanks for your continuing interest in this matter.

 

Our SAS technical consultant can't see how a 5-10 second delay could be caused by the authentication process. His next move will be to set up an extra server context that bypasses the grid, so we can test whether that makes any difference in how long it takes to establish a connection from a client Display Manager session.
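A comparison like that could be timed with something along these lines. This is only a sketch: both entries use a placeholder command (the Python interpreter doing nothing), because the actual commands for opening a session through the grid and through the bypass context depend on the installation; the helper name is hypothetical.

```python
import subprocess
import sys
import time
from statistics import median
from typing import Sequence


def median_launch_s(cmd: Sequence[str], repeats: int = 5) -> float:
    """Run `cmd` several times; return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return median(samples)


if __name__ == "__main__":
    # Placeholder command: replace each entry with whatever opens and closes
    # a session via the grid and via the non-grid server context.
    placeholder = [sys.executable, "-c", "pass"]
    candidates = {
        "grid-launched": placeholder,
        "bypassing the grid": placeholder,
    }
    for label, cmd in candidates.items():
        print(f"{label}: median {median_launch_s(cmd):.2f} s")
```

The median is used rather than the mean so that a single unusually slow launch does not dominate the comparison.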

 

I hope we can squeeze that in before an upgrade to M6, and I will let you know what happens.

 

Best regards

Erik

 

JuanS_OCS
Amethyst | Level 16

Hi @ErikLund_Jensen ,

 

Good luck with that, then. I hope you can get it sorted out by then.

And, yes, please share the progress, I am interested.

 

Best regards,

Juan
