Administration and Deployment

jklaverstijn · Posted 10-30-2018 10:55 AM

Dear fellow admins,

We are in the process of migrating a number of departments from their own dedicated SAS server to a large common grid based environment. This has resulted in a larger consolidated set of metadata that is now being served by a cluster of 4 metadata server nodes. Everything on the server side is Linux based. These servers have lots of memory and CPU load is never high. The nodes talk to each other over a 10GB connection.

We see an increase in the startup time of SAS Management Console that seems to be proportional to the volume of metadata but is also very dependent on network conditions. Users connected via a VDI (virtual desktop) are the lucky ones: 64 seconds before getting control of the interface. Worst off are the Wi-Fi users (also the majority) with 145 seconds. No wonder people are starting to complain.

We have done a network trace and see a large number (130k) of relatively small (avg. 470 bytes) TCP packets. So it makes sense that network latency is our worst enemy. Total volume is not much (58MB) but it is being delivered in very small chunks.
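As a rough sanity check on those trace figures, the arithmetic can be sketched in Python. The packet count and average size come from the trace; the round-trip times and the strict one-request, one-reply model are assumptions made purely for illustration, not measurements from this environment.

```python
# Back-of-the-envelope check of the network trace figures.
PACKETS = 130_000   # packet count reported by the trace
AVG_BYTES = 470     # average packet size reported by the trace


def total_mib(packets: int, avg_bytes: int) -> float:
    """Total volume in MiB for the given packet count and average size."""
    return packets * avg_bytes / 2**20


def latency_bound_seconds(packets: int, rtt_ms: float,
                          packets_per_round_trip: int = 2) -> float:
    """Time spent waiting on the network if nothing is pipelined.

    Assumption: each round trip carries `packets_per_round_trip` packets
    (one request, one reply) and the client waits for every reply.
    """
    round_trips = packets / packets_per_round_trip
    return round_trips * rtt_ms / 1000.0


if __name__ == "__main__":
    print(f"volume: {total_mib(PACKETS, AVG_BYTES):.1f} MiB")
    # Hypothetical round-trip times, not measured values.
    for label, rtt_ms in [("assumed 1 ms RTT", 1.0), ("assumed 2 ms RTT", 2.0)]:
        print(f"{label}: {latency_bound_seconds(PACKETS, rtt_ms):.0f} s waiting")
```

Under this simplified model the waiting time grows with the round-trip time while the volume stays the same, which is why a chatty exchange of small packets is sensitive to latency rather than bandwidth.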

I was wondering if we are the only ones that see this and, if not, how this can be dealt with. We will shortly contact SAS Support as well but I was wondering what's happening in the field.

So, any experiences you would like to share?

Many thanks in advance,

-- Jan.

nhvdwalt · Posted 10-30-2018 11:01 AM

Hi @jklaverstijn

Do this test....

1.) Pick a regular user and measure the response time as a baseline.. like what you've described

2.) Then, take away all the user's permissions. The easiest way is to remove the user from all groups, assuming you are not granting lots of permissions to the SASUSERS group. Redo the test and see what the difference is.

If the response times are a lot better, it might mean that your user has permissions on many metadata objects.

jklaverstijn · Posted 10-30-2018 11:21 AM

Good one @nhvdwalt. We did that and see no reduction in the number of packets, but the average size drops to 200 bytes! And we passed this test for the tightness of our authorisation model 🙂

There is no noticeable improvement in sasmc start time. As the number of packets is the same, this confirms that latency is the killer here.

Thanks,

- Jan.

JuanS_OCS · Posted 10-30-2018 11:07 AM

Hello @jklaverstijn, Jan,

I am not really sure whether what I experienced before is the same as what you are experiencing...

In my experience, this SAS Note has helped me a great deal in the past:

Problem Note 49308: SAS® Management Console is slow to launch the first time http://support.sas.com/kb/49/308.html

I know it does not mention anything about networking, but I wonder about those small packets traveling across your network.

Do you have a local installation of SAS Foundation, perhaps on a VDI? TKJNI_OPT_TRACE can help a lot with tracing the Java TK components.

Do you experience the same also with a SAS Base metabrowse or a connection to SAS DI?

nhvdwalt · Posted 10-30-2018 11:11 AM

@JuanS_OCS makes a good point.

If you are running UNIX, you could also start SAS Management Console from the UNIX command line through X11; that takes any antivirus software on Windows out of the picture.

jklaverstijn · Posted 10-30-2018 11:24 AM

Thanks @JuanS_OCS

Yep, did that and X11 is approximately the same as the VDI. But this is not available to end users, just admins do this.

Thanks,

- Jan.

JuanS_OCS · Posted 10-30-2018 11:26 AM

@nhvdwalt, @jklaverstijn,

I did not mention anything about X11. Nice idea, but not mine, it is @nhvdwalt who introduced it.

Jan, did you have the chance to read through the SAS Note?

jklaverstijn · Posted 10-30-2018 12:10 PM

Ah yes, getting things mixed up here. Sorry for that 😉

Yes, I went through the doc. Most of it was not relevant. The test with TKJNI_OPT_TRACE looks interesting but also time-consuming. I will have to get back to you on that. It's good to know about, as it is also an early warning of what SAS Support may start with. Better be prepared.

Thanks,

-- Jan.

Kurt_Bremser · Posted 10-30-2018 12:07 PM

To remove the network from the equation, start SMC via X11 on a metadata server itself.


nhvdwalt · Posted 10-31-2018 01:08 AM

@Kurt_Bremser suggested a good test.

Then, try two tests, one with the server's actual name and another one with localhost.

Have you looked at the actual network path from the VDI to the metadata server? Some enterprise networks can be really complex.

PaulHomes · Posted 10-31-2018 02:17 AM

Hi Jan,

A few things off the top of my head....

Are there any interesting messages in the SAS Management Console client log SASMCErrorLog.txt?

The worst case of slow SAS MC start I've encountered was due to anti-virus/malware scanning. Any chance of temporarily disabling it to see how much impact it is having?

Is it just SAS Management Console? Do any other Java based SAS apps connecting to metadata have the same issue (SAS Data Integration Studio, SAS Information Map Studio etc)?

Is DNS resolution working well (dig, nslookup etc)? I have seen long delays in the past with slow DNS and timeouts.
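For clients where dig or nslookup are not at hand, a lookup can also be timed with a few lines of Python. This is only a minimal sketch: the host list below is a placeholder, and port 8561 is assumed to be the metadata server port (adjust it if your deployment uses another one).

```python
import socket
import time
from typing import Callable


def time_lookup(host: str,
                resolver: Callable[..., object] = socket.getaddrinfo,
                port: int = 8561) -> float:
    """Return the wall-clock seconds taken by one name lookup.

    The resolver is injectable so the timing logic can be exercised
    without touching the network. With the default resolver a name
    that does not resolve raises socket.gaierror.
    """
    start = time.perf_counter()
    resolver(host, port)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder list: add the real names of the metadata cluster nodes.
    for host in ["localhost"]:
        try:
            print(f"{host}: {time_lookup(host) * 1000:.1f} ms")
        except socket.gaierror as exc:
            print(f"{host}: lookup failed ({exc})")
```

Running it a few times from a Wi-Fi client and from a VDI would show whether name resolution contributes anything measurable to the difference between the two.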

SAS Management Console also makes a connection to the mid-tier during startup/login - what sort of mid-tier response are you getting from those clients?

Perhaps, if you are able to temporarily run the metadata cluster in non-cluster mode, you could see whether there are any differences with respect to metadata server cluster node redirections?

Cheers
Paul

jklaverstijn · Posted 10-31-2018 06:00 AM

Hi Paul,

Other clients are okay. Opening a job in DIS is considered sluggish but not alarmingly so.

We have covered the other aspects you mention like DNS and anti-virus. Nothing to be gained there.

The path we are now following is the number of project repositories. As part of our AD synchronisation code we also create a project repository for every new user. This seems to play a big role in the startup times of both the server itself (which makes perfect sense) and SASMC. In a DEV domain I have used a script to create an arbitrary number of project repositories and have timed the startup of sasmc as a function of that number. And lo and behold, the correlation is strong. 10 repositories result in a 40-second startup time. This scales up linearly to 5min15 for 200 repositories. In reality we have 350! All as sasadm@saspw.

So yeah, we are on to something here.
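To put a number on that scaling, here is a small Python sketch that draws a straight line through the two timings quoted above (10 repositories at 40 seconds, 200 repositories at 5min15, i.e. 315 seconds) and extrapolates to 350 repositories. It assumes the relationship stays linear beyond the measured range and applies only to the sasadm@saspw test in DEV, so treat the result as an indication, not a prediction.

```python
from typing import Tuple

Point = Tuple[float, float]


def fit_line(p1: Point, p2: Point) -> Tuple[float, float]:
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


def predict(repositories: float, slope: float, intercept: float) -> float:
    """Estimated startup time in seconds for a given repository count."""
    return intercept + slope * repositories


# The two measurements from the DEV test: (repositories, seconds).
MEASURED = [(10, 40.0), (200, 315.0)]

if __name__ == "__main__":
    slope, intercept = fit_line(*MEASURED)
    estimate = predict(350, slope, intercept)
    print(f"per repository: {slope:.2f} s, fixed cost: {intercept:.1f} s")
    print(f"extrapolated to 350 repositories: {estimate:.0f} s "
          f"({int(estimate // 60)} min {int(estimate % 60)} s)")
```

With only two data points this is an interpolation line rather than a real fit; adding the intermediate timings from the test script would show how well the linear assumption holds.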

I am now running a single node (startNoCluster) with trace logging turned on. I will see what's happening, especially for unauthorized users who have no business in others' project repos.

More to follow.

Thanks,

-- Jan.

nhvdwalt · Posted 10-31-2018 06:19 AM

When did you last do a metadata analyse/repair?

jklaverstijn · Posted 10-31-2018 08:47 AM

We do that on a regular basis. Not scheduled, as we do not have scheduled down times, but whenever we're down for other reasons. Last time was a week ago when we ran some required security patches for Linux.

Other than that we would give an arm and a leg for on-line analyse&repair. Down time is a bit of a no-go; that's why we have a cluster in the first place.

-- Jan

jklaverstijn · Posted 10-31-2018 08:55 AM

First of all, thanks for all the valuable input. It did help with eliminating many of the environmental contributors to metadata headaches. The plot is thickening, however! As said, I have created 200 project repositories, and sasmc and/or mds seem to scale poorly with them. I have set logging to trace (especially the App.IOM logger) and see that every one of those 200 repositories is queried 25 times! So 5000 lines like this before sasmc even becomes responsive:

outMetadata=<RepositoryBase ID="A0000001.blah" ... Name="testreposxxxxx" ... >...</RepositoryBase>

It is also clear from those logs that sasmc is querying for this 25 times for each project repository. Seems crazy to me.
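For anyone wanting to repeat the count on their own trace log, a tally per repository can be sketched in Python. The regular expression is based only on the line shape shown above (outMetadata=<RepositoryBase ... Name="..."); the sample lines and repository names in the demo are made up, and a real log may need a different pattern.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the repository name in lines shaped like the trace excerpt above.
REPO_LINE = re.compile(r'outMetadata=<RepositoryBase\b[^>]*\bName="([^"]+)"')


def count_repository_queries(lines: Iterable[str]) -> Counter:
    """Count how often each repository name appears in outMetadata lines."""
    counts: Counter = Counter()
    for line in lines:
        match = REPO_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines; in practice iterate over the open trace log file.
    sample = [
        'outMetadata=<RepositoryBase ID="A0000001.X1" Name="testrepos00001" >...</RepositoryBase>',
        'outMetadata=<RepositoryBase ID="A0000001.X1" Name="testrepos00001" >...</RepositoryBase>',
        'outMetadata=<RepositoryBase ID="A0000001.X2" Name="testrepos00002" >...</RepositoryBase>',
        'some unrelated trace line',
    ]
    for name, hits in count_repository_queries(sample).most_common():
        print(f"{name}: {hits}")
```

Passing an open file object instead of the sample list works the same way, since the function only iterates over lines.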

I think we have enough for a call to tech support.

I'll keep y'all posted.

Cheers, Jan.

Abundant network traffic between metadata server and management console
