connectivity issues with EG2 and EG4 - Page 2

deleted_user · Posted 03-13-2008 09:13 AM

As Chris@SAS said, I too am impressed by the brainpower in this thread. If anyone can solve the problem, you guys can. But since I am a research psychologist by training who found himself in the role of SAS Administrator, I have to admit that I am having difficulty following you and applying what has been said to my disconnect problem.

From what I gather, you all agree that EG communicates with the SAS server only under certain conditions and runs only on the workstation, so it is not responsible for severing the session with the server. You all seem to agree that something on the server is causing the disconnect and making me lose all of my session output. Is that correct? By the way, is anyone else out there having this disconnect problem?

Much of your conversation, though, seems to center on solutions in a UNIX environment. I am working on a Windows server, so I am trying to find solutions for that environment. I read that Joshua disabled the firewall and the problem seemed to be resolved for him. My company's IT folks say that while there is an active firewall between the server and the outside world, the firewall between the server and my client is already disabled. Is there something else related to firewalls that is being missed?

Chuck mentioned that it is a common security practice to put a timeout on server accounts. However, the IT folks here showed me that they do not have these timeouts active. On the server, under "Computer Management" and "Session Properties", the "Idle session limit" and all other limits are set to "Never." It seemed pretty clear-cut. Is there something being missed?

Any help would be valued. Thanks.

deleted_user · Posted 03-13-2008 09:49 AM

There's a simple way to find out whether the firewall is part of the problem. Run one EG session connecting to a server in the same subnet, and another connecting across the firewall to a different server. With the right controls and careful observation, I think it would be easy to infer how and why EG behaves the way it does (after a few iterations), even if it's a black box to us.

Given that many of us are familiar with OR, I find it ironic that as mathematicians we spend effort optimising business decisions and mathematical computations, yet end up troubled by bottlenecks in the system and application layers.

I suspect the way EG connects to Windows and to Unix differs. I am not sure why that is the case, but I think the login session could be disconnected by the client, by the server, or by the network layer. So there are three segments we'll need to look into.
Message was edited by: Joshua

deleted_user · Posted 03-13-2008 10:21 AM

OK, a different bias for a new audience (not Joshua).

Are you sure it is timing out?

Start up EG, create a new project, and select a large dataset on a remote server so that it is brought into the project. Wait a long time (past the suspected timeout) and see whether you can scroll through the dataset. DO NOT scroll through the dataset when you first bring it into the project, because EG caches some of the information and so could give a false indication. Also, after the long wait and the scrolling test, can you select another dataset?

deleted_user · Posted 03-13-2008 01:50 PM

Chuck, I tried your test of bringing in a dataset, waiting until the disconnect, and seeing if I could scroll through the dataset.

After the disconnection I could not scroll through the dataset. The error was "The object invoked has disconnected from the client." I could not bring in a new dataset either. The error included "A call to the metadata repository has failed."

That seems to be a timeout issue. Do you have another idea in mind?

deleted_user · Posted 03-13-2008 03:07 PM

I am running you through some troubleshooting.

OK, good, that confirms that there is definitely a timeout issue.
I didn't want to assume, because for all I knew, you were running code that was aborting/abending or hitting an ENDSAS statement.

Next step: when the OS guys showed you the timeout values set to "never", was that on the server that holds the metadata repository, on the application/SAS server, or both? Or are SAS and the metadata repository on the same server?

deleted_user · Posted 03-13-2008 03:32 PM

Chuck, SAS and the metadata repository are on the same server at this point. We plan to change that in the near future, but for now they are on the same box.

By the way, I just returned from a meeting with my IT folks. Initially I understood that there was NO firewall between my client and the server. However, after some checking, they reported today that there is in fact a firewall, and that it is set to disconnect after 1 hour. That matches the length of time until my session disconnects. Still, they are not convinced that this is causing the problem. They feel that EG should be continually sending messages to the server to show that it is still logged in. That does make sense to me, but for some reason I don't think EG was designed that way, and I'm not sure why. They say they cannot change the 1-hour firewall setting for me without changing it for everyone in the company. Even if they could, I can't imagine that would be a wise thing to do, as Joshua stated earlier.
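What the IT folks describe is essentially a keep-alive: some traffic on an otherwise idle connection, so that a stateful firewall never sees it as idle. From what has been said in this thread, this is not something you can switch on inside EG 4.1, so the following is only a generic Python illustration of TCP keep-alive for a client program you control. The helper name `enable_keepalive` is hypothetical, the timer values are assumptions chosen so the first probe goes out well inside a 1-hour firewall limit, and whether a given firewall treats keep-alive probes as activity depends on that firewall.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 300,
                     interval_s: int = 60, count: int = 5) -> None:
    """Ask the OS to send TCP keep-alive probes on an otherwise idle connection.

    Hypothetical helper; idle_s should stay well below the firewall's idle limit.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if all(hasattr(socket, name) for name in ("TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPCNT")):
        # Linux-style options: idle seconds before the first probe,
        # seconds between probes, and probes sent before giving up.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    elif hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: (enabled, idle time in ms, probe interval in ms).
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle_s * 1000, interval_s * 1000))
    # Elsewhere only SO_KEEPALIVE is set, and the OS default timers apply.
```

Note that the OS default idle time before the first probe is often far longer than an hour, which is why the sketch sets it explicitly where the platform allows.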

If it is the firewall causing the issue, why isn't it affecting every company out there that uses EG and a firewall? Are we not configuring the firewall and server optimally? Maybe it's not the firewall, but it sure does seem that way.

Anyway, any other troubleshooting tips would be helpful.

deleted_user · Posted 03-13-2008 04:02 PM

Aha! See!?
The IT people were unintentionally lying to you.

I am an IT person, have been for years (decades?).
They are still lying to you out of laziness.

1) Most installations don't have a firewall between the workstations and the SAS servers

2) They are ignoring how SAS and EG work, and are unwilling to learn, thus giving you grief.

3) If this is a real professional firewall, not some home Netgear or Linksys box or Windows software thing, then it is also a router, and it is possible to put in exception rules for specific IP addresses.

So, is the problem that you are working from home?
Why is there a firewall between the workstations and the server?

deleted_user · Posted 03-13-2008 08:22 PM

I am inclined to agree. There is a lot happening at the network layer (a network packet dump shows this). But as you have said, the communication is not merely at the application layer; it also routes through the operating system and the common underlying network infrastructure.

I'd want to check the end points of the EG session (the EG client and the SAS server, where the workspace session runs on top of a local shell session).

I'd check /etc/default/login, /etc/pam.conf and "last" to see whether the shell session spawned on the Unix SAS IT box is restricted by local login policies. For Microsoft Windows, I suppose you could check how sessions are restricted by the local login policies defined in "secpol.msc".
If the box logs the duration of each remote login session, we can verify that login sessions can and do last longer than the suspected timeout.
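On a Linux box, `last` prints the length of each completed session in parentheses at the end of the line, as (HH:MM) or (D+HH:MM). The following minimal Python sketch pulls those durations out and keeps the sessions that outlasted a given limit, such as the suspected one-hour timeout. It assumes the Linux output format (other Unix flavours format `last` differently), and the function names are hypothetical.

```python
import re
from datetime import timedelta

# Trailing "(HH:MM)" or "(D+HH:MM)" duration printed by Linux `last`.
_DURATION = re.compile(r"\((?:(\d+)\+)?(\d+):(\d{2})\)\s*$")

# wtmp pseudo-entries that are not user logins.
_PSEUDO_USERS = {"reboot", "shutdown"}


def session_durations(last_output: str) -> list[tuple[str, timedelta]]:
    """Return (user, duration) for each completed login session in `last` output."""
    sessions = []
    for line in last_output.splitlines():
        match = _DURATION.search(line)
        if match is None:
            continue  # blank lines, "still logged in", the "wtmp begins" footer
        user = line.split()[0]
        if user in _PSEUDO_USERS:
            continue
        days, hours, minutes = match.groups()
        sessions.append(
            (user, timedelta(days=int(days or 0), hours=int(hours), minutes=int(minutes)))
        )
    return sessions


def sessions_longer_than(last_output: str, limit: timedelta) -> list[tuple[str, timedelta]]:
    """Keep only the sessions that outlasted `limit` (e.g. a suspected idle timeout)."""
    return [(user, d) for user, d in session_durations(last_output) if d > limit]
```

The text can come from `subprocess.run(["last"], capture_output=True, text=True).stdout`. If sessions routinely appear well past an hour, local login policy on the server is an unlikely cause of the disconnect.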

I am, however, curious about two other aspects that may have an impact on the EG session: the user's metadata defined at the metadata tier, and how the server tier communicates with the data tier. It would be great if there were a schematic workflow diagram for a full EG session, so that we could logically investigate each segment of the flow for misconfigurations. (I'll have to read through The Little SAS Book for Enterprise Guide 4.1 again before this workflow is clearer to me.)

ChrisHemedinger · Posted 03-13-2008 09:13 PM

As you have already discerned (I think), there are two connections at work here: EG to metadata and EG to workspace.

When a workspace is chugging away processing a job for EG, EG pesters the workspace server every so often (let's say 30 seconds) to ask: "got some SAS log for me? How about now? Okay, how about now?" (This reminds me of my 5-year-old asking to watch television.)

But EG never talks back to the metadata server, because we've got all of the metadata we need for the moment, thank you very much. After an hour, your connection takes this inattention personally and hangs up.
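The effect Chris describes can be reproduced with a toy model of a stateful firewall that forgets any connection idle longer than its limit. The 30-second poll interval is Chris's "let's say" figure and the 1-hour limit is the one the IT staff reported earlier in this thread; the class and function names are hypothetical, and real firewalls track connections in their own ways.

```python
from dataclasses import dataclass, field


@dataclass
class IdleTimeoutFirewall:
    """Toy stateful firewall: drops connections idle longer than idle_limit_s."""
    idle_limit_s: int
    last_seen: dict[str, int] = field(default_factory=dict)
    dropped: set[str] = field(default_factory=set)

    def open(self, conn: str, now: int) -> None:
        self.last_seen[conn] = now

    def expire(self, now: int) -> None:
        # Forget every connection that has been silent for too long.
        for conn, seen in list(self.last_seen.items()):
            if now - seen > self.idle_limit_s:
                del self.last_seen[conn]
                self.dropped.add(conn)

    def packet(self, conn: str, now: int) -> bool:
        """Forward a packet at time `now`; False means the connection is gone."""
        self.expire(now)
        if conn in self.dropped:
            return False
        self.last_seen[conn] = now
        return True


def simulate(duration_s: int = 2 * 3600, poll_every_s: int = 30,
             idle_limit_s: int = 3600) -> dict[str, bool]:
    fw = IdleTimeoutFirewall(idle_limit_s)
    fw.open("workspace", 0)
    fw.open("metadata", 0)
    for t in range(poll_every_s, duration_s + 1, poll_every_s):
        fw.packet("workspace", t)  # EG polling the workspace server
    # Only at the end does EG need metadata again (e.g. to open another dataset).
    return {"workspace": fw.packet("workspace", duration_s),
            "metadata": fw.packet("metadata", duration_s)}
```

With the defaults, `simulate()` returns `{"workspace": True, "metadata": False}`: the polled connection survives two hours while the silent one does not. Shorten the run to 30 minutes and both survive; poll less often than the limit and both are dropped.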

Now we get to the crux of the problem: EG 4.1 cannot recover from this very well. Eventually EG will need metadata information again -- your session cannot continue without it. But EG is not trained to reestablish this connection in mid-session.

This is a limitation that we have fixed in 4.2. We did not add keep-alive logic, but we did add the ability to reconnect to metadata. Unlike a workspace session, the metadata session is stateless, and an interruption like this should have no adverse effects, provided EG can successfully reconnect.
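The reconnect-and-retry pattern Chris describes for 4.2 can be sketched generically in Python. This is not SAS code: `ConnectionLost`, `ReconnectingClient`, and the connection objects are hypothetical stand-ins. The retry is only safe because the metadata session is stateless; a workspace session holds state, so opening a fresh connection would not restore it.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ConnectionLost(Exception):
    """Hypothetical error raised when a dropped metadata connection is used."""


class ReconnectingClient:
    """Runs requests on a connection, reconnecting and retrying when it was dropped.

    Only sensible for a stateless session: a fresh connection must be able to
    answer the same request without any context from the old one.
    """

    def __init__(self, connect: Callable[[], object], max_attempts: int = 2) -> None:
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self._connect = connect
        self._max_attempts = max_attempts
        self._conn = connect()
        self.reconnects = 0

    def call(self, request: Callable[[object], T]) -> T:
        for attempt in range(1, self._max_attempts + 1):
            try:
                return request(self._conn)
            except ConnectionLost:
                if attempt == self._max_attempts:
                    raise  # give up and surface the error to the user
                # Replace the dead connection and retry the same request.
                self._conn = self._connect()
                self.reconnects += 1
        raise AssertionError("unreachable: the loop always returns or raises")
```

Capping the attempts keeps a genuinely unreachable server from turning into an endless retry loop; the final error still reaches the user.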

Hopefully, the information shared in this thread (by everyone) will provide some ideas for workarounds in the meantime.

Chris

Learn from the Experts! Check out the huge catalog of free sessions in the Ask the Expert webinar series.

deleted_user · Posted 03-14-2008 01:41 AM

Hi Chris, thanks for the info. Is there a knowledge base article or white paper available on this? If there are online references on the subject, I think they would benefit all of us greatly. If we are on EG 4.1, is there something that can be applied on the EG client to sustain the metadata communication?

I wonder (a) whether this occurs regardless of whether the metadata tier and server tier are on the same box; (b) whether it is time- and network-based, meaning that it tends to occur when EG has run for a certain amount of time, or when the traffic traverses certain networks; and (c) whether this problem exists only in EG 4.1, because EG 2 users don't seem to notice similar timeouts.
Message was edited by: Joshua

ChrisHemedinger · Posted 03-14-2008 08:18 AM

One big difference: EG 2 does not use the SAS Metadata Server. It uses an EG-specific repository that predates the SAS Metadata Server.

If you have a central EG repository, the communication is via Windows DCOM which may not be subject to the same timeout issues.

The EG admin guide, linked from http://support.sas.com/eguide, might help if you haven't already dug into it.

Chris

deleted_user · Posted 03-14-2008 03:04 PM

Being new to SAS, and exposed to only one version of SAS and two versions of EG, I had assumed that the EG client was merely a thin client, with all the business logic encapsulated at the server tier and metadata tier. Should I then say that all builds of SAS allow all builds of EG to connect, but SAS (the server and metadata tier) acts differently depending on the version of EG that is connecting? I ask because someone told me that the metadata is basically a SAS data set cached on the server. I wonder if there's a clear workflow of how EG connects to SAS. (I admit I should re-read the documentation, but I haven't seen anything that explicitly describes the workflow in The Little SAS Book for Enterprise Guide.)

If so, is there a way to tweak or even view the configuration at the metadata layer? I tried grepping for .cfg files in vain, so I can only assume it's not as simple as reading a text file.

Thanks for your advice in advance.

-Josh

deleted_user · Posted 03-14-2008 04:48 PM

EG is either not a thin client, or not your normal thin client.

Thin clients are typically web enabled applications that access an underlying application server. The web server provides the user interface, and some light application responses. The application server does the next layer of heavier lifting, and may/will use a lower level server to do the real heavy lifting/processing.

EG is more like a thick or middle weight client.

A classic thick client would have been written in Visual Basic or PowerBuilder, where these applications ran on the workstation, provided the graphical user interface, and connected to underlying databases.

EG is written in ??, I guess within the .NET framework (I had thought it was Java, and wish it were). It interacts with a metadata server, or at least a metadata repository, and sends work tasks to an underlying SAS server, wherever it may be.

A SAS server is not the same thing as an Oracle server or a WebSphere server. SAS is more like Java, than those other things, but also not like Java.

Java is a programming language, like C, COBOL, and Fortran are programming languages.
Java, unlike other traditional programming languages, also defines an underlying standardized virtual machine. Java is compiled into the JVM's "machine" bytecode, which is similar to the old compiler technique of compiling into p-code to provide portability across platforms and compiled languages, but it is not the same.
The JVM either interprets the bytecode, which is slow and inefficient, or just-in-time (JIT) compiles it into the native machine code of the underlying "physical processor".
The command to run a Java program is "java ... Something" (the name of a compiled class, without the .class extension) or "java ... -jar something.jar", where the named class, or the main class declared in the jar's manifest, contains the "main" method.

SAS is a programming language.
SAS is a statistical analysis system
SAS is ....
The command to run a sas program is "sas program.sas" or "sas program ", or some such form.
Just typing "sas" on a Windows or Unix box will bring up an interactive SAS session, which is similar to an IDE.
The "sas" program is actually, or essentially, a JIT compiler that takes the text SAS program and compiles and executes it a block at a time, per well-defined rules as to what a "block" is. This is not a server. It is not a virtual machine, per se.

If SAS is installed on a host box, and many people are able to use that SAS installation at the same time, then that box is referred to as a SAS server. If I have SAS installed on a mainframe, I can log into the mainframe through TSO/ISPF and then submit SAS jobs, written in the SAS programming language, to do data processing. If SAS is installed on a Unix "server", I can log into that box via telnet and run "sas ..." jobs on that box.

If SAS/CONNECT is installed on both the Unix box and the mainframe, I can submit a job on either "server" to remotely process SAS code on the other server via the signon, rsubmit and endrsubmit statements. SAS/CONNECT contains SAS/IT = Integration Technologies, which provide the inter-server connections/communications.

SAS EG is what I would call a thick client that requires the installation of SAS/IT to connect to a remote server that has SAS installed on it -- aka the "SAS Server". Part of SAS/IT is a process/program called a "spawner". The spawner is the thing that actually submits the "sas ..." command, which then executes SAS code.

Part of EG is a metadata repository, which incorporates the metadata concept of a Logical SAS Server: a named thing that references a physical box that has SAS installed on it. You can have multiple logical "servers" that point to the same box but have differing user/application logins. The function of the metadata repository can be supplanted by the SAS Metadata Server, which has much richer capabilities than the simple metadata repository.

EG essentially composes SAS code (it is a GUI-based code generator) and ships it off to a "server", which is simply a started interactive SAS session with that session's display capabilities turned off. We have been told by Chris@SAS that EG then polls the "server" for log information while a task is running on that server. If all tasks are complete, then the remote SAS session is sitting idle, and EG is sitting idle except to respond to user interface activity. But if an intervening network device can time out the network connection between the EG workstation and the host that SAS is running on due to lack of activity, then perhaps EG isn't polling; instead, the "server" is simply sending log messages back to the EG workstation as an ODS destination. I know from my own experience that if a DATA step takes an hour to run, the log entry for that step doesn't appear for that hour.

Is this interaction clear?
Does it make sense?
Am I correct?
Message was edited by: Chuck

deleted_user · Posted 03-17-2008 07:31 PM

Chris@SAS, thanks for your valuable info. I am currently at the SAS conference in San Antonio, and one of the developers at the EG station in the Demo Room confirmed what you said.

In his opinion, the disconnect issue with EG is directly related to our firewall configuration. The SAS developer says it is unusual to have a firewall set between the client and the server, which is why most other SAS customers don't have this problem. However, he has seen this configuration in a few other places, where it resulted in the same disconnect issue.

EG communicates through two ports. One is with the Workspace Server, where our process runs. There is constant communication there, as EG is always asking the Workspace Server whether it has results to send back and display in EG. The other port is where EG communicates with the Metadata Server, which is used primarily for credentials and other tasks. EG has no need to communicate constantly with the Metadata Server except when something needs to be done. Because of this, a firewall that times out after an hour or so will sever the connection with the Metadata Server, and EG will have no way to recover. This is exactly what my symptoms are: EG's connection specifically with the Metadata Server is severed after 45-60 minutes. Thus, all indications are that the firewall is the issue. Of course there could always be other explanations, but in this case nearly all available data points to the same conclusion.

The SAS developer has successfully solved the firewall issue on the occasions when he has encountered it. One method is to adopt a different configuration that does not involve a firewall between the client and the server. The other method (which is probably preferable in our situation) is to adjust the firewall to open up the specific port through which clients communicate with the Metadata Server.
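Before asking for a firewall change, it can help to confirm from a client workstation which server ports a new connection can reach. The following is a minimal Python sketch. The port numbers are assumed SAS 9.1.3 defaults (8561 for the metadata server, 8591 for the workspace server), and your site's server definitions may use different ones; the function and constant names are hypothetical.

```python
import socket

# Assumed SAS 9.1.3 defaults; check the ports actually defined in your
# own server metadata before asking for firewall rules.
ASSUMED_SAS_PORTS = {"metadata server": 8561, "workspace server": 8591}


def port_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a new TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused connections, timeouts and name-resolution failures all land here.
        return False


def check_ports(host: str, ports: dict[str, int]) -> dict[str, bool]:
    """Map each named port to whether it is reachable from this machine."""
    return {name: port_reachable(host, port) for name, port in ports.items()}
```

For example, `check_ports("sasserver.example.com", ASSUMED_SAS_PORTS)` with a hypothetical host name. A successful check only shows that a new connection can be opened; it does not reproduce the idle timeout, which needs a connection held silent past the firewall's limit.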

The developer also confirmed that in EG 4.2 the application will be able to reestablish its link to the Metadata Server, so the problem will likely not be an issue then.

I will return to the office and try to make all this work. Wish me luck, and thanks for everyone's input.

deleted_user · Posted 03-17-2008 08:36 PM

Thanks, DrChris. If that is the case, then it's all a matter of changing the firewall rules. I disagree that EG and the SAS workspace server are typically in the same subnet, because in many production environments the end users and the servers may be in two different locales, or the production servers may be colocated in a hosted environment. What the EG developer says makes sense, but when we spoke to SAS some time back, we were given the impression that only one port needed to be open for EG to launch a connection to the SAS server. I suppose I'll have to read through the documentation to find more references on the subject.

-Josh
