Solved: Enterprise Guide 7.1 fails to submit grid job

tim_acton · Posted 05-11-2016 09:36 AM

We're having a strange problem with EG 7.1 and SAS 9.4 M2 with grid. Users who launch grid jobs through SAS EG occasionally get the following error, and their grid job fails. We're using grid-launched workspace servers, and we have no issues with those. It's almost as if EG loses its connection to the grid. So far I've not been able to find anything helpful on this anywhere. Any ideas?

NOTE: Remote session ID WK0 will use the grid service _ALL_.

NOTE: Remote signon to WK0 commencing (SAS Release 9.04.01M2P072314).

ERROR: A communication subsystem partner link setup request failure has occurred.

ERROR: Could not start grid job or grid job failed.

ERROR: Remote signon to WK0 canceled.

ERROR: A link must be established by executing the SIGNON command before you can communicate with WK0.

MPRINT(WKLY): RSUBMIT CONNECTWAIT=NO CONNECTPERSIST=NO;

ERROR: A link must be established by executing the SIGNON command before you can communicate with WK0.

NOTE: Subsequent lines will be ignored until ENDRSUBMIT.

The "Remote signon to WK0 canceled" is the interesting part. Normally instead of WK0, it would have a hostname (gridserver03.local or something).
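For anyone who wants to spot this failure across many saved EG logs, here is a minimal Python sketch that looks for the "Remote signon to ... canceled" message quoted above. The function name and the sample text are illustrative only; the pattern is taken from this one log and may need adjusting for other SAS releases.

```python
import re

# Matches the SAS/CONNECT message quoted in the log above.
SIGNON_CANCELED = re.compile(r"^ERROR: Remote signon to (\S+) canceled\.")

def failed_signons(log_text):
    """Return the remote session IDs whose signon was canceled."""
    sessions = []
    for line in log_text.splitlines():
        match = SIGNON_CANCELED.match(line.strip())
        if match:
            sessions.append(match.group(1))
    return sessions

# Sample built from the log lines in the post above.
sample = """NOTE: Remote signon to WK0 commencing (SAS Release 9.04.01M2P072314).
ERROR: Could not start grid job or grid job failed.
ERROR: Remote signon to WK0 canceled.
"""
print(failed_signons(sample))
```

A session ID such as WK0 in the result, rather than a grid host name, is the symptom described in the post.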

GyaniBaba · Posted 05-11-2016 11:28 AM

Are you able to validate the Connect Server from SAS MC?
Is the Connect Spawner up and running fine?

tim_acton · Posted 05-11-2016 12:23 PM

The Object Spawners are fine. Not sure what you mean by Connect Server, but LSF/grid seem fine. We have several hundred users using EG/grid right now with no problems.

GyaniBaba · Posted 05-11-2016 12:32 PM

Under the /sas/config/LevN/ directory, do you see a Connect Spawner? Can you check whether it is started?

In SAS MC, under Server Manager and the SASApp application server, do you see a Connect Server? Are you able to validate it?

JuanS_OCS · Posted 05-12-2016 04:34 AM

Hi,

Are your EG clients running on Citrix or other virtualized clients? I can easily imagine that you have a firewall (F5?) that occasionally blocks some connections.

You might want to check your Windows Logs, or extend the logging to a DEBUG level.

Best regards,

Juan

GyaniBaba · Posted 05-12-2016 05:56 AM

Juan, I don't think Grid will work if the Connect Server is not working. It could be a firewall issue, or it could be that the Connect Spawner is not started.

tim_acton · Posted 05-12-2016 09:51 AM

If by connect spawner you mean Object Spawner, then yes it's started.

GyaniBaba · Posted 05-12-2016 10:27 AM

No, I don't mean object spawner by connect spawner.

Could you expand Server Manager in SMC and provide a screenshot? What operating system are the SAS services running on?

Do you see ConnectSpawner directory inside LevN ?
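The directory check asked for here can be scripted. The following Python sketch lists which spawner directories exist under each LevN directory. It assumes the configuration root is passed in by the caller (the /sas/config path is only the example used in this thread) and that the Object Spawner lives in a directory named ObjectSpawner, which is not stated in the thread. A directory being present says nothing about whether the spawner is actually running.

```python
from pathlib import Path

# Directory names to look for; "ObjectSpawner" is an assumption,
# "ConnectSpawner" is the name mentioned in this thread.
SPAWNER_DIRS = ("ConnectSpawner", "ObjectSpawner")

def find_spawner_dirs(config_root):
    """Map each Lev* directory name to the spawner directories it contains."""
    result = {}
    for level in sorted(Path(config_root).glob("Lev*")):
        if level.is_dir():
            result[level.name] = {
                name: (level / name).is_dir() for name in SPAWNER_DIRS
            }
    return result

if __name__ == "__main__":
    # Example path taken from the post above; adjust for your site.
    print(find_spawner_dirs("/sas/config"))
```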

DanielKaiser · Posted 05-31-2016 03:16 AM

Hi 🙂
Did you solve your problem?

We have the same problem with our Connect setup.

Besides the errors in the EG job log, we found several errors in the Object Spawner logs. The list below is a collection of them.
They appear in all 4 of our environments and in every application server context.

We have 4 grid computing servers and 4 grid servers on which the Metadata Server runs. Our SAS version is SAS 9.4 TS1M3.

svcsasit1 - The specified uuid 1CBF12F1-9C22-9E40-B0D2-D481F0758D1E did not match any process managed by this spawner.

sy071 - The launch of server SASITRM - Workspace Server for user svcsastuit failed.

svcsasit1 - /gpfs/sasconf/sasit/SASHome/SASFoundation/9.4/sasexe/tkiomsvc.so(tktracex+0x2e) [0x7f79babc3e3e]

svcsasit1 - Load Balancing interface call failed with exception <?xml version="1.0" ?><Exceptions><Exception><SASMessage severity="Error">(A51P1HY1.AZ00000C_!A51P1HY1.AY000004_@sap00782.lan.de) cannot be found in the metadata.</SASMessage></Exception></Exceptions>

svcsasit1 - The load balancing processor could not send update to peer (A51P1HY1.AY000004_@sap00783.lan.de)

svcsasit1 - New client connection (64) rejected from server port 17591 for user sy053@!*(generatedpassworddomain)*!. Peer IP address and port are [::ffff:10.132.137.127]:55632 for APPNAME=SAS Enterprise Guide.

The credentials specified for the SASBIIB - Pooled Workspace Server (A5QFW5AJ.AY00000B) server definition failed to authenticate. Therefore this server definition will not be included.

The SASBIIB - Logical Pooled Workspace Server (A5QFW5AJ.AW000006) cluster does not contain any valid server definitions. Therefore this cluster definition will not be included.

Load Balancing interface call failed with exception <?xml version="1.0" ?><Exceptions><Exception><SASMessage severity="Error">(A5QFW5AJ.AZ00000C_!A5QFW5AJ.AY000004_@sap00782.lan.de) cannot be found in the metadata.</SASMessage></Exception></Exceptions>.

The load balancing processor could not send update to peer (A5QFW5AJ.AY000004_@sap00783.lan.de)
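To see which of these messages dominate in a large Object Spawner log, a simple tally by message fragment can help. This is a rough Python sketch: the fragments are copied from the messages listed above, while the labels and the function name are made up for illustration. Plain substring matching is used, so it will miss messages that are worded differently in other releases.

```python
from collections import Counter

# Label -> text fragment taken from the messages quoted above.
SIGNATURES = {
    "unknown_uuid": "did not match any process managed by this spawner",
    "launch_failed": "The launch of server",
    "lb_call_failed": "Load Balancing interface call failed",
    "lb_peer_update": "could not send update to peer",
    "client_rejected": "rejected from server port",
    "auth_failed": "failed to authenticate",
}

def tally(lines):
    """Count how many log lines contain each known message fragment."""
    counts = Counter()
    for line in lines:
        for label, fragment in SIGNATURES.items():
            if fragment in line:
                counts[label] += 1
    return counts

# Two of the lines from the list above, used as sample input.
sample_lines = [
    "sy071 - The launch of server SASITRM - Workspace Server for user svcsastuit failed.",
    "svcsasit1 - The load balancing processor could not send update to peer (A51P1HY1.AY000004_@sap00783.lan.de)",
]
print(tally(sample_lines))
```

To run it against a real log, pass the open file object, for example `tally(open(path, encoding="utf-8", errors="replace"))`, where `path` points at your own spawner log.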

rg · Posted 10-26-2016 05:19 PM

Hi All,

Any update on this issue?

I have a similar issue; however, I am able to validate the Connect Server but not the Workspace Server:
[10/26/16 2:18 PM] INFO: Starting extended validation for Workspace server (level 1) - Making a connection
[10/26/16 2:19 PM] SEVERE: Unknown error in grid provider module.
[10/26/16 2:19 PM] SEVERE: The launch of server SASApp - Workspace Server for user gurramr failed.
[10/26/16 2:19 PM] SEVERE: Failed to start the server.
[10/26/16 2:19 PM] SEVERE: The application could not connect to any server in the cluster "(lexvpsas02:8591,lexvpsas03:8591,lexvpsas05:8591,lexvpsas04:8591)".
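Since a firewall was suggested earlier in the thread, one quick check is whether each host and port named in the cluster message accepts a TCP connection from the client machine. The Python sketch below parses the cluster list from the message above and offers a basic reachability test. The function names are illustrative, and an open port only shows network reachability; it does not show that the spawner can launch a workspace server.

```python
import re
import socket

def parse_cluster(message):
    """Extract (host, port) pairs from the quoted cluster list in the message."""
    inner = re.search(r'cluster "\(([^)]*)\)"', message)
    if inner is None:
        return []
    pairs = []
    for item in inner.group(1).split(","):
        host, _, port = item.strip().rpartition(":")
        if host and port.isdigit():
            pairs.append((host, int(port)))
    return pairs

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Message copied from the validation log above.
message = ('[10/26/16 2:19 PM] SEVERE: The application could not connect to any '
           'server in the cluster "(lexvpsas02:8591,lexvpsas03:8591,'
           'lexvpsas05:8591,lexvpsas04:8591)".')
print(parse_cluster(message))
```

On a machine that can resolve these host names, each pair can then be tested with `port_open(host, port)`.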

anja · Posted 10-27-2016 12:44 PM

Hi,

there has been a lot of great input. Let me try to summarize and add some info, so you can go through this list and verify. This might help to narrow down the problem a bit further:

1) In SASMC, go to Server Manager, select the Connect Server and the Connect Spawner, right-click, and choose Validate or Connect. You should get a prompt to enter a user ID. If you have the password, enter the user gurramr. If not, enter a user ID that is associated with an OS account.

How does that work? Successful or error?

Note: After each server validation, go to File and choose Clear Credentials Cache. Even though you might use the same user ID, I'd like to make sure you enter "fresh" credentials for each validation.

2) Is "gurramr" a regular user who can work in EG without problems outside the grid? Is this the only user experiencing problems?

3) Do you have any error messages in the Object Spawner log and Metadata Server log?

4) Does the problem occur for all users, or only certain ones? Does it happen in one environment but not the other, and if so, what is the difference between the environments?

5) Could you please confirm whether this problem occurs sporadically or on a regular basis?

6) When this problem occurs, have any servers or services been restarted, or paused and resumed, right before it?

Hopefully the answers will help us to further assist you.

Thanks

Anja

rg · Posted 10-27-2016 01:03 PM

Hi All,

Gurramr is a regular ID that had no problems whatsoever before. I overcame this problem by simply bouncing my Object Spawners. The only change made before this issue arose was adding a macro to the autoexec file.

MadhuKiran1 · Posted 10-27-2016 03:33 AM

I had the same problem in my environment, and it was solved after adding more roles and capabilities in metadata through SMC.

Earlier I had only the SAS Admin role assigned, since I am an admin. Then I added one of the business groups to the user and it worked fine.

kevind · Posted 02-24-2017 01:05 PM

I resolved this by bouncing the Object Spawners. I knew this would be the resolution, but it's something I'm reluctant to do with 140 connections running. Only one user was having the issue, and I first tried removing and re-adding them with SASMC. I'm maintaining a SAS 9.4M3 grid.

tim_acton · Posted 05-31-2017 10:18 AM

Just in case anyone circles back to this topic, we basically found the same thing. It's some kind of issue with the Object Spawner. We found that our analysts could run their jobs right after the Object Spawner was restarted, but after a few days they'd start running into this issue again. You're probably going to assume some risk in restarting the Object Spawner while sessions are connected, so proceed with some caution.
