We're having a strange problem with EG 7.1 and SAS 9.4 M2 with grid. Users who launch grid jobs through SAS EG occasionally get the following error, and their grid job fails. We're using grid-launched workspace servers, and we have no issues with those. It's almost like EG loses its connection to the grid. So far I haven't been able to find anything helpful on this anywhere. Any ideas?
NOTE: Remote session ID WK0 will use the grid service _ALL_.
NOTE: Remote signon to WK0 commencing (SAS Release 9.04.01M2P072314).
ERROR: A communication subsystem partner link setup request failure has occurred.
ERROR: Could not start grid job or grid job failed.
ERROR: Remote signon to WK0 canceled.
ERROR: A link must be established by executing the SIGNON command before you can communicate with WK0.
ERROR: A link must be established by executing the SIGNON command before you can communicate with WK0.
ERROR: A link must be established by executing the SIGNON command before you can communicate with WK0.
ERROR: A link must be established by executing the SIGNON command before you can communicate with WK0.
MPRINT(WKLY): RSUBMIT CONNECTWAIT=NO CONNECTPERSIST=NO;
ERROR: A link must be established by executing the SIGNON command before you can communicate with WK0.
NOTE: Subsequent lines will be ignored until ENDRSUBMIT.
The "Remote signon to WK0 canceled" message is the interesting part. Normally, instead of WK0, it would show a hostname (gridserver03.local or something).
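If it helps anyone trying to work out how often this happens, here is a minimal sketch for scanning saved EG logs for canceled signons. It assumes the message wording matches the excerpt above exactly; the function name `canceled_signons` is made up for illustration and is not part of any SAS tooling.

```python
import re

# Wording copied from the log excerpt above; other SAS releases may phrase it differently.
SIGNON_CANCELED = re.compile(r"ERROR: Remote signon to (\S+) canceled\.")


def canceled_signons(log_text):
    """Return the targets of canceled remote signons, in order, without duplicates."""
    targets = []
    for target in SIGNON_CANCELED.findall(log_text):
        if target not in targets:
            targets.append(target)
    return targets
```

A result that contains a session ID such as WK0 rather than a hostname would match the odd behavior described above.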
The object spawners are fine. Not sure what you mean by connect server, but LSF/grid seem fine. We have several hundred users using EG/grid right now with no problems.
Under the /sas/config/LevN/ directory, do you see a Connect Spawner? Can you check whether it is started?
In SAS MC, under Server Manager, under the SASApp application server, do you see a Connect Server? Are you able to validate it?
Hi,
Are your EG clients running on Citrix or other virtualized clients? I can easily imagine that you have a firewall (F5?) that may be blocking some connections.
You might want to check your Windows logs, or raise the logging to DEBUG level.
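One way to test the firewall theory is a plain TCP reachability check run from the machine where EG runs (for example the Citrix host). This is only a sketch: the function name `can_connect` is made up, and the host and port you pass in must be taken from your own server definitions.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A False result only says the port was not reachable at that moment; it does not tell you whether a firewall, a stopped service, or a DNS problem was the cause.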
Best regards,
Juan
Juan, I don't think grid will work if the Connect Server is not working. It could be a firewall issue, or it could be that the Connect Spawner is not started.
If by connect spawner you mean Object Spawner, then yes it's started.
No, I don't mean object spawner by connect spawner.
Could you expand Server Manager in SMC and provide a screenshot? What operating system are the SAS services running on?
Do you see a ConnectSpawner directory inside LevN?
Hi 🙂
Did you solve your problem?
We have the same problem with our Connect.
Besides the errors in the EG job log, we found several errors in the Object Spawner logs. The list below is a collection of them.
They appear in all 4 of our environments and in every App Server context.
We have 4 Grid Computing Servers and 4 Grid Servers on which the Metadata Server runs. Our SAS version is SAS 9.4 TS1M3.
svcsasit1 - The specified uuid 1CBF12F1-9C22-9E40-B0D2-D481F0758D1E did not match any process managed by this spawner.
sy071 - The launch of server SASITRM - Workspace Server for user svcsastuit failed.
svcsasit1 - /gpfs/sasconf/sasit/SASHome/SASFoundation/9.4/sasexe/tkiomsvc.so(tktracex+0x2e) [0x7f79babc3e3e]
svcsasit1 - Load Balancing interface call failed with exception <?xml version="1.0" ?><Exceptions><Exception><SASMessage severity="Error">(A51P1HY1.AZ00000C_!A51P1HY1.AY000004_@sap00782.lan.de) cannot be found in the metadata.</SASMessage></Exception></Exceptions>
svcsasit1 - The load balancing processor could not send update to peer (A51P1HY1.AY000004_@sap00783.lan.de)
svcsasit1 - New client connection (64) rejected from server port 17591 for user sy053@!*(generatedpassworddomain)*!. Peer IP address and port are [::ffff:10.132.137.127]:55632 for APPNAME=SAS Enterprise Guide.
The credentials specified for the SASBIIB - Pooled Workspace Server (A5QFW5AJ.AY00000B) server definition failed to authenticate. Therefore this server definition will not be included.
The SASBIIB - Logical Pooled Workspace Server (A5QFW5AJ.AW000006) cluster does not contain any valid server definitions. Therefore this cluster definition will not be included.
Load Balancing interface call failed with exception <?xml version="1.0" ?><Exceptions><Exception><SASMessage severity="Error">(A5QFW5AJ.AZ00000C_!A5QFW5AJ.AY000004_@sap00782.lan.de) cannot be found in the metadata.</SASMessage></Exception></Exceptions>.
The load balancing processor could not send update to peer (A5QFW5AJ.AY000004_@sap00783.lan.de)
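To see which of these messages dominate over time, something like the following rough sketch could tally Object Spawner log lines by message type. The regular expressions are taken from the messages listed above; the category labels and the function name `classify` are my own and purely illustrative.

```python
import re
from collections import Counter

# Fragments copied from the messages above; the labels on the left are ad hoc.
PATTERNS = {
    "unknown_uuid": re.compile(r"did not match any process managed by this spawner"),
    "launch_failed": re.compile(r"The launch of server .* failed"),
    "load_balancing_call_failed": re.compile(r"Load Balancing interface call failed"),
    "peer_update_failed": re.compile(r"could not send update to peer"),
    "connection_rejected": re.compile(r"New client connection \(\d+\) rejected"),
    "auth_failed": re.compile(r"failed to authenticate"),
    "empty_cluster": re.compile(r"does not contain any valid server definitions"),
}


def classify(lines):
    """Count log lines per category; lines matching no pattern are counted as 'other'."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                break
        else:
            counts["other"] += 1
    return counts
```

Feeding it the lines of a log file, for example `classify(open(path, encoding="utf-8"))`, gives a per-category count that can be compared before and after a spawner restart.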
Hi,
There has been a lot of great input. Let me try to summarize and add some info, so you can go through this list and verify. This might help to narrow down the problem a bit further:
1) In SAS MC, go to Server Manager, then Connect Server and Connect Spawner; right-click and choose Validate or Connect. You should get
a prompt to enter a user ID. If you have the password, enter the user gurramr.
If not, then enter a user ID that is associated with an OS account.
How does that work? Successful or error?
Note: After each server validation, go to File and choose Clear Credentials Cache. Even though you might use the same user ID,
I'd like to make sure you enter "fresh" credentials for each validation.
2) Is "gurramr" a regular user who can work in EG without problems outside the grid? Is this the only user experiencing problems?
3) Do you have any error messages in the Object Spawner log and Metadata Server log?
4) Does the problem occur for all users, or only certain ones? Does it happen in one environment but not the other, and if so,
what is the difference between the environments?
5) Could you please confirm whether this problem occurs sporadically or on a regular basis?
6) When this problem occurs, have any servers/services been restarted, or paused and resumed, right before it occurs?
Hopefully the answers will help us to further assist you.
Thanks
Anja
I had the same problem in my environment, and it was solved by adding more roles and capabilities in metadata through SMC.
Earlier I had only the SAS Admin role assigned, since I am an admin; then I added one of the business groups to the user, and it worked fine.
I resolved it by bouncing the Object Spawners. I knew this would be the resolution, but it's something I'm reluctant to do with 140 connections running. Only one user was having the issue, and I first tried removing and re-adding them with SAS MC. I'm maintaining a SAS 9.4M3 grid.
Just in case anyone circles back to this topic, we basically found the same thing. It's some kind of issue with the Object Spawner. We found that our analysts could run their jobs right after the Object Spawner was restarted, but after a few days they'd start running into this issue. You're probably going to assume some risk in reloading the Object Spawner, so proceed with some caution.