I'm trying to get the LASR Server started (version 7.3 on RHEL). We did have it running a month or so ago when it was first installed, but something has changed since then. I get the error: ERROR: Failed to load analytic extension for the distributed computing environment. I am starting it as the user sas, which has passwordless SSH keys set up between the main server and the 3 distributed servers for VA.
I found this article: http://support.sas.com/kb/60/126.html and was able to verify that SSH works for sas between nodes (/opt/TKGrid/bin/simsh /opt/TKGrid/bin/simsh hostname). Following the information at the bottom of the article, I confirmed that I'm using RSA keys and that the keys are stored in $HOME/.ssh. I think this condition may apply to us: SSH authentication is performed using GSSAPIAuthentication (based on Kerberos), so I tried the next tip, setting GRIDRSHCOMMAND, in sasv9_local.cfg. This seemed to hang the start process rather than return an error message. Any ideas for debugging are welcome. I also have a ticket open.
Update:
The problem has been identified and resolved. A hot fix has been pushed to SAS 9.4 M5. If you need a fix for SAS 9.4 M3 or M4, please open a track and let me know the number.
Special thanks go to @msjhicks and her colleagues for help with debugging the problem!
Please send the output from the following commands. Do not forget to replace <PATH_TO> and <FQDN_OF_LASR_HEAD_NODE>.
export TKPATH=/<PATH_TO>/TKGrid/lib:/<PATH_TO>/TKGrid/bin
export GRIDHOST=<FQDN_OF_LASR_HEAD_NODE>
export GRIDINSTALLLOC=/<PATH_TO>/TKGrid
export GRIDRSHCOMMAND="/usr/bin/ssh -q -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o GSSAPIAuthentication=yes -o GSSAPIDelegateCredentials=yes -o RSAAuthentication=no"
/<PATH_TO>/TKGrid/bin/checknodes /<PATH_TO>/TKGrid/grid.hosts
/<PATH_TO>/TKGrid/bin/tkgridperf
/<PATH_TO>/TKGrid/bin/tkgridmon
I believe I set all the env vars okay (including GRIDHOST, which I haven't shown here):
env | grep GRID
GRIDRSHCOMMAND=/usr/bin/ssh -q -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o GSSAPIAuthentication=yes -o GSSAPIDntials=yes -o RSAAuthentication=no
GRIDINSTALLLOC=/opt/sasva/TKGrid
env | grep TKPATH
TKPATH=/opt/sasva/TKGrid/lib:/opt/sasva/TKGrid/bin
But when I try to run checknodes, I get the following messages (4 times, just showing the first "set") and a failure with RC 255:
/opt/sasva/TKGrid/bin/checknodes /opt/sasva/TKGrid/grid.hosts
unknown option --
usage: ssh [-1246AaCfgKkMNnqsTtVvXxYy] [-b bind_address] [-c cipher_spec]
[-D [bind_address:]port] [-E log_file] [-e escape_char]
[-F configfile] [-I pkcs11] [-i identity_file]
[-L [bind_address:]port:host:hostport] [-l login_name] [-m mac_spec]
[-O ctl_cmd] [-o option] [-p port]
[-Q cipher | cipher-auth | mac | kex | key]
[-R [bind_address:]port:host:hostport] [-S ctl_path] [-W host:port]
[-w local_tun[:remote_tun]] [user@]hostname [command]
The other 2 commands just hang
Janette
Also, I had a question after talking to someone here who is more knowledgeable. On our main server we have in /etc/ssh/sshd_config:
KerberosAuthentication yes
GSSAPIAuthentication yes
GSSAPICleanupCredentials no
We are using Kerberos and authenticating back to an LDAP server (using PAM and other processes).
However, on our 3 nodes (for VA High-Performance Analytics) we have all Kerberos options commented out but still have:
GSSAPIAuthentication yes
GSSAPICleanupCredentials no
Should that GSSAPIAuthentication on those nodes be "no"?
Thanks.
Janette
The value for GRIDRSHCOMMAND must be in double quotes.
>> Should that GSSAPIAuthentication on those nodes be "no"?
No.
Hi. I did the export as you had typed it - should I do something different? Sorry if I'm missing what you meant. Thank you,
Janette
export GRIDRSHCOMMAND="/usr/bin/ssh -q -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o GSSAPIAuthentication=yes -o GSSAPIDelegateCredentials=yes -o RSAAuthentication=no"
env | grep GRIDRSH
GRIDRSHCOMMAND=/usr/bin/ssh -q -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o GSSAPIAuthentication=yes -o GSSAPIDelegateCredentials=yes -o RSAAuthentication=no
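The missing quotes in the env output above come from how the shell handles an export. A minimal sketch, assuming a POSIX-style shell and using a shortened ssh option list purely for illustration: it only shows what ends up stored in the variable, not what TKGrid itself expects to find there.

```shell
# Plain double quotes are consumed by the shell: the stored value has no quotes.
export GRIDRSHCOMMAND="/usr/bin/ssh -q -o StrictHostKeyChecking=no"
plain_value=$GRIDRSHCOMMAND

# Wrapping the whole thing in single quotes keeps the double quotes
# as literal characters inside the value.
export GRIDRSHCOMMAND='"/usr/bin/ssh -q -o StrictHostKeyChecking=no"'
quoted_value=$GRIDRSHCOMMAND

printf 'plain:  %s\n' "$plain_value"
printf 'quoted: %s\n' "$quoted_value"
```

The names plain_value and quoted_value are hypothetical and exist only to compare the two results side by side.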
OK, I thought I'd try putting single quotes around so it kept the quotes. This time I get:
/opt/sasva/TKGrid/bin/checknodes /opt/sasva/TKGrid/grid.hosts
Machine busas.binghamton.edu responded with failure. RC: 20
Machine sasproc01.binghamton.edu responded with failure. RC: 20
Machine sasproc02.binghamton.edu responded with failure. RC: 20
Machine sasproc03.binghamton.edu responded with failure. RC: 20
Num Returned: 4
Num Failed: 4
/opt/sasva/TKGrid/bin/tkgridperf
Unable to enumerate grid.
/opt/sasva/TKGrid/bin/tkgridmon
Unable to enumerate grid.
ERROR: Failed to execute command: "/usr/bin/ssh -q -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o GSSAPIAuthentication=yes -o GSSAPIDelegateCredentials=yes -o RSAAuthentication=no" busas.binghamton.edu export TKMPI_INFO=""; /opt/sasva/TKGrid/tkmpirsh.sh -np 1 /opt/sasva/TKGrid/tkmpinodelib.sh busas.binghamton.edu 65206 tkegenum
Timeout waiting for Grid connection.
Please export these variables once again and run these commands:
strace -fv -s 1000 -o /tmp/tkgridperf.strace.log /opt/sasva/TKGrid/bin/tkgridperf
strace -fv -s 1000 -o /tmp/tkgridmon.strace.log /opt/sasva/TKGrid/bin/tkgridmon
Attach log files for further investigation. Also, make sure that you have an entry for localhost in /etc/hosts on each machine in your TKGrid cluster.
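The localhost check can be scripted so it is easy to repeat on each machine. A minimal sketch, assuming a POSIX shell with sed and awk available; has_localhost_entry is a hypothetical helper name, not part of TKGrid.

```shell
# Hypothetical helper: succeeds if the given hosts file maps an address
# to the name "localhost" on a non-comment line.
has_localhost_entry() {
    hosts_file=${1:-/etc/hosts}
    # Strip comments, then look for "localhost" among the names that
    # follow the address field (field 1).
    sed 's/#.*//' "$hosts_file" | awk '
        { for (i = 2; i <= NF; i++) if ($i == "localhost") found = 1 }
        END { exit (found ? 0 : 1) }'
}

if has_localhost_entry /etc/hosts; then
    echo "localhost entry found"
else
    echo "no localhost entry in /etc/hosts"
fi
```

Matching whole fields avoids counting names such as localhost.localdomain or commented-out lines as a valid entry.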
Also, what happens if you run this command?
/usr/bin/ssh -q -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o GSSAPIAuthentication=yes -o GSSAPIDelegateCredentials=yes -o RSAAuthentication=no busas.binghamton.edu export TKMPI_INFO=""; /opt/sasva/TKGrid/tkmpirsh.sh -np 1 /opt/sasva/TKGrid/tkmpinodelib.sh busas.binghamton.edu 65206 tkegenum
Hi.
We do have a localhost entry in /etc/hosts on each machine.
When I run this I get errno 111:
/usr/bin/ssh -q -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o GSSAPIAuthentication=yes -o GSSAPIDelegateCredentials=yes -o RSAAuthentication=no busas.binghamton.edu export TKMPI_INFO=""; /opt/sasva/TKGrid/tkmpirsh.sh -np 1 /opt/sasva/TKGrid/tkmpinodelib.sh busas.binghamton.edu 65206 tkegenum
Failed to connect to 'busas.binghamton.edu', errno: 111
I emailed you privately about attaching the logs - let me know.
Thanks,
Janette
Error 111 means connection refused. Are you sure that the sshd daemon is up and running on busas.binghamton.edu? Send these log files to the track and state in the message that they are for Alex. Also, send /var/log/secure and /var/log/messages from busas.binghamton.edu.
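One quick way to see whether anything is accepting connections on a given host and port is a plain TCP probe. A minimal sketch, assuming bash (for its /dev/tcp feature) and the coreutils timeout command are installed; port_open, CHECK_HOST and CHECK_PORT are hypothetical names, and 22 is only the conventional sshd port, which may differ on your systems.

```shell
# Hypothetical helper: succeeds if a TCP connection to host:port is accepted
# within 3 seconds. A refused connection (errno 111) makes it fail.
port_open() {
    host=$1
    port=$2
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Host and port to probe; override through the environment as needed.
check_host=${CHECK_HOST:-localhost}
check_port=${CHECK_PORT:-22}

if port_open "$check_host" "$check_port"; then
    echo "$check_host:$check_port is accepting connections"
else
    echo "$check_host:$check_port refused or timed out"
fi
```

For example, CHECK_HOST=busas.binghamton.edu CHECK_PORT=22 probes sshd on the head node. The probe cannot tell a refused connection from a timeout, so treat a failure as a prompt to look at the logs rather than as a diagnosis.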
Hello,
We have been following this thread and are experiencing the same results. What was the final resolution?
We are still working on it. To work around the problem, you can remove a DNS entry from /etc/resolv.conf.
Thanks. I believe there are a number of entries in this file. Am I looking for something specific?
Also, could you send me the SAS ticket number for this, so I can have my customer's SAS AE keep tabs on it?
A host entry. I will have another debug session next week, so I will keep everyone posted.