Architecting, installing and maintaining your SAS environment

Cannot push job to specific host in grid environment

Super Contributor
Posts: 410

We have SAS 9.4 in a grid environment (Linux).

When we run a job in batch mode (sas test.sas) and try to dispatch it to a specific host (amdusa.company.com) using the statement below, it does not run on amdusa.company.com but on a different grid node. No other users' jobs are being dispatched to this host either.

options metaserver=amdusa.company.com metaport=12345 metauser=userid metapass=xxx;

The host amdusa.company.com is properly defined in lsb.hosts and is listed in the LSF_MASTER_LIST and LSF_SERVER_HOSTS parameters in lsf.conf. Also, the "bhosts" command shows "ok" status for amdusa.company.com.

The job runs fine locally with "./sas -nodms" on amdusa.company.com.

We have checked around, but nothing appears to be missing.


Accepted Solutions
Solution
‎04-01-2018 10:59 PM
Trusted Advisor
Posts: 1,737

Re: Cannot push job to specific host in grid environment

Hello @woo,

This is a great question, and quite an interesting one. Do you have a High-Availability (HA) setup in your grid environment?

I can well imagine that your load balancer (physical or EGO) believes this host is down and is moving its load to another node in the grid. You could check this in EGO, in RTM, or on your physical load balancer (with your IT team).

Another option is to check the resource configuration for this host: queue configuration, queue length, which jobs can run, queue status (it may be full), and so on.

In any case, your host does not seem to have a problem per se, since you can run SAS code on it locally.

However, when you submit the code as a grid job, it is directed to another host. That is why my starting point would be that the problem lies either in your HA configuration (EGO, load balancer, RTM) or in how the node is registered in the grid (bhosts, lshosts, RTM).
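
A minimal sketch of the registration and queue checks from the LSF command line, assuming the standard LSF commands are on the PATH of a grid user. The function name check_grid_host is only illustrative and is not part of LSF or SAS; the commands it calls (bhosts, lshosts, lsload, bqueues) are standard, read-only LSF queries.

```shell
# Illustrative helper (hypothetical name): run read-only LSF queries against one host.
check_grid_host() {
    host="$1"
    # Batch-level view: host status, job slot limits, current job counts
    bhosts -l "$host"
    # Static view from LIM: host type, model, and configured resources
    lshosts "$host"
    # Current load indices reported for the host
    lsload "$host"
    # Queues that are allowed to dispatch jobs to this host
    bqueues -m "$host"
}
```

For example, check_grid_host amdusa.company.com. If bqueues -m lists no queue that your batch jobs actually use, the host can look healthy in bhosts and still never receive those jobs. For a recent job, bjobs -l followed by the job ID shows which host it ran on.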


All Replies
Super Contributor
Posts: 410

Re: Cannot push job to specific host in grid environment

Posted in reply to JuanS_OCS

Thanks a lot, Juan; we are looking into it. I am not sure what you mean by HA in the environment, but we have a grid environment with about 18 to 20 servers and one metadata server. We do not have automatic failover or a standby server if the metadata server fails; we troubleshoot manually and bring it back up. Thanks.

Trusted Advisor
Posts: 1,737

Re: Cannot push job to specific host in grid environment

Hi again @woo,

Hmm, is amdusa.company.com a metadata server as well as a grid slave node or master node?

Super Contributor
Posts: 410

Re: Cannot push job to specific host in grid environment

Posted in reply to JuanS_OCS

We came across a script in our LSF directory structure that defines an environment variable which can take a couple of different values (queue names). An "if" statement in it decides where a job should go based on the user's bash profile. We created a new queue just for that specific host to see whether a job would run there, and it worked fine. We then put the original script back in place, put the host back in the master server list, and it started working fine.
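
For anyone wanting to repeat the single-host queue test, a queue restricted to one host might look roughly like the fragment below in lsb.queues. This is a sketch under assumptions, not our exact definition: the queue name amdusa_test, the priority, and the description are made up for illustration.

```text
# lsb.queues: a queue that can only dispatch to one host
# (queue name, priority, and description are illustrative)
Begin Queue
QUEUE_NAME   = amdusa_test
PRIORITY     = 30
HOSTS        = amdusa.company.com
DESCRIPTION  = Test queue restricted to a single host
End Queue
```

After badmin reconfig, a job submitted with bsub -q amdusa_test can only land on that host, which makes it easy to tell a host problem from a routing problem.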

 

So it is possible that something went wrong when we took that host out of the master server list for maintenance and put it back. Everything seems normal now. Thanks for your help; I appreciate your time.

Valued Guide
Posts: 531

Re: Cannot push job to specific host in grid environment

Although you already have a solution, I'd like to add my two cents and explain how we handle similar tasks.

We have a job flow that needs inordinate amounts of SASWORK. We have multiple grid compute nodes. Most have plenty of SASWORK at insane speed, but one has twice as much at ludicrous speed. We direct the jobs to that server by defining a resource called LargeWork in LSF and configuring that host as providing the resource. When scheduling a job, you can then add LargeWork as a required resource in the schedule definition, and Process Manager will always direct that job to that specific host.
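
Outside Process Manager, the same idea can be tried directly with bsub. A minimal sketch, assuming sas is on the PATH of the execution hosts and that a boolean resource named LargeWork has already been configured; the wrapper name submit_with_resource is hypothetical.

```shell
# Illustrative wrapper (hypothetical name): submit a SAS batch program to LSF
# with a resource requirement, so only hosts providing that resource qualify.
submit_with_resource() {
    resource="$1"   # e.g. LargeWork, the boolean resource described above
    program="$2"    # e.g. test.sas
    # %J in the output file name is replaced by LSF with the job ID
    bsub -R "select[$resource]" -o "${program%.sas}.%J.out" sas "$program"
}
```

For example, submit_with_resource LargeWork test.sas. To name a host explicitly instead of going through a resource, bsub also accepts -m with a host name, as in bsub -m amdusa.company.com sas test.sas.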

 

Resources are defined in the file lsf.shared. In lsf.cluster.cluster_name you add the resource name to the desired host(s). Remember to have the daemons pick up the change with the badmin reconfig and lsadmin reconfig commands.
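
As a rough sketch of what those two entries can look like: in LSF, shared resources are declared in the Resource section of lsf.shared and attached to a host in the RESOURCES column of the Host section in lsf.cluster.cluster_name. The host name bigwork-host, the description, and the r1m value are placeholders, and in a real cluster you would add these lines to the existing sections rather than create new ones.

```text
# lsf.shared: declare the resource (Resource section)
Begin Resource
RESOURCENAME   TYPE      INTERVAL   INCREASING   DESCRIPTION
LargeWork      Boolean   ()         ()           (Host with extra-large SASWORK)
End Resource

# lsf.cluster.cluster_name: attach it to the host (Host section)
# bigwork-host is a placeholder host name
Begin Host
HOSTNAME       model   type   server   r1m   mem   swp   RESOURCES
bigwork-host   !       !      1        3.5   ()    ()    (LargeWork)
End Host
```

Once the change is active, lshosts should list LargeWork in the RESOURCES column for that host.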

 

And to second Juan's observation: do you really have a metadata server that doubles as a grid compute node?

As for your metadata server being a single point of failure: have a look at clustering it over multiple hosts. It has been a lifesaver for us many times.

 

Regards,

- Jan.

Super Contributor
Posts: 410

Re: Cannot push job to specific host in grid environment

Posted in reply to jklaverstijn

Thanks, Jan; I appreciate your input. It makes perfect sense.

☑ This topic is solved.
