Solved: SAS Grid on Cloud

ankitd · Posted 05-06-2019 10:30 AM

Hello Experts,

I am reaching out for your help and insights on cloud-based SAS Grid.

For a SAS 9.4 Grid environment, I would like your input on the following:

- Scheduling and automating scaling of SAS Grid nodes.

- Implications of doing so on a daily basis.

- Key points to ensure on both the infrastructure and the SAS Grid application side.

- Any known issues or precautions you would have taken in your environment.

Regards,

Ankit.

doug_sas · Posted 05-06-2019 12:07 PM

There are a couple of ways of doing this in SAS Workload Orchestrator, both using time-based configuration. You can change the queue configuration based on time to change the list of hosts that can be used during a certain period. Or you can change the host type configuration based on time to set the maximum number of jobs on a host type to 0 during a certain period. For the second approach, create one host type for the hosts you want to keep open and one host type for the hosts you want to close. Note that SAS Workload Orchestrator has no facility to kill jobs that are still running outside of a time window.
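Because running jobs are not killed when a window closes, any automation that stops the underlying instances probably needs to wait for the host to go idle first. A minimal Python sketch of that sequence follows. The three callables (`close_host`, `running_jobs`, `stop_instance`) are hypothetical placeholders for whatever your scheduler and infrastructure tooling provide; they are not SAS Workload Orchestrator or cloud provider APIs.

```python
import time
from typing import Callable


def drain_then_stop(
    host: str,
    close_host: Callable[[str], None],     # hypothetical: stop new jobs landing on host
    running_jobs: Callable[[str], int],    # hypothetical: count jobs still running on host
    stop_instance: Callable[[str], None],  # hypothetical: stop the VM at the infra level
    poll_seconds: float = 60.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Close a host to new work, wait for it to go idle, then stop it.

    Returns True if the host was stopped, False if jobs were still
    running after max_polls checks (the host is left closed but up).
    """
    close_host(host)
    for _ in range(max_polls):
        if running_jobs(host) == 0:
            stop_instance(host)
            return True
        sleep(poll_seconds)
    return False
```

Returning False instead of forcing the stop leaves the decision about long-running jobs to the operator, which seems the safer default for an overnight batch that overruns.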

If you are using LSF, you could change the queue configuration to eliminate hosts for the period of time you want. LSF also has DISPATCH_WINDOWs and RUN_WINDOWs on queues and DISPATCH_WINDOWs on hosts that regulate when jobs can be scheduled (DISPATCH_WINDOW) or run (RUN_WINDOW).
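To illustrate the idea of removing hosts from the usable list during a time window, here is a small Python sketch using the 23:00 to 05:00 example and the node names xx1 and xx2 from this thread (xx3 is made up for the example). It only models the time-of-day logic, including a window that wraps past midnight. It is not LSF's or SAS Workload Orchestrator's implementation, and treating the window as half-open (start included, end excluded) is an assumption of this sketch.

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """True if now falls in [start, end), allowing a wrap past midnight."""
    if start <= end:
        return start <= now < end
    # Window such as 23:00-05:00 spans midnight.
    return now >= start or now < end


def open_hosts(all_hosts, closed_hosts, now, start=time(23, 0), end=time(5, 0)):
    """Return the hosts that may receive jobs at the given time of day."""
    closed = set(closed_hosts) if in_window(now, start, end) else set()
    return [h for h in all_hosts if h not in closed]


hosts = ["xx1", "xx2", "xx3"]
print(open_hosts(hosts, ["xx1", "xx2"], time(2, 0)))  # inside the window
print(open_hosts(hosts, ["xx1", "xx2"], time(9, 0)))  # outside the window
```

In a real deployment the scheduler's own time-based configuration would do this work; the sketch is only meant to make the window semantics concrete.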


JuanS_OCS · Posted 05-06-2019 10:41 AM

Hello @ankitd ,

There are many points to consider, but these seem the most important to me:

- In the cloud you want co-location of your tiers, including the data and the clients (Citrix), to keep latency and network traffic under control. That means all of them together in the same data center, or very close to each other.
- You will have at least one performance bottleneck for sure. It is your choice where you want it to be: network, memory, CPU, storage, etc. The clouds offer standard boxes that cannot be customized much, although there are many options available.
- You want the storage and network to be fixed to your network design, not ephemeral.
- Grid is great for scaling and for scheduling, and both are very easy. Up to SAS 9.4 M5, you had SAS Grid Manager for Hadoop (with YARN) and SAS Grid Manager for Platform (with LSF/JS). Starting with SAS 9.4 M6, you have an additional flavour, SAS Grid Manager, which comes with a resource manager and scheduler provided by SAS itself (SAS Workload Orchestrator). I have experience with all of them: LSF is very mature, although it has no potential for further development. The SAS-provided version is really new, from October 2018, but full of potential for further development, and it is the way forward since it will have better integration with SAS Platform/Viya.

Does this help? Or would you like to go deeper on certain questions?

ankitd · Posted 05-06-2019 10:55 AM

@JuanS_OCS @doug_sas thanks for your insights. I understand that the setup and co-location, along with network, storage, etc., are key drivers.

However, what I am trying to understand is this: once we have set up the environment and migrated the artefacts to the cloud, with the necessary validation and testing completed successfully, how can I automate the scaling of Grid nodes if I wish to do that on a daily basis?

Example: the Grid environment has 8 nodes. I would like to have all 8 available for the overnight batch processing, but scale down to 4 nodes after the overnight batch run (i.e. during business hours).

Since this would need to be done on a daily basis, it would need automation in place on the infrastructure, storage and Grid application levels.

Given the above scenario, I am looking for the insights requested earlier:

- Scheduling and automating scaling of SAS Grid nodes.

- Implications of doing so on a daily basis.

- Key points to ensure on both the infrastructure and the SAS Grid application side.

- Any known issues or precautions you would have taken in your environment.

Regards,

Ankit.

MargaretC · Posted 05-06-2019 11:03 AM

What do you mean by "automating scaling of SAS Grid nodes"? If you mean spinning up a new AWS EC2 instance and using it as a SAS Grid node, then it is not there by default. The step you will have to do manually is adding the name of the new SAS Grid node to the file that defines all the SAS Grid compute nodes.
CoreCompete (a SAS partner) demoed doing this at the recent SAS Global Forum 2019 conference.
Cheers,
Margaret

ankitd · Posted 05-06-2019 11:19 AM

@MargaretC, by automation I mean what happens after the environment has been set up.

That is, automating the closure of Grid nodes during a specific time window so that they can then be moved to a STOPPED state via the Infra tool.

I am aware that a scheduled stop-start of nodes at the Infra level needs to be taken care of separately.

However, any insights on what precautions need to be taken at both the Infra and storage levels would be helpful.

Coming back to the main ask: what I am looking for is how to close Grid nodes at the application level before stopping the Infra, and how to start the application back up once the nodes are available again.

Any real-world implementation examples and insights would help.

Regards,

Ankit

MargaretC · Posted 05-06-2019 11:40 AM

So what you want to do is say that from 23:00 until 05:00, SAS Grid nodes xx1 and xx2 are not available to the SAS Grid. So these nodes need to be taken out of a list that SAS Grid is aware of.
And then the nodes need to be added back to the list at 05:00.
If that is correct, then I can chase SAS Grid R&D to see if this can be done with the queue manager in SAS Grid.

ankitd · Posted 05-06-2019 11:44 AM

@MargaretC yes, I am looking for those insights. It would help if you could also cover the points I mentioned in my initial post.

Thanks in advance.

doug_sas · Posted 05-06-2019 10:51 AM

In addition to what @JuanS_OCS said, LSF includes 'resource connectors' that allow LSF to start up instances on various providers. SAS currently supports AWS instances, but the latest LSF has support for others.

You also need a data storage strategy. For example, will it be

SAS datasets?
RDBMS?
Hadoop?
S3-like containers?

Each has good & bad attributes (cost, speed, access). Also, if all instances went away, where would the data be stored for when the instances come back up?

Just some things to think about.
