Hi
After trying to turn off the infrastructure through a runbook on Microsoft Azure (it ran without errors or warnings), I discovered that the infrastructure is still on. The runbook we use to stop and start it now fails with both commands, and I get an HTTP 503 error when I try to open SAS Viya, so I am currently trying to fix this via the Azure portal. On further investigation, I found the following message in the Output section of the runbook:
Unable to perform 'stopping' operation since the cluster is in 'failed' provision state. Please use 'az aks update' to reconcile the cluster to succeeded state. Please check https://learn.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az-aks-update for more information.
The linked page has a long list of what I presume are the options that can be used with az aks update.
Is there anywhere I can find a step-by-step guide on how to run az aks update?
Best regards,
Jonas
Hi,
Is this on your own deployment of Viya where you control the infrastructure, or on a SAS hosted environment, or on Viya on Azure Marketplace?
Assuming the infrastructure is under your control, you need to install the Azure CLI per the instructions at How to install the Azure CLI | Microsoft Learn. Then:
1. az login - see Sign in with Azure CLI — Login and Authentication | Microsoft Learn. I typically use az login --use-device-code and then follow the provided instructions to authenticate via a web browser.
2. For each failed node pool, issue: az aks nodepool update --resource-group <your_resource_group> --cluster-name <aks_cluster_name> --name <nodepool_name>
3. Then issue: az aks update --resource-group <your_resource_group> --name <aks_cluster_name>
4. Go back to the Azure portal and refresh. The state should update and then your normal portal actions (including the runbook) should start working again.
Each of these commands can take quite a few minutes to complete, but these steps have always cleared the failed state for me.
I'm also assuming you have the necessary access to the resource group and AKS cluster. If not, then whoever has access needs to do the above.
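Steps 2 and 3 above can be scripted. The following is a minimal bash sketch, assuming az login has already been run and you have rights on the resource group. The helper names (run, list_failed_pools, reconcile_nodepools, reconcile_cluster) and the DRY_RUN switch are hypothetical, not part of the Azure CLI. By default it only prints the commands so you can review them first.

```bash
#!/usr/bin/env bash
# Minimal sketch: reconcile a failed AKS cluster with the Azure CLI.
# Assumes `az login` has already been run and the caller has rights on the
# resource group. Helper names and DRY_RUN are hypothetical, not part of az.

# Print the command instead of executing it while DRY_RUN is 1 (the default).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List the names of node pools whose provisioning state is Failed.
list_failed_pools() {
  az aks nodepool list --resource-group "$1" --cluster-name "$2" \
    --query "[?provisioningState=='Failed'].name" -o tsv
}

# Step 2: reconcile each failed node pool, one at a time.
reconcile_nodepools() {
  local rg="$1" cluster="$2" pool
  shift 2
  for pool in "$@"; do
    run az aks nodepool update --resource-group "$rg" --cluster-name "$cluster" --name "$pool"
  done
}

# Step 3: reconcile the cluster itself.
reconcile_cluster() {
  run az aks update --resource-group "$1" --name "$2"
}
```

Usage: source the file, run reconcile_nodepools <your_resource_group> <aks_cluster_name> $(list_failed_pools <your_resource_group> <aks_cluster_name>) to review the commands, then export DRY_RUN=0 and repeat it, followed by reconcile_cluster <your_resource_group> <aks_cluster_name>. The pools are updated one after another, in the same order as the steps above.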
Hope that helps!
Marc W. Price
Hi Marc,
Thank you for taking the time to answer our question. So far we have not been able to fix this issue, and we keep paying for the infrastructure because it is still 'on' and we cannot turn it off. I have spoken with my IT manager and a local SAS consultant (we have also been in contact with Microsoft Support). The SAS consultant advised us to keep this thread going and to clarify a few things about the problem:
We are using SAS Viya on the Azure Marketplace platform.
We start and stop the infrastructure using an out-of-the-box StartStopViya runbook.
This worked until this January.
When we ran the StartStopViya runbook on January 23rd with the 'stop' command, it finished without warnings or errors, but the runbook's Output showed this error: [screenshot: runbook error, January 23]
Since then we have not been able to start or stop the infrastructure using the runbook.
On further investigation of our Microsoft Azure subscription, we found that the node pools associated with our resource group are all in a 'failed' provisioning state (which is also pointed out in the Output screenshot above). In the Azure portal, the node pools look like this:
[screenshot: failed node pools]
We have tried to stop the node pools manually by selecting them and pressing the Stop icon, but then we get the following error message:
[screenshot: node pool error]
We have also tried to manually upgrade Kubernetes on the different node pools, but get the following error:
My IT manager and I are both Owners of our Azure subscription and should have the permissions needed to change things, but somehow we do not.
As this is costing us money every day, we would be extremely grateful for any help you can provide. Ideally, we would like to be able to stop the infrastructure (and start it again in the future) via the runbook.
Best regards,
Jonas
Hi,
Thanks for the additional details. In my initial response, I provided the CLI commands that typically clear this state. Did you execute the commands? If so, what was the result?
In your latest update, it sounds like you have tried things through the UI and the runbook, but I see no details about what happened after using the CLI commands. The UI and runbook will not work until you reconcile the errors using the CLI commands. That is exactly what the Azure error message is telling you to do.
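The rule in the Azure error message (no stop while the cluster is in a failed provisioning state) can be written as a small decision step. This is a minimal bash sketch, assuming an authenticated Azure CLI; get_provisioning_state and next_action are hypothetical helper names, power state is not considered, and mapping every other state to "wait" is a simplification.

```bash
#!/usr/bin/env bash
# Minimal sketch: choose the next step from an AKS cluster's provisioning state.
# Assumes an authenticated Azure CLI; helper names are hypothetical.

# Print the cluster's provisioning state, e.g. Succeeded or Failed.
get_provisioning_state() {
  az aks show --resource-group "$1" --name "$2" --query provisioningState -o tsv
}

# Mirror the Azure error: a stop is refused while the cluster is Failed,
# so a Failed cluster must be reconciled (az aks update) before stopping.
next_action() {
  case "$1" in
    Succeeded) echo "stop" ;;    # a stop (or the runbook) should be accepted
    Failed) echo "reconcile" ;;  # run the node pool and cluster updates first
    *) echo "wait" ;;            # simplification: e.g. Updating, check again later
  esac
}
```

For example, next_action "$(get_provisioning_state <your_resource_group> <aks_cluster_name>)" prints reconcile for a cluster in the state described in this thread.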
Please let us know the results after using the CLI commands.
Thanks!
Marc
Hi,
Thanks for the reply. Unfortunately, the suggested solution does not get us any closer to controlling the infrastructure. I get the same error message as when I try it directly in the Azure portal:
[screenshot: error in CLI]
This is still a daily cost for us, and any advice on how to proceed would be greatly appreciated.
Best regards, Jonas
The node pools are in a Failed state due to a failed management operation (Admin.PutManagedCluster). This occurred because the cluster is running Kubernetes v1.24.10, which is no longer supported by Microsoft in the North Europe region.
At this time, we do not support in-place Kubernetes upgrades for existing deployments. Kubernetes version support and SAS Viya compatibility are maintained at the Marketplace offer level, and changes there apply to new deployments only; they do not retroactively update running environments.
The quickest and most reliable resolution is to delete the existing deployment and redeploy the offer from Azure Marketplace, which will provision the cluster using a supported Kubernetes version for the region.
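To confirm this diagnosis on a given cluster, its Kubernetes version can be compared with the versions AKS currently offers in the region, which az aks get-versions --location northeurope -o table lists. The bash sketch below assumes an authenticated Azure CLI; the helper names are hypothetical, and the supported minor versions are passed in by hand from that command's output rather than hard-coded. It only checks the diagnosis and does not change the recommended redeployment.

```bash
#!/usr/bin/env bash
# Minimal sketch: is the cluster's Kubernetes minor version still offered in the region?
# Assumes an authenticated Azure CLI; helper names are hypothetical.
# Supported minors are supplied by the caller, e.g. read from:
#   az aks get-versions --location northeurope -o table

# Print the cluster's Kubernetes version, e.g. 1.24.10.
get_cluster_version() {
  az aks show --resource-group "$1" --name "$2" --query kubernetesVersion -o tsv
}

# Reduce major.minor.patch to major.minor.
minor_of() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Return 0 if the cluster version's minor matches any of the supported minors.
is_supported() {
  local cluster_minor supported
  cluster_minor="$(minor_of "$1")"
  shift
  for supported in "$@"; do
    if [ "$cluster_minor" = "$(minor_of "$supported")" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, is_supported "$(get_cluster_version <your_resource_group> <aks_cluster_name>)" 1.29 1.30 || echo "unsupported", where 1.29 and 1.30 are placeholder values, not the actual list for North Europe.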