Hi community
As the retirement date for SAS9 approaches, the time has come to start planning a migration to Viya. We don't know much about Viya, but we see three major obstacles that I want to discuss with the community. I hope this discussion will ease our minds, though I fear the worst.
SAS Information Delivery Portal
SAS9M8 is available now with support expected until 2028, so we thought that we had 5 years left to find out what to do. But it is a nasty surprise to see that the Information Delivery Portal is retired with this release. M7 is supported until 2025, so we have a couple of years to find a replacement for our current reports.
They are all built with JavaScript to provide select boxes that control the content of the displayed tables and graphs, and we use a framework supplied by SAS to generate the scripts. It works through special DI Studio transformations that generate the JavaScript and write it to the web server, but I am not sure whether this framework is "official" SAS or something developed in our local SAS company. So I wonder whether, and how, similar functionality could be obtained in SAS Visual Analytics.
Converting SAS programs to run under SAS Viya
There is a lot of information available on this topic, but it seems that converting even a simple job will be time-consuming, and we have many jobs doing things I have never seen addressed: interacting with Linux to use character conversion routines, retrieving folder content and other information from Linux, interacting with LSF Process Manager, getting content from FTP servers, running bash scripts, using macro loops, FCMP functions, etc. So how are these non-standard tasks implemented in Viya, and what do we do if they are not?
We have about 6,100 jobs with at least 30,000 transformations built in SAS Data Integration Studio, plus about 200 SAS programs executed by cron/bash scripts, and we have no idea how much total time should be allocated to the whole conversion.
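One possible first step toward sizing the conversion is to count how often the non-standard constructs listed above actually occur in the deployed code. The following is a minimal sketch in Python, not a SAS parser: the regular expressions are rough heuristics, and the names `PATTERNS`, `count_constructs` and `scan_tree` are made up for this illustration. It assumes the jobs are available as `.sas` files on disk.

```python
import re
from collections import Counter
from pathlib import Path

FLAGS = re.IGNORECASE | re.MULTILINE

# Heuristic patterns only: they do not parse SAS and can miss or over-count cases.
PATTERNS = {
    # X statement at the start of a line, %SYSEXEC, CALL SYSTEM, SYSTASK COMMAND
    "os_command": re.compile(
        r"^\s*x\s+(?![\s=])|%sysexec\b|\bcall\s+system\b|\bsystask\s+command\b",
        FLAGS,
    ),
    # FILENAME ... PIPE (reading output from Linux commands)
    "pipe": re.compile(r"\bfilename\s+\w+\s+pipe\b", FLAGS),
    # FILENAME ... FTP / SFTP
    "ftp": re.compile(r"\bfilename\s+\w+\s+s?ftp\b", FLAGS),
    # PROC FCMP function definitions
    "fcmp": re.compile(r"\bproc\s+fcmp\b", FLAGS),
    # Iterative %DO, %DO %WHILE, %DO %UNTIL (a plain "%do;" block is not counted)
    "macro_loop": re.compile(r"%do\s+(?:%while|%until|\w+\s*=)", FLAGS),
}


def count_constructs(sas_source: str) -> Counter:
    """Count heuristic matches per construct in one SAS program text."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        hits = sum(1 for _ in pattern.finditer(sas_source))
        if hits:
            counts[name] = hits
    return counts


def scan_tree(root: str) -> Counter:
    """Sum the counts over every .sas file below a directory."""
    total = Counter()
    for path in Path(root).rglob("*.sas"):
        total += count_constructs(path.read_text(errors="replace"))
    return total


sample = """
filename dirlist pipe 'ls -1 /data/in';
x 'iconv -f latin1 -t utf8 a.txt > b.txt';
proc fcmp outlib=work.funcs.demo; run;
%macro loop; %do i = 1 %to 3; %put &i; %end; %mend;
"""
print(count_constructs(sample))
```

Running `scan_tree` over the deployment directory would give a count per construct type, which at least shows which of the special cases are common enough to need a general solution and which are one-offs.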
Deployment and Scheduling
We run a SAS Data Warehouse with more than 700 LSF flows containing more than 6,000 scheduled jobs in the daily batch. At least 600 of the flows depend on previous flows, and a dependency chain can be up to 20 flows long. The scheduling works so that any flow is held until all previous flows on the daily run list have finished, and it is then either triggered to run by LSF or omitted if a previous flow has failed. These are the basics; other functionality is included as well.
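To make the hold/trigger/omit rule concrete for anyone suggesting a replacement, here is a minimal sketch of the decision for a single flow, written in Python rather than SAS. The names `State`, `decide`, `predecessors` and `run_list` are invented for this illustration. It also assumes that an omitted predecessor causes its successors to be omitted too, which the description above does not state explicitly.

```python
from enum import Enum


class State(Enum):
    WAITING = "waiting"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"
    OMITTED = "omitted"


# States in which a flow will not change any more today.
FINISHED = {State.DONE, State.FAILED, State.OMITTED}


def decide(flow, predecessors, states, run_list):
    """Return 'hold', 'run' or 'omit' for a waiting flow.

    predecessors: dict mapping a flow to the set of flows it depends on
    states:       dict mapping a flow to its current State
    run_list:     set of flows on today's run list; predecessors that are
                  not on the list are ignored
    """
    relevant = [p for p in predecessors.get(flow, ()) if p in run_list]
    # Hold until every relevant predecessor has finished, one way or another.
    if not all(states[p] in FINISHED for p in relevant):
        return "hold"
    # Trigger only if all of them succeeded.
    if all(states[p] is State.DONE for p in relevant):
        return "run"
    # Assumption: a failed or omitted predecessor means this flow is omitted.
    return "omit"


predecessors = {"load_sales": {"extract_sales"}, "report": {"load_sales", "load_hr"}}
run_list = {"extract_sales", "load_sales", "report"}  # load_hr is not run today
states = {
    "extract_sales": State.DONE,
    "load_sales": State.RUNNING,
    "report": State.WAITING,
}
print(decide("report", predecessors, states, run_list))
```

A scheduler loop would call `decide` for every waiting flow each time a flow changes state. Any Viya-side replacement would need to express the same three outcomes.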
We deploy the jobs from SAS Data Integration Studio, where jobs are organized so that one folder of jobs corresponds to one flow. Flows are created and jobs added in SAS Management Console, and they are scheduled to LSF as "Manually" without any triggers or dependencies defined. None are needed, because our home-built scheduler (SAS code) does the rest. It determines the flow dependencies by extracting information from SAS Metadata, monitors job and flow results, and triggers or excludes flows depending on previous results. Everything is written to a control database, and a web application is used to follow the current batch status. We have used this scheduler for 12 years, and it works with minimal intervention: it takes a few hours per month to keep the whole batch up to date and running, including rerunning flows after errors.
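The dependency handling itself does not depend on where the dependency information comes from. As a rough sketch (Python standard library, not our SAS code), once the pairs of flow and predecessor flows have been extracted from some source, a valid execution order and the length of the dependency chain behind each flow can be derived as shown here. The function name `order_and_depth` and the flow names are hypothetical.

```python
from graphlib import TopologicalSorter


def order_and_depth(predecessors):
    """Derive an execution order and the chain length behind each flow.

    predecessors: dict mapping a flow to the set of flows it depends on.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    # static_order() yields every flow after all of its predecessors.
    order = list(TopologicalSorter(predecessors).static_order())
    depth = {}
    for flow in order:
        # A flow without predecessors is a chain of length 1.
        depth[flow] = 1 + max(
            (depth[p] for p in predecessors.get(flow, ())), default=0
        )
    return order, depth


predecessors = {
    "load_sales": {"extract_sales"},
    "mart_sales": {"load_sales"},
    "load_hr": {"extract_sales"},
}
order, depth = order_and_depth(predecessors)
print(order)
print(max(depth.values()))  # length of the longest dependency chain
```

The open question for Viya is therefore mostly where the `predecessors` information would come from when there is no SAS Metadata to query.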
Now we are trying to figure out how we can get similar functionality in SAS Viya. It seems that Viya is not built with a batch environment in mind, and from what I have read I gather that it is a long and complicated process to deploy a SAS Studio job as program code and create a script to execute it, and then what? It seems that the concept of a "flow" and functionality comparable to LSF are missing in Viya: there is no scheduler to run the scripts with respect for dynamically derived dependencies, and no metadata to extract dependency information from. We are not alone in the world with a setup that depends on traditional batch processing, so somebody must have a setup that handles this in Viya. The question is how time-consuming it is to define scheduling details, run and rerun jobs, maintain the correct execution order, monitor batch progress, etc., i.e. how many extra hands will be needed in the daily work.
When it comes to development, we spend about 3 minutes deploying a new or changed job from DI Studio and adding it to a flow. We suspect the process will be much more complicated and time-consuming in Viya, and it might require a lot of additional work to set up the scheduling too, depending on the solution we find to handle that. We have at least 50 new/changed/obsolete jobs in at least 10 flows per week, often several times more, and we have no idea about the extra workload we can expect.
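For the estimate to management, the deployment workload can at least be put into a formula. The sketch below uses only the figures above (50 jobs per week, 3 minutes per job). The per-job time for Viya is a purely hypothetical placeholder that would have to be replaced by a measurement from a pilot. The function and variable names are made up for this illustration.

```python
def weekly_deploy_hours(jobs_per_week: int, minutes_per_job: float) -> float:
    """Hours per week spent deploying jobs and adding them to flows."""
    return jobs_per_week * minutes_per_job / 60


# Figures from the current setup: at least 50 jobs per week, about 3 minutes each.
current_hours = weekly_deploy_hours(50, 3)

# Placeholder only: NOT a measured Viya figure, replace it after a pilot.
HYPOTHETICAL_VIYA_MINUTES_PER_JOB = 15
viya_hours = weekly_deploy_hours(50, HYPOTHETICAL_VIYA_MINUTES_PER_JOB)

print(current_hours)               # 2.5
print(viya_hours - current_hours)  # extra hours per week under the placeholder
```

Since the weekly volume is often several times higher than 50 jobs, any increase in the per-job time is multiplied accordingly.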
Today, we are a team of 5 people who run a Linux cluster plus metadata servers and web servers, get data from external sources, manage authorizations, manage client programs, promote jobs to production, schedule jobs, monitor the daily batch, handle odd tasks and help users. We support 25 ETL developers and more than 100 EG users. We cannot expect to cope with this without extra help. A team of extra people is probably needed for the migration process, but we fear that we will also need several extra people to handle the daily routine afterwards with a decent level of quality and quick response.
We need an estimate so we can prepare our management, but more than anything we need a basic understanding of how we are going to do things in the future, down to tools and techniques. Our SAS consultants have a tendency to become uneasy and vague when these questions come up, so I look forward to any comments. Thank you.