SAS Visual Forecasting (VF) can distribute open-source code (Python and R) to run in parallel in the cloud, on the same nodes where SAS Viya is installed. This lets you scale forecasting processes developed in open source (OS) to millions of series and move away from ungoverned, inconsistent, and error-prone processes. To understand in more detail how the parallelization is achieved, you can have a look at the paper here. This article is a technical end-to-end guide to setting up SAS Visual Forecasting to integrate with OS, developing OS code and incorporating it inside SAS VF code, and finally achieving this integration from the UI environment, where you can compare OS and SAS models at a series-by-series level while also taking advantage of the interactive exploration capabilities of SAS VF’s UI. The steps are as follows:
- Deployment and Administration
  - Configure open-source languages
  - Install packages
  - Configure EXTLANG
- Model Development
  - Write the open-source code file
  - Integrate the open-source code in PROC TSMODEL
- Integration in Model Studio
  - Modify existing VF forecast code node in UI
Admins should perform the steps below:
Step 1 - Open-Source languages configuration
In SAS Viya 4, Python and R must be made accessible in the environment via persistent volumes by the Kubernetes administrator. Further details on the integration of Python/R environment can be found in the deployment files:
Step 2 - Install packages
Install the packages required for your project on the Python or R volumes created in the previous step. For the first installation, consider popular pre-configured environments that cover most forecasting needs, so that data scientists can start developing directly in the environment provided to them. The idea, however, is for data scientists to first experiment with new OS forecasting packages locally on a small scale. When they find an OS algorithm that looks promising, they pass the required package to the admins for further validation prior to pushing it to the server. In that way, a good balance of governance and control is achieved between data scientists and IT.
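Before handing the environment over to data scientists, admins may want to confirm that the agreed packages can actually be found by the interpreter. The following is a minimal sketch of such a check, meant to be run with the Python interpreter that lives on the volume. The package list is a hypothetical example, not a recommendation from this article.

```python
import importlib.util


def missing_packages(required):
    """Return the names in `required` that the current interpreter cannot find."""
    return [name for name in required if importlib.util.find_spec(name) is None]


# Hypothetical list of packages agreed between data scientists and admins.
required = ["pandas", "numpy", "prophet"]

missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

The check only tells you whether a top-level package can be located; it does not import it, so it will not catch a broken installation.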
Step 3 - Configure EXTLANG for external languages execution in CAS
The external languages (EXTLANG) package provides objects that enable the integration of external-language programs into SAS environments. The EXTLANG package supports Python (versions 2.6.6–2.7.7 and 3.3 and higher) and R (versions 3.2.5 and higher). Finally, the objects in this package enable you to specify which variables should be shared between the two environments. To configure EXTLANG:
<EXTLANG version="1.0" mode="ANARCHY" allowAllUsers="ALLOW">
  <DEFAULT scratchDisk="/tmp" diskAllowlist="/">
    <LANGUAGE name="R" interpreter="/R/R-4.0.2/lib64/R/bin/Rscript">
      <ENVIRONMENT name="LD_LIBRARY_PATH" value="/R/R-4.0.2/lib64">
      </ENVIRONMENT>
    </LANGUAGE>
    <LANGUAGE name="PYTHON3" interpreter="/python/miniconda3/envs/forecast_scb/bin/python"> </LANGUAGE>
  </DEFAULT>
</EXTLANG>
More details about these options are available in External Languages Access Control Configuration.
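A typo in an interpreter path is an easy mistake to make in this file. As a quick sanity check, the sketch below parses the configuration shown above with Python's standard library and reports whether each interpreter path exists. It only looks at the name and interpreter attributes, it is not a SAS-provided tool, and the function name is made up for this illustration.

```python
import os
import xml.etree.ElementTree as ET


def list_interpreters(config_xml):
    """Map each LANGUAGE name in an EXTLANG configuration to its interpreter path."""
    root = ET.fromstring(config_xml)
    return {lang.get("name"): lang.get("interpreter") for lang in root.iter("LANGUAGE")}


# Copy of the configuration from this article.
config_xml = """
<EXTLANG version="1.0" mode="ANARCHY" allowAllUsers="ALLOW">
  <DEFAULT scratchDisk="/tmp" diskAllowlist="/">
    <LANGUAGE name="R" interpreter="/R/R-4.0.2/lib64/R/bin/Rscript">
      <ENVIRONMENT name="LD_LIBRARY_PATH" value="/R/R-4.0.2/lib64">
      </ENVIRONMENT>
    </LANGUAGE>
    <LANGUAGE name="PYTHON3" interpreter="/python/miniconda3/envs/forecast_scb/bin/python"> </LANGUAGE>
  </DEFAULT>
</EXTLANG>
"""

for name, path in list_interpreters(config_xml).items():
    status = "found" if os.path.exists(path) else "NOT found on this machine"
    print(f"{name}: {path} ({status})")
```

The existence check is only meaningful when the script runs somewhere the Python and R volumes are mounted.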
configMapGenerator:
- name: sas-open-source-config-r
  literals:
  #- DM_RHOME=/R/R-4.0.2/lib64/R/
  - SAS_EXTLANG_SETTINGS=/R/R-4.0.2/sas/extlang_config-r-and-python.xml
  #- SAS_EXT_LLP_R=/R/R-4.0.2/lib64/R/lib/
Note: You only need to set SAS_EXTLANG_SETTINGS for EXTLANG to work. The other variables affect different SAS products.
Data scientists should perform the steps below:
1. Write the open-source code file
We start the model development by writing the open-source code we want to use for the forecast. Here we use the Prophet algorithm in Python.
Important note: this code will be executed one time for each time series in our dataset. SAS Visual Forecasting will handle the distribution of the processing.
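To make the "one execution per time series" idea concrete, here is a small sequential sketch in plain Python. It is only an analogy: SAS Visual Forecasting does the grouping and distributes the runs in parallel, whereas this loop runs them one after another. The series IDs, the data, and the naive forecast function are all made up for the illustration.

```python
from collections import defaultdict


def naive_forecast(values, horizon):
    """Stand-in for the per-series script: repeat the last observed value."""
    return [values[-1]] * horizon


def forecast_each_series(rows, horizon):
    """Group (series_id, value) rows by series and forecast each group separately."""
    series = defaultdict(list)
    for series_id, value in rows:
        series[series_id].append(value)
    # The forecasting code runs once per series and never sees the other series.
    return {sid: naive_forecast(values, horizon) for sid, values in series.items()}


# Hypothetical input: two series stacked in one table, ordered by time.
rows = [
    ("store_a", 10.0), ("store_a", 12.0),
    ("store_b", 5.0), ("store_b", 7.0), ("store_b", 6.0),
]

forecasts = forecast_each_series(rows, horizon=2)
print(forecasts)
```

The practical consequence is that the open-source script should be written for a single series: it receives one series' values and returns one series' predictions.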
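Data inputs: DS (the time stamps of the series) and Y (the values of the series)
Parameters: NFOR (the number of time points passed to the script; the model is fitted on the first NFOR - HORIZON of them) and HORIZON (the number of periods to forecast)
Output: PRED (the predicted values)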
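import numpy as np
import pandas as pd
from prophet import Prophet

# DS, Y, NFOR and HORIZON are not defined here: they are shared by SAS
# init DataFrame
df = pd.DataFrame({'ds': DS, 'y': Y})
# convert SAS values (treated here as seconds since 1960-01-01) to Python dates
df.ds = pd.to_timedelta(df.ds, unit='s') + pd.Timestamp('1960-1-1')
# Prophet Fit/Predict: fit on the history only, leaving out the last HORIZON points
m = Prophet()
m.fit(df.iloc[:(int(NFOR) - int(HORIZON))])
future = m.make_future_dataframe(periods=int(HORIZON))
forecast = m.predict(future)
# Output: PRED is the variable read back by SAS
PRED = np.array(forecast['yhat'])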
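The date conversion in the script relies on SAS counting time from 1 January 1960: SAS datetime values are seconds since that point, while SAS date values are days. The script uses unit='s', so it treats DS as datetime values. The sketch below shows both conversions with the standard library only, so you can check which one matches your data. The helper names are illustrative and not part of any SAS or pandas API.

```python
from datetime import datetime, timedelta

SAS_EPOCH = datetime(1960, 1, 1)


def sas_datetime_to_python(seconds):
    """Convert a SAS datetime value (seconds since 1960-01-01) to a Python datetime."""
    return SAS_EPOCH + timedelta(seconds=seconds)


def sas_date_to_python(days):
    """Convert a SAS date value (days since 1960-01-01) to a Python datetime."""
    return SAS_EPOCH + timedelta(days=days)


print(sas_datetime_to_python(86400))  # one day after the epoch
print(sas_date_to_python(366))        # 1960 is a leap year, so this is 1961-01-01
```

If the time column in your table holds SAS date values rather than datetime values, the equivalent change in the pandas code would be to use unit='D' in pd.to_timedelta.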
2. Integrate the open-source code in PROC TSMODEL
The next step is to call this code file from within the SAS Visual Forecasting TSMODEL procedure, which will allow you to:
We can use this Python function in SAS code with PROC TSMODEL or via the Time Series Processing Action Set, which is callable from Python and R languages as well, using the included PYTHON2, PYTHON3, and R objects. Here we will use PROC TSMODEL since we need to use SAS code in order to integrate it as a node in Model Studio. For this example, we use the PYTHON3 object, which allows us to interact with the Python interpreter specified in the <LANGUAGE name="PYTHON3"> section of the XML file. The first step is to initialize the object.
declare object py(PYTHON3);
rc = py.Initialize();
There are 3 methods to specify the open-source code you want to run from PROC TSMODEL:
* Method 1: push the code line by line;
rc = py.PushCodeLine("import numpy as np");
rc = py.PushCodeLine("w = np.ones(7)/7");
rc = py.PushCodeLine("nans = np.empty(6) ; nans[:] = np.nan");
rc = py.PushCodeLine("y_p = np.concatenate((nans,Y))");
rc = py.PushCodeLine("MAVG = np.convolve(y_p, w, mode='valid')");
* Method 2: push the code from a file;
rc = py.PushCodeFile('/shared/python_mavg_code.py');
* Method 3: push the code from a table;
rc = py.PushCodeFromTable(INEXTCODE_Object, Name);
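For readers less familiar with NumPy, the PushCodeLine calls above compute a trailing 7-period moving average: the series is padded with six missing values so that the result has the same length as Y, and the first six entries of the result are therefore missing. A plain-Python equivalent, written only as an illustration and assuming Y itself contains no missing values, could look like this:

```python
import math


def trailing_mavg(y, window=7):
    """Trailing moving average; the first window - 1 entries are NaN."""
    result = []
    for i in range(len(y)):
        if i < window - 1:
            # Not enough history yet, mirroring the NaN padding in the NumPy version.
            result.append(math.nan)
        else:
            result.append(sum(y[i - window + 1:i + 1]) / window)
    return result


# Hypothetical series of eight observations.
Y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
MAVG = trailing_mavg(Y)
print(MAVG)
```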
The second method is the most convenient, and we will use it in our example. It requires the diskAllowlist setting in the EXTLANG configuration file to be configured so that the file system can be accessed (see the Deployment and Administration part).
We also need to map the column names in the input dataset to the parameters used in the Python code, as shown in the following code snippet:
*mapping variables, parameters and columns names;
rc = py.AddVariable(Revenue, 'ALIAS', 'Y') ;
rc = py.AddVariable(SAS_DATE, 'ALIAS', 'DS') ;
rc = py.AddVariable(PRED, "READONLY", "FALSE") ;
rc = py.AddVariable(_LENGTH_, 'ALIAS', 'NFOR') ;
rc = py.AddVariable(_LEAD_,'ALIAS','HORIZON') ;
*load the python file;
rc = py.PushCodeFile('/files/python_prophet_code.py') ;
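When developing locally, it can help to imitate this mapping before the script ever reaches the server. The sketch below defines the aliased names (DS, Y, NFOR, HORIZON) in a dictionary and runs a script against them with exec, then reads PRED back. It imitates only the idea that mapped variables appear as top-level names in the script and does not reproduce how EXTLANG actually passes data. The script body, the values, and the placeholder entries for the forecast horizon are all assumptions made for the example.

```python
# Stand-in for the contents of a script file such as the one pushed with PushCodeFile.
# It uses the same variable names as the mapping, but a trivial "model" (the mean of
# the history) so that the sketch has no third-party dependencies.
script = """
history = Y[:int(NFOR) - int(HORIZON)]
PRED = [sum(history) / len(history)] * int(NFOR)
"""

# Hypothetical values for one series. The last two entries of Y are placeholders
# for the forecast horizon (an assumption of this sketch).
namespace = {
    "DS": [0, 86400, 172800, 259200, 345600],
    "Y": [10.0, 12.0, 14.0, float("nan"), float("nan")],
    "NFOR": 5,
    "HORIZON": 2,
}

exec(script, namespace)  # run the script with the shared variables as globals
print(namespace["PRED"])
```

Swapping the stand-in script for the text of your real script file gives a quick way to catch undefined names or length mismatches on a single series.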
We will also declare two additional objects, OUTEXTLOG and OUTEXTVARSTATUS, to store execution logs and variable statuses, respectively.
declare object pylog(OUTEXTLOG) ;
rc = pylog.Collect(py, 'EXECUTION') ;
declare object pyvars(OUTEXTVARSTATUS) ;
rc = pyvars.collect(py) ;
This generates two output tables containing valuable information for debugging. After code execution, we check whether the code ran successfully: in the OUTEXTLOG table, we verify that all exit codes (_EXITCODE_) are equal to 0. If there are execution errors, the logs are available in the _LOGTEXT_ column.
The UPDATED variable in the OUTEXTVARSTATUS table lets us verify that the variables were modified by the external-language program.
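With many series, scanning the log table by eye quickly becomes impractical. The sketch below shows one way the exit-code check could be automated once the log table has been brought into Python as a list of rows. The _EXITCODE_ and _LOGTEXT_ column names come from the description above; the series_id column and the sample rows are made up for the illustration.

```python
def failed_executions(log_rows):
    """Return the log rows whose _EXITCODE_ is not 0."""
    return [row for row in log_rows if row["_EXITCODE_"] != 0]


# Hypothetical rows, one per series, standing in for the OUTEXTLOG output table.
log_rows = [
    {"series_id": "store_a", "_EXITCODE_": 0, "_LOGTEXT_": ""},
    {"series_id": "store_b", "_EXITCODE_": 1, "_LOGTEXT_": "(example error text)"},
    {"series_id": "store_c", "_EXITCODE_": 0, "_LOGTEXT_": ""},
]

failures = failed_executions(log_rows)
print(f"{len(failures)} of {len(log_rows)} executions failed")
for row in failures:
    print(row["series_id"], "->", row["_LOGTEXT_"])
```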
Data scientists can also integrate OS in Model Studio, which is the UI environment for SAS Visual Forecasting. In that way, they take advantage of the automatic exploration capabilities of the UI and can automatically compare and select the best algorithm from SAS and OS for their forecasting needs at a series-by-series level. The required steps are described below:
1. Modify an existing VF forecast node code
The first step is to create a forecasting pipeline and add a “Naïve Model” or an “Auto-Forecasting” node. We can then modify the code of this node via the “Open” Code Editor button. There are two options here. You can either develop a ‘pure’ open-source node where only the open-source code is run and then compare the overall results with SAS forecasting methodologies, or you can incorporate the OS code to compete directly with SAS algorithms at a series-by-series level.
For the second option, we will not discuss the code changes in detail, to keep this article digestible. This blog describes the process to follow to incorporate deep learning models into VF pipelines. The process here is the same: in place of the deep learning part of the code, you incorporate your OS code and then pass it to subsequent steps using the EXMSPEC object, exactly as described in that blog.
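To illustrate what competing "at a series-by-series level" means, the sketch below picks, for each series, the model with the lowest error on a holdout sample. It is a conceptual illustration only and not how Model Studio implements its selection. MAPE is used as an example criterion, and the model names and numbers are invented.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def select_champions(actuals, forecasts):
    """For each series, return the name of the model with the lowest holdout MAPE."""
    champions = {}
    for series_id, actual in actuals.items():
        errors = {model: mape(actual, preds[series_id]) for model, preds in forecasts.items()}
        champions[series_id] = min(errors, key=errors.get)
    return champions


# Hypothetical holdout values and forecasts from one SAS model and one OS model.
actuals = {"store_a": [100.0, 110.0], "store_b": [50.0, 40.0]}
forecasts = {
    "sas_model": {"store_a": [98.0, 111.0], "store_b": [60.0, 30.0]},
    "os_model": {"store_a": [90.0, 120.0], "store_b": [52.0, 41.0]},
}

champions = select_champions(actuals, forecasts)
print(champions)
```

In this made-up example each model wins on one series, which is the situation where series-level selection pays off compared with choosing a single model for everything.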
2. Customize the node [Optional]
We can also develop our own custom OS nodes and make them available around the business to be applied in different use-cases in a consistent manner. For more information on how to do that please have a look at this resource.
An example of a packaged node is attached to this article.
In this step-by-step guide we saw how we can incorporate open-source time series algorithms in our forecasting processes using SAS VF. The benefits include:
The process we described may seem long the first time, but once it is set up, it is straightforward to apply it again and to develop a framework for incorporating new algorithms and enhancing your forecasting process in a robust and consistent way. Happy forecasting!
- Common Pitfalls in Using the EXTLANG Package
- System-Defined Macros, for a better understanding of the macros used in the code
- How to incorporate Recurrent Neural Networks in your SAS Visual Forecasting pipelines, for the process of modifying default nodes
- Writing a Gradient Boosting Model Node for SAS® Visual Forecasting, which explains how to customize a node's UI