
Using the SAS Configurator for Open Source to Build Python and R

Started 11-03-2022
Modified 09-08-2023

One of the key features of SAS Viya is its integration with open-source languages such as Python and R. This open integration allows users to leverage their existing code and programming skills to speed their time to value with SAS Viya. The integration of SAS Viya with open-source software obviously depends on there being an installation of Python/R that the SAS Viya administrator configures SAS Viya to use.

 

To make installing and managing open-source installations easier for SAS Viya administrators, SAS provides the SAS Configurator for Open Source. The SAS Configurator for Open Source is a utility application that simplifies the download, configuration, building, and installation of Python and R from source. The result is a Python build and an R build located in a Persistent Volume Claim (PVC). The PVC and the builds that it contains can then be referenced by pods that require Python and R for their operations.

 

In this post, we will look at how an administrator uses the SAS Configurator for Open Source to build Python and R installs for use with SAS Viya.

 

Aside: Before I go on, I apologize up front for the length of this post. Even though it is long, I have not covered every aspect of using the SAS Configurator for Open Source.  The best place for additional information is in $deploy/sas-bases/examples/sas-pyconfig/README.md which you can find in your deployment assets.


How it works

 

Once configured, the SAS Configurator for Open Source creates and executes a sas-pyconfig job that:

 

  • Downloads the source, signature file, and signer's key from the configured location. For R, only the source is downloaded.
  • Verifies the authenticity of the Python source using the signer's key and signature file. The R source cannot be verified at the time of this writing because signer keys are not generated for R source.
  • Extracts the Python and R sources into a temporary directory for building.
  • Configures and performs a make of the Python and R sources.
  • Installs the Python and R builds within the PVC and updates supporting components, such as PIP, if applicable.
  • Builds and installs configured packages for Python and R.
  • If everything completes successfully, creates the symbolic links, or changes the symbolic links' targets, to point to the latest builds for both Python and R.
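
The final step in the list, pointing a profile's symbolic link at a new build, can be sketched as follows. This is a minimal illustration of one common technique (create a temporary link, then rename it over the old one), not the sas-pyconfig job's actual implementation. The function name point_profile_at_build and the ".tmp" suffix are hypothetical.

```python
import os


def point_profile_at_build(pvc_root: str, profile: str, build_dir_name: str) -> str:
    """Point <pvc_root>/<profile> at <pvc_root>/<build_dir_name>.

    Hypothetical helper: creates the link if it is missing, otherwise
    swaps its target by renaming a temporary link over the old one.
    """
    link_path = os.path.join(pvc_root, profile)
    target = os.path.join(pvc_root, build_dir_name)
    tmp_link = link_path + ".tmp"  # assumed scratch name, not used by SAS

    # Clear any leftover scratch link from an interrupted earlier run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)

    os.symlink(target, tmp_link)
    # On POSIX, rename replaces the old link in a single step, so readers
    # never observe a missing profile link.
    os.replace(tmp_link, link_path)
    return link_path
```

Calling the helper a second time with a newer build directory repoints the same link, so anything that references the profile name keeps working without a configuration change.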

 

The SAS Configurator for Open Source can build and install multiple Python and R builds in the same PVC. To handle multiple builds, it uses profiles, which can be used as references to different versions or builds of Python and R located in the PVC.

 

As you can imagine, downloading and building open-source software can be a resource-intensive operation, so after its initial execution, the sas-pyconfig job does not run again until a change is detected in the configuration settings for the job.

 

Building Python and R

 

Let's follow a scenario in which I have an existing SAS Viya deployment and I want to now install Python and R for use with my deployment.

 

Following the instructions in the $deploy/sas-bases/examples/sas-pyconfig/README.md, I need to

 

  1. Enable and configure the sas-pyconfig job for the installation of Python and R, including the mix of additional packages I want installed
  2. Configure sufficient resource requests and limits for the sas-pyconfig job to complete
  3. Add the additional manifests to the base kustomization.yaml
  4. Rebuild my SAS Viya deployment.

 

The first step is to copy the example manifest files in $deploy/sas-bases/examples/sas-pyconfig to $deploy/site-config/sas-pyconfig.

 

export deploy=~/project/deploy/gelcorp
mkdir -p $deploy/site-config/sas-pyconfig 
cp $deploy/sas-bases/examples/sas-pyconfig/* "$_"
chmod 755 $deploy/site-config/sas-pyconfig/*.yaml

 

The $deploy/site-config/sas-pyconfig directory will now contain these files.

 

$ ll $deploy/site-config/sas-pyconfig/
-rwxr-xr-x 1 cloud-user cloud-user  1319 Oct 27 17:25 change-configuration.yaml
-rwxr-xr-x 1 cloud-user cloud-user  1161 Oct 27 17:25 change-limits.yaml
-r--r--r-- 1 cloud-user cloud-user 20326 Oct 27 17:25 README.md


1. Enable the sas-pyconfig job for install of Python and R

 

To enable the sas-pyconfig job itself and to enable the job to install Python and R, edit $deploy/site-config/sas-pyconfig/change-configuration.yaml and at the top of the file

 

  • Set global.enabled to "true" to enable the sas-pyconfig job
  • Set global.python_enabled to "true" to install Python
  • Set global.r_enabled to "true" to install R.

 

You do not have to build both Python and R.  If you only need one of the languages, simply set the flag for the one you do not need to "false".

 

patch: |-
  - op: replace
    path: /data/global.enabled
    value: "true"
  - op: replace
    path: /data/global.python_enabled
    value: "true"
  - op: replace
    path: /data/global.r_enabled
    value: "true"
  - op: replace
    path: /data/global.pvc
    value: "/opt/sas/viya/home/sas-pyconfig"
  ...

 

The global.pvc value specifies the mount point within the SAS Configurator for Open Source job pod. This is the location of the PVC in the job pod and is the installation location of the Python and R profiles.

 

Configure Python Install

 

Lower in the same change-configuration.yaml file are the Python configuration settings. They can be used to create multiple Python profiles, each configured for specific user needs.  The default profile is called "default_py", so the options that define it all include "default_py" in the path value.  Here, the options install Python 3.8.13 for the default profile, but I could create a second profile named "python2" with its own set of configuration settings to install Python 2 if that were needed for some reason.  The pip_install_packages value specifies the set of Python packages I want installed for the default profile.  If I need to add or remove packages later, I simply modify the default_py.pip_install_packages value and the sas-pyconfig job will update my installation.

 

There are additional options in change-configuration.yaml, but these are the ones affecting the Python installation.

 

- op: replace   # Space delimited list of Python profiles to create
  path: /data/global.python_profiles
  value: "default_py"
- op: replace   # Python build config options
  path: /data/default_py.configure_opts
  value: "--enable-optimizations"
- op: replace   # Python build flags
  path: /data/default_py.cflags
  value: "-fPIC"
- op: replace   # Packages that wheel will build from scratch rather than use binary builds
  path: /data/default_py.pip_install_nobinary
  value: "Prophet sas_kernel"
- op: replace   # Packages that will be installed by PIP.
  path: /data/default_py.pip_install_packages
  value: "sas_kernel matplotlib sasoptpy sas-esppy NeuralProphet scipy rpy2 Flask XGBoost TensorFlow pybase64 scikit-learn statsmodels sympy mlxtend Skl2onnx nbeats-pytorch ESRNN onnxruntime opencv-python zipfile38 json2 pyenchant nltk spacy gensim"
- op: replace   # Used to verify the Python source download
  path: /data/default_py.python_signer
  value: https://keybase.io/ambv/pgp_keys.asc
- op: replace   # Used to verify the Python source download
  path: /data/default_py.python_signature
  value: https://www.python.org/ftp/python/3.8.13/Python-3.8.13.tgz.asc
- op: replace   # Python tarball to install
  path: /data/default_py.python_tarball
  value: https://www.python.org/ftp/python/3.8.13/Python-3.8.13.tgz
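
The python_signature and python_tarball values must refer to the same Python release, so both change whenever the version changes. The sketch below derives the two URLs from a version string. It assumes the python.org URL layout visible in the two values above holds for the release you want, and the helper name is hypothetical. The python_signer value is deliberately not derived, because the key depends on who signed that particular release.

```python
def python_source_urls(version: str) -> dict:
    """Return tarball and signature URLs for a Python release.

    Assumes the https://www.python.org/ftp/python/<version>/ layout seen
    in the configuration above; confirm the files exist before using them.
    """
    tarball = f"https://www.python.org/ftp/python/{version}/Python-{version}.tgz"
    return {
        "python_tarball": tarball,
        "python_signature": tarball + ".asc",
    }


if __name__ == "__main__":
    # Print the values in the form used by the default_py profile keys.
    for key, url in python_source_urls("3.8.13").items():
        print(f"default_py.{key}: {url}")
```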


Configure R Install

 

In the same $deploy/site-config/sas-pyconfig/change-configuration.yaml file you will find another set of options to configure the R install.  As with Python, R can be configured with multiple profiles, each of which would require a separate set of configuration options.

 

- op: replace   #Space delimited list of R profiles to create
  path: /data/global.r_profiles
  value: "default_r"
- op: replace   # R build config options
  path: /data/default_r.configure_opts
  value: "--enable-memory-profiling --enable-R-shlib --enable-BLAS-shlib --with-blas --with-lapack --with-readline=no --with-x=no"
- op: replace   #R build flags
  path: /data/default_r.cflags
  value: "-fPIC"
- op: replace   # R tarball to install
  path: /data/default_r.r_tarball   
  value: https://cran.r-project.org/src/base/R-4/R-4.2.0.tar.gz
- op: replace   # Packages that will be installed for R
  path: /data/default_r.packages
  value: "dplyr jsonlite httr tidyverse randomForest xgboost forecast"

2. Configure resources for the sas-pyconfig job

 

Now that I have the Python and R installs configured, I need to edit $deploy/site-config/sas-pyconfig/change-limits.yaml to provide the sas-pyconfig job with enough resources to carry out the builds.

 

The default resource request in the example manifest requests 4 CPU cores and 3000Mi of memory without any upper limits.  Unfortunately, those requests are out of range for the size of my research system nodes.  For my deployment, I specified the resource requests and limits shown below, which allowed Kubernetes to schedule the sas-pyconfig job and the job to complete successfully.

 

You may need to experiment with the requests and limits values to find the best fit for your deployment.  The sas-pyconfig job will fail if you do not set CPU limits to at least 4 CPU cores (4000m).  Reducing the resources for the sas-pyconfig job will, of course, affect the time it takes to complete.  Because the sas-pyconfig job is typically an infrequent expense, you have some flexibility here, and you may be able to afford a longer-running job if your deployment is short on resources.

 

The duration of the sas-pyconfig job is heavily dependent on the resources you provide to the job pod, whether you build both languages or only one, and the number of additional packages you install.

 

---
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: sas-pyconfig-limits
patch: |-
  - op: replace
    path: /spec/jobTemplate/spec/template/spec/containers/0/resources/requests/cpu
    value:
      500m
  - op: replace
    path: /spec/jobTemplate/spec/template/spec/containers/0/resources/requests/memory
    value:
      1000Mi
  - op: replace
    path: /spec/jobTemplate/spec/template/spec/containers/0/resources/limits/cpu
    value:
      4000m
  - op: replace
    path: /spec/jobTemplate/spec/template/spec/containers/0/resources/limits/memory
    value:
      3000Mi
target:
  group: batch
  kind: CronJob
  name: sas-pyconfig
  version: v1
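
Before applying a patch like this one, it can help to check the values against the 4-core CPU limit mentioned earlier. The sketch below converts the two quantity forms used in this post (millicores such as 4000m and binary-suffixed memory such as 3000Mi) into plain numbers. It is not a complete Kubernetes quantity parser, and the function names are hypothetical.

```python
def cpu_cores(quantity: str) -> float:
    """Convert a CPU quantity such as "500m" or "4" to a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity with a Mi or Gi suffix to bytes."""
    units = {"Mi": 1024**2, "Gi": 1024**3}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory quantity: {quantity}")


def cpu_limit_is_sufficient(limit: str, minimum_cores: float = 4.0) -> bool:
    """True when the CPU limit meets the minimum stated in this post."""
    return cpu_cores(limit) >= minimum_cores
```

With the values from the patch, the 500m request is half a core and the 4000m limit is exactly the 4 cores the job needs, so the check passes while still letting the pod be scheduled on smaller nodes.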


3. Add the new manifests to your deployment

 

The final step to enable the sas-pyconfig job is to add both change-configuration.yaml and change-limits.yaml to the transformers field of the base kustomization.yaml.

 

transformers:
  ...
  - site-config/sas-pyconfig/change-configuration.yaml
  - site-config/sas-pyconfig/change-limits.yaml
  ...


4. Rebuild and apply the changes to your SAS Viya deployment

 

Because you have made changes to kustomization.yaml, you will need to rebuild your SAS Viya deployment and apply the changes to your cluster.  You will need to follow the process necessary for your particular situation depending on the deployment method you employ.

 

See Modify Existing Customizations in a Deployment for guidance on this task.

 

If you deployed using the viya4-deployment GitHub project, you should consult the project documentation for guidance.

 

If your deployment uses the Deployment Operator, the sas-pyconfig job will execute automatically with your configured changes.

 

If your deployment is manually managed, you will need to execute the job yourself by running a command similar to this after applying the updated configuration to your cluster.

 

kubectl create job sas-pyconfig-adhoc -n <namespace> --from cronjob/sas-pyconfig

 

The results...

 

When you apply the updates to your SAS Viya deployment, the sas-pyconfig job will execute.  It usually takes several minutes for the job to start, but you can monitor the sas-pyconfig job pod using OpenLens or kubectl.  There will probably be an existing sas-pyconfig job pod, so you will likely see the old job pod terminate.  This will be immediately followed by the new job pod starting, running, and eventually succeeding.

 

$ kubectl get pod --selector app.kubernetes.io/name=sas-pyconfig --watch
NAME                           READY   STATUS             RESTARTS   AGE
sas-pyconfig-cjinitial-45r8h   0/1     Completed          0          108m
sas-pyconfig-cjinitial-45r8h   0/1     Terminating        0          111m
sas-pyconfig-cjinitial-l8xfz   0/1     Pending            0          0s
sas-pyconfig-cjinitial-l8xfz   0/1     ContainerCreating  0          0s
sas-pyconfig-cjinitial-l8xfz   1/1     Running            0          2s
sas-pyconfig-cjinitial-l8xfz   0/1     Completed          0          70m

 

You probably noticed that the job pod names include 'cjinitial'.  The SAS Configurator for Open Source actually creates an unscheduled sas-pyconfig cronjob that is used as a template for actual executions of the work.

 

I captured the CPU and memory utilization from this run of the job, and you can see the high-water marks for CPU and memory that justify the values from step #2 above.  The first hump is for the build of Python, the second is for R.  Neither resource is maxed out for the duration of the job, but you must specify limits high enough to account for the peak usage.

 

[Figure: CPU and memory utilization of the sas-pyconfig job pod during the build (sm_1_pyconfig_jobprofile.png)]


 

If we now look at the contents of the sas-pyconfig PVC, we will see that the following items have been created by the sas-pyconfig job.

 

$ ls -al 
drwxrwxrwx  4 root      root       113 Oct 27 21:48 .
drwxr-xr-x 31 nfsnobody nfsnobody 4096 Oct 27 20:41 ..
lrwxrwxrwx  1 sas       sas         56 Oct 27 21:02 default_py -> /opt/sas/viya/home/sas-pyconfig/Python-3.8.13.1666917490
lrwxrwxrwx  1 sas       sas         50 Oct 27 21:48 default_r -> /opt/sas/viya/home/sas-pyconfig/R-4.2.0.1666917490
-rwxr-xr-x  1 sas       sas         32 Oct 27 21:48 md5sum
drwxr-xr-x  8 sas       sas         83 Oct 27 20:50 Python-3.8.13.1666917490
drwxr-xr-x  5 sas       sas         43 Oct 27 21:11 R-4.2.0.1666917490

  • default_py is a symbolic link to the latest build for the default_py profile configuration.  This link is updated each time the Python build is updated so that it can be used in subsequent SAS Viya configuration as a dynamic reference to the current Python configuration.  As of this writing, the symlink only works if you use a mountPath of /opt/sas/viya/home/sas-pyconfig when attaching the sas-pyconfig PVC to SAS Viya pods.
  • default_r is a symbolic link to the latest build for the default_r profile configuration.  It provides a dynamic reference to the latest R build so that subsequent updates will not require a configuration change of any code.  As with default_py, the symlink only works if you use a mountPath of /opt/sas/viya/home/sas-pyconfig when attaching the sas-pyconfig PVC to SAS Viya pods.
  • md5sum is an MD5 hash of the change-configuration.yaml file and is used to detect subsequent changes to the Python and R build configurations.  The sas-pyconfig job uses this file to determine whether it needs to do anything when it executes.
  • Python-3.8.13.1666917490 is the directory containing Python and additional packages as configured.  The sas-pyconfig job will create an additional directory like this one to house any updates and will update the default_py symlink if the change is for the default profile.  If you define multiple profiles you will have a 'current' Python* directory for each profile.
  • R-4.2.0.1666917490 is the directory containing R and additional packages as configured. Similarly, sas-pyconfig will create an additional directory like this one to house any updates and will update the default_r symlink if the change is for the default profile.  If you define multiple profiles you will have a 'current' R* directory for each profile.

The number 1666917490 at the end of the Python and R directories is a datetime value in Unix epoch format for when the sas-pyconfig job ran.  This ensures that future updates have a unique directory name so the symlink can be updated properly.  
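
To read these directory names programmatically, a small sketch like the following splits a name into language, version, and build time, and picks the newest build for a language. It assumes every build directory follows the <language>-<version>.<epoch> pattern shown in the listing; the function names are hypothetical. Note that it selects the newest build per language, not per profile, because the directory name does not record the profile.

```python
from datetime import datetime, timezone


def parse_build_dir(name: str):
    """Split "Python-3.8.13.1666917490" into (language, version, build time)."""
    language, _, rest = name.partition("-")
    # The epoch is everything after the last dot; the version is the rest.
    version, _, epoch = rest.rpartition(".")
    built_at = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return language, version, built_at


def latest_build(names, language: str) -> str:
    """Return the most recent build directory name for one language."""
    builds = [n for n in names if n.startswith(language + "-")]
    return max(builds, key=lambda n: parse_build_dir(n)[2])
```

For the listing above, 1666917490 decodes to 2022-10-28 00:38:10 UTC.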

 

What if I want to...

 

...change the version of Python or R or add additional packages to either language?

 

You simply need to repeat steps #1 (edit change-configuration.yaml) and #4 (rebuild and apply).  Using the md5sum file, the sas-pyconfig job will detect that you have modified something in your Python or R configurations and it will carry out the new builds.
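
The change detection can be pictured with a minimal sketch: hash the current configuration and compare it with the stored digest. The 32-byte size of the md5sum file in the PVC listing is consistent with a bare hexadecimal MD5 digest, but exactly which bytes the job hashes is an assumption here, and the function names are hypothetical.

```python
import hashlib


def config_digest(config_bytes: bytes) -> str:
    """Return the hexadecimal MD5 digest (32 characters) of the configuration."""
    return hashlib.md5(config_bytes).hexdigest()


def needs_rebuild(config_bytes: bytes, stored_digest: str) -> bool:
    """True when the configuration no longer matches the stored digest."""
    return config_digest(config_bytes) != stored_digest.strip()
```

Any edit to the configuration, even adding a single package name, produces a different digest, which is why changing a package list is enough to trigger a new build.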

 

...only install Python, or only R?

 

In step #1 you can set global.r_enabled = "false" to prevent R from building or set global.python_enabled = "false" to prevent Python from building.

 

A second option is to edit the sas-pyconfig-parameters configmap to set the values to "false".  Keep in mind that this approach is temporary and will be overwritten with the values from change-configuration.yaml the next time you rebuild your deployment and apply it to the cluster.

 

...prevent the sas-pyconfig job from accidentally updating my Python or R builds?

 

You can repeat step #1 and set global.enabled = "false", then rebuild and apply.  You can also edit the sas-pyconfig-parameters configmap to effect the same change, but your edit will be changed back when you next rebuild your deployment and apply it to the cluster.

 

Summary

 

The SAS Configurator for Open Source utility provides SAS Viya administrators with an easy way to manage builds of Python and R for integration with SAS Viya.  Admittedly, this is only part of the overall story for configuring Python and R with SAS Viya.  Subsequent posts will describe the process for making Python and R available to SAS Viya users.  

 

Find more articles from SAS Global Enablement and Learning here.

Comments

Hey Scott,

Really helpful article. Recently, I installed the Python packages using this method and I can access Python modules.

 

FYI..

However, in my case I don't see below "default_py" soft-link.

 

"default_py -> /opt/sas/viya/home/sas-pyconfig/Python-3.8.13.1666917490"

 

So every time I run the "sas-pyconfig-adhoc" job I have to update PATH (/opt/sas/viya/home/sas-pyconfig/Python-3.8.13.1666917490/bin/python3) across all other files as this profile PATH (/opt/sas/viya/home/sas-pyconfig/default_py/bin/python3) is not available.

 

Also, I have noticed that it creates "saspyconfigvol" volume instead of  "python-volume". 

 

I'm using 2023.02 so not sure if something has changed in recent version. 

I just checked a 2023.02 deployment and do not see that there have been any changes to the way this should work. If you look in your change-configuration.yaml file, you should see these 8 references to "default_py" and the configuration that should define the symlink for you.
- op: replace
  path: /data/global.python_profiles
  value: "default_py"
- op: replace
  path: /data/default_py.configure_opts
  value: "--enable-optimizations"
- op: replace
  path: /data/default_py.cflags
  value: "-fPIC"
- op: replace
  path: /data/default_py.pip_install_nobinary
  value: "Prophet sas_kernel"
- op: replace
  path: /data/default_py.pip_install_packages
  value: "Prophet sas_kernel matplotlib sasoptpy sas-esppy NeuralProphet scipy rpy2 Flask XGBoost TensorFlow pybase64 scikit-learn statsmodels sympy mlxtend Skl2onnx nbeats-pytorch ESRNN onnxruntime opencv-python zipfile38 json2 pyenchant nltk spacy gensim"
- op: replace
  path: /data/default_py.python_signer
  value: https://keybase.io/ambv/pgp_keys.asc
- op: replace
  path: /data/default_py.python_signature
  value: https://www.python.org/ftp/python/3.8.13/Python-3.8.13.tgz.asc
- op: replace
  path: /data/default_py.python_tarball
  value: https://www.python.org/ftp/python/3.8.13/Python-3.8.13.tgz
If you do see these references in your file then you may want to examine the log from the sas-pyconfig pod to see if there is any sign of a problem. When the symlink is created you should see this in the log:
"messageKey":"Python symlink creation success. Symlink: /opt/sas/viya/home/sas-pyconfig/default_py Target: /opt/sas/viya/home/sas-pyconfig/Python-3.8.13.1678893181"
If the symlink cannot be created there should be a hint of what the problem may be instead of the success message.
If these things check out then I believe it might be best to open a Technical Support ticket so they can help you find what is causing an issue for you.

Hey Scott,

I'm facing the same exact issue Mayankp has, I'm on stable 2023.03

I noticed 3 strange things:

  1. sas-pyconfig-adhoc job doesn't complete, its pod is running even if it is not doing anything
  2. in the pod I don't see any info about the symlink
  3. this part  "/data/default_py.pip_install_nobinary" is not present on my change-configuration.yaml

I will check with the TCS

 

Maurizio

 

@mauriziopinzi  For me, two of these Python packages (Prophet and ESRNN) failed to install. After removing them, the Python packages listed below installed successfully. However, Python does not work from SAS Studio because it fails to create the Python subprocess.

So, I'm still trying to figure it out. 

 

- op: replace
  path: /data/default_py.pip_install_nobinary
  value: "Prophet sas_kernel"
- op: replace
  path: /data/default_py.pip_install_packages
  value: "pystan matplotlib sasoptpy sas-esppy NeuralProphet scipy rpy2 Flask XGBoost TensorFlow pybase64 scikit-learn statsmodels sympy mlxtend Skl2onnx nbeats-pytorch onnxruntime opencv-python zipfile38 json2 pyenchant nltk spacy gensim pandas pandasql pysqlite3 numpy saspy torch pyreadstat pyarrow pyspark plotly scipy ramp-workflow"

 

@Mayankp What are you using as your RWX storage class? If it is Azure Files, I think your problem could be related to throughput; there should be a timeout parameter for Python.

It took 5 hours to configure python and I ended up with this error running python code

tcpSelectSelect returned an error in the tkpy extension in connect

 which I solved by increasing the timeout, something like this:

proc python  TIMEOUT=300;
submit;
var1 = "'python'"
var2 = 2
SAS.submit("data work.test; x={}; y={}; run;".format(var1,var2))

var3 = SAS.sasfnc("sha256hex","abc")
print("var3 = " + var3)

endsubmit;
run;

@mauriziopinzi  I do use Azure Files with an RWX storage class. After adding TIMEOUT=300, Python started working. Is there a Python TIMEOUT setting in Viya that can be applied at the system level rather than at the session level? Thanks.

@Mayankp I'm sorry I don't know if it is possible

hi

in our case, we added the 

proc python  TIMEOUT=3000; run;

in the autoexec context

Then it works for SAS PROC PYTHON and also for .py programs (because you need it there too).

 
