SAS released the SAS Airflow Provider earlier this year. This tool allows SAS administrators and power users to orchestrate SAS jobs and SAS Studio flows using Apache Airflow, an "open-source platform for developing, scheduling, and monitoring batch-oriented workflows".
Apache Airflow can be deployed in many ways and might already exist at your site. If not, this might be the opportunity to think it through and make SAS Viya and Apache Airflow work hand in hand. In this blog, we will look at some aspects of Apache Airflow deployment that can help you achieve a seamless integration between SAS Viya and Apache Airflow.
While Kubernetes is not mandatory for Airflow to work with SAS Viya, deploying Airflow in Kubernetes makes it flexible, cloud-native, and elastic. An official Helm chart for Apache Airflow is available and makes deploying it in Kubernetes very easy.
Airflow can be deployed in the same Kubernetes cluster as SAS Viya, leveraging the same infrastructure, but in a different namespace for software isolation.
If dedicated to SAS Viya, the Airflow workload should consist only of calls to SAS jobs and SAS Studio flows, which in turn run on the SAS Viya platform.
Finally, to be able to use the SAS operators in Airflow, the default container image used in the Airflow Helm chart needs to be extended to include the SAS Airflow Provider.
First, let's recall that what we would probably call a process flow (a collection of tasks/programs/jobs organized in a specific sequence) is called a DAG (Directed Acyclic Graph) in Airflow and is defined in a Python script. So, just code: no authoring UI is available out of the box.
Also, Airflow DAGs (Python scripts) are automatically discovered by the Airflow framework when they are saved in a designated DAGs directory. As a result, if Airflow and SAS Viya share the same DAGs directory, it is possible to define an Airflow DAG from within SAS Viya and have it automatically discovered by Airflow, which facilitates the integration.
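To make this more concrete, here is a minimal sketch of what such a DAG could look like, dropped as a file into the shared DAGs directory that we will configure later in this post (written in the same heredoc style used throughout this post). It uses the SASStudioOperator from the SAS Airflow Provider, which we will install in the next step; the DAG name, task name, and flow path are hypothetical placeholders to adapt to your environment:
tee /shared/gelcontent/airflow/dags/demo_sas_dag.py > /dev/null << EOF
# Minimal example DAG: runs a (hypothetical) SAS Studio flow on SAS Viya.
# Requires an Airflow connection to your SAS Viya environment
# (see the SAS Airflow Provider documentation).
from datetime import datetime

from airflow import DAG
from sas_airflow_provider.operators.sas_studio import SASStudioOperator

with DAG(dag_id="demo_sas_dag",
         start_date=datetime(2023, 1, 1),
         schedule=None,  # triggered manually, no schedule
         catchup=False) as dag:

    run_flow = SASStudioOperator(task_id="run_demo_flow",
                                 path_type="content",
                                 path="/Public/Airflow/demo_flow.flw",
                                 exec_log=True)
EOF
With the 30-second scan interval that we will set later in this post, such a file shows up in the Airflow UI within half a minute of being written.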
The first step is to extend the default Airflow container image to include the pieces that will allow us to call SAS Viya jobs or flows. Airflow comes out of the box with several providers (also called operators) which make integration with third-party tools possible (essentially, Airflow triggers jobs in various applications). Many providers are not included by default and need to be installed.
Below is an example of how to extend the default container image. Here, we use a Dockerfile with a requirements.txt file to build a new image that includes the SAS Airflow Provider.
tee ./requirements.txt > /dev/null << EOF
sas-airflow-provider
EOF
tee ./Dockerfile > /dev/null << EOF
# Extend the official Airflow image with the SAS Airflow Provider
FROM apache/airflow:latest
RUN pip install --upgrade pip
COPY requirements.txt .
RUN pip install -r requirements.txt
EOF
docker build -t airflow-sas:1.0.0 .
Then you have to make this image available to your deployment (multiple options are described in point 4 here). One way is to push it to a remote registry:
docker tag airflow-sas:1.0.0 registry.example.com/project/airflow/airflow-sas:1.0.0
docker push registry.example.com/project/airflow/airflow-sas:1.0.0
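Optionally, before going further, you can verify that the provider made it into the new image (pip show prints the package metadata if it is installed):
docker run --rm --entrypoint pip airflow-sas:1.0.0 show sas-airflow-provider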
We are ready to start the deployment process. The first thing is to customize the Airflow deployment; there are several properties that we want to modify.
You can have a look at all the chart's properties here or here, or you can dump them to a file to help you get started:
helm repo add apache-airflow https://airflow.apache.org
helm show values apache-airflow/airflow > values.yaml
We are going to focus on a few properties of interest, but I will provide the values.yaml file I use at the end:
defaultAirflowRepository: registry.example.com/project/airflow/airflow-sas
defaultAirflowTag: "1.0.0"
This is the address of the repository in your registry where you pushed your customized container image. You also need to specify the tag name you used.
ingress.web.hosts: [airflow.example.com]
ingress.web.ingressClassName: "nginx"
You define the Ingress host (essentially the Airflow UI URL) and you can reuse the Ingress class name of your SAS Viya deployment.
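Once Airflow is deployed (later in this post), you can confirm that the Ingress resource was created with the expected host:
kubectl -n airflow get ingress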
extraEnv: |
  - name: AIRFLOW__CORE__LOAD_EXAMPLES
    value: 'True'
  - name: AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL
    value: '30'
You can add environment variables. Here I want some sample DAGs to be loaded, and I want to reduce the scan interval of the DAGs directory, that is, how often (in seconds) Airflow scans the DAGs directory for new files, from the default 5 minutes to 30 seconds. These are demo settings so that a DAG is discovered in Airflow quickly after a file is dropped in the DAGs directory.
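As a reminder, any airflow.cfg option can be overridden this way: an option key in a given [section] maps to an environment variable named AIRFLOW__SECTION__KEY, for example:
# [core] load_examples              -> AIRFLOW__CORE__LOAD_EXAMPLES
# [scheduler] dag_dir_list_interval -> AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL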
volumes:
  - name: dags
    nfs:
      server: nfs.example.com
      path: /shared/gelcontent/airflow/dags
volumeMounts:
  - mountPath: '/opt/airflow/dags'
    name: 'dags'
Finally, I want to customize the default DAGs directory (the main directory that Airflow scans to detect new DAGs) to be a location on an NFS server shared between Airflow and SAS Viya. That will be handy when I want to author a DAG from within SAS: I save it to the DAGs directory, and it automagically appears in Airflow.
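Once Airflow is deployed, a quick way to confirm that the NFS share is mounted where expected is to list the DAGs directory from inside the scheduler pod (this sketch assumes the chart's default component=scheduler pod label and the airflow namespace used below):
kubectl -n airflow exec -it \
  $(kubectl -n airflow get pod -l component=scheduler -o name | head -1) \
  -- ls -l /opt/airflow/dags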
Here is the complete values.yaml file I use:
# Default airflow repository -- overridden by all the specific images below
defaultAirflowRepository: registry.example.com/project/airflow/airflow-sas

# Default airflow tag to deploy
defaultAirflowTag: "1.0.0"

# Ingress configuration
ingress:
  # Enable all ingress resources (deprecated - use ingress.web.enabled and ingress.flower.enabled)
  enabled: ~

  # Configs for the Ingress of the web Service
  web:
    # Enable web ingress resource
    enabled: true

    # Annotations for the web Ingress
    annotations: {}

    # The path for the web Ingress
    path: "/"

    # The pathType for the above path (used only with Kubernetes v1.19 and above)
    pathType: "ImplementationSpecific"

    # The hostname for the web Ingress (Deprecated - renamed to ingress.web.hosts)
    host: ""

    # The hostnames or hosts configuration for the web Ingress
    hosts: [airflow.example.com]
    # - name: ""
    #   # configs for web Ingress TLS
    #   tls:
    #     # Enable TLS termination for the web Ingress
    #     enabled: false
    #     # the name of a pre-created Secret containing a TLS private key and certificate
    #     secretName: ""

    # The Ingress Class for the web Ingress (used only with Kubernetes v1.19 and above)
    ingressClassName: "nginx"

    # configs for web Ingress TLS (Deprecated - renamed to ingress.web.hosts[*].tls)
    tls:
      # Enable TLS termination for the web Ingress
      enabled: false
      # the name of a pre-created Secret containing a TLS private key and certificate
      secretName: ""

    # HTTP paths to add to the web Ingress before the default path
    precedingPaths: []

    # Http paths to add to the web Ingress after the default path
    succeedingPaths: []

  # Configs for the Ingress of the flower Service
  flower:
    # Enable flower ingress resource
    enabled: false

    # Annotations for the flower Ingress
    annotations: {}

    # The path for the flower Ingress
    path: "/"

    # The pathType for the above path (used only with Kubernetes v1.19 and above)
    pathType: "ImplementationSpecific"

    # The hostname for the flower Ingress (Deprecated - renamed to ingress.flower.hosts)
    host: ""

    # The hostnames or hosts configuration for the flower Ingress
    hosts: []
    # - name: ""
    #   tls:
    #     # Enable TLS termination for the flower Ingress
    #     enabled: false
    #     # the name of a pre-created Secret containing a TLS private key and certificate
    #     secretName: ""

    # The Ingress Class for the flower Ingress (used only with Kubernetes v1.19 and above)
    ingressClassName: ""

    # configs for flower Ingress TLS (Deprecated - renamed to ingress.flower.hosts[*].tls)
    tls:
      # Enable TLS termination for the flower Ingress
      enabled: false
      # the name of a pre-created Secret containing a TLS private key and certificate
      secretName: ""

extraEnv: |
  - name: AIRFLOW__CORE__LOAD_EXAMPLES
    value: 'True'
  - name: AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL
    value: '30'

webserverSecretKey: d9e40acbe90806dd6fc30d67edd3bdd0

volumes:
  - name: dags
    nfs:
      server: nfs.example.com
      path: /shared/gelcontent/airflow/dags
  - name: plugins
    nfs:
      server: nfs.example.com
      path: /shared/gelcontent/airflow/plugins
  - name: scripts
    nfs:
      server: nfs.example.com
      path: /shared/gelcontent/airflow/scripts

volumeMounts:
  - mountPath: '/opt/airflow/dags'
    name: 'dags'
  - mountPath: '/opt/airflow/plugins'
    name: 'plugins'
  - mountPath: '/opt/airflow/scripts'
    name: 'scripts'

logs:
  persistence:
    # Enable persistent volume for storing logs
    enabled: true
Now it is time to deploy Airflow:
helm repo add apache-airflow https://airflow.apache.org
helm upgrade --install airflow apache-airflow/airflow --namespace airflow --create-namespace -f ./values.yaml
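You can then watch the pods until everything is up and running:
kubectl -n airflow get pods --watch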
Validate that it is working by opening the URL of the Ingress you specified earlier (http://airflow.example.com/) and connecting using the default admin/admin user:
The first thing to check is the presence of the SAS Airflow provider:
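If you prefer the command line, the same check can be done with the standard airflow providers list command from one of the Airflow pods (assuming the Helm release is named airflow in the airflow namespace, as above):
kubectl -n airflow exec deploy/airflow-webserver -- airflow providers list | grep -i sas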
Now that we have Apache Airflow set up with the SAS Airflow provider, and both Airflow and SAS Viya sharing the same DAGs directory, we can illustrate how SAS Viya and Airflow work together. Let's do this in video:
Some links of interest:
Find more articles from SAS Global Enablement and Learning here.
Thank you for bringing this tool to our attention. Most probably we are going to use it because of the coexistence with SAS. Without this knowledge, we were thinking about Dagster. The demo is great. One clarification question: these custom steps which you are showing, like "airflow-add task", where can they be found? Regards, Karolina
@touwen_k Thanks for your comments Karolina. Let me check what I can do to share those custom steps.
@NicolasRobert Hi Nicolas, thank you for a great article. Did you have a chance to share those custom steps that Karolina was asking about? That would be really helpful if you could provide some kind of repository with them.
Hello.
There is a public repository for crowd-sourced custom steps (https://github.com/sassoftware/sas-studio-custom-steps) that I recommend you visit. Unfortunately, I haven't had time to publish the Airflow ones in it yet.
Feel free to leave me your email address through a private message and I will send them to you.
Regards,
Nicolas.
Hi Nicolas. Did you ever share the custom steps to create the DAGs? Very informative post, by the way. Thanks, Eoin.
The SAS Studio Custom Steps used in this demo are now available in the SAS Studio Custom Step GitHub repository:
https://github.com/sassoftware/sas-studio-custom-steps/tree/main/Airflow%20-%20Generate%20DAG
Does anyone have updated instructions to work on the latest SAS Viya 2024.02 ?