Recent years have seen the rise of innovative data and analytics companies such as Databricks, which provides a cloud-based data platform built on Apache Spark, with distributed data processing capabilities for data engineering, large-scale data analytics and machine learning. SAS Viya and Databricks both cater to data engineers and data scientists. The two platforms have somewhat different focuses and features, and they often serve different user personas and use cases. In my opinion, what is often missed is how well these two platforms can complement each other when working side by side.
SAS Viya fits naturally together with existing SAS solutions and allows seamless integration of modern in-memory analytics into existing SAS 9 environments. It’s less commonly known that modern-day SAS also provides tight integration with open-source languages and tools, such as the ability to write Python code natively in SAS Studio (the SAS developer’s IDE), making it possible to embed Python code alongside SAS code. Databricks was built on top of Apache Spark, so it’s tightly coupled with the Spark ecosystem and naturally supports several programming languages, such as Scala and Python.
Both SAS Viya and Databricks scale well, as both can handle large datasets and provide parallel processing. Practically speaking, the majority of SAS users today have SAS Enterprise Guide as the foremost tool in their data engineering and analytics toolbox. While intuitive to use for almost any data or analytics use case, SAS Enterprise Guide is more of a Swiss Army knife than a full-fledged analytical hammer for big data processing. SAS, being a developer-focused software company, strives to provide each developer persona with the optimal tool for their specific working method. It’s reassuring to know that SAS Viya will support your preferred development method, be it code, low code or no code. And anyone coming from a SAS background will appreciate the fact that almost all of your SAS 9 code runs on SAS Viya, as there are in fact two analytical processing engines: SAS Compute for backwards compatibility with SAS 9, and Cloud Analytic Services (CAS) for in-memory processing.
SAS Viya continues this tradition with a focus on developer-friendly analytics and model-building interfaces while maintaining the power and openness of SAS for anyone developing code in SAS Studio or any of the graphical user interfaces. Putting the power in the coder’s hands follows the current trend of coding as the preferred choice for many developers, and as evidence of that, many like to work with Databricks through a notebook interface. Many who opt for Databricks value its flexibility and developer-friendliness for dedicated data scientists and analysts. To support the data scientist, Databricks supports MLlib, the machine learning library for Spark, which provides a variety of machine learning algorithms and can handle large-scale distributed machine learning tasks.
Having worked with data processing architectures for some time, I know that performance comparisons must always be taken with a grain of salt. That said, I see SAS Viya’s cloud-native, modular, containerized architecture as both cost-effective and highly performant. Fortunately, you don’t have to take my word for it, since an independent consultancy, Futurum Group, ran a series of analytical benchmarks for SAS Viya and direct competitors in 2023 and stated on the performance of SAS Viya: “SAS Viya’s efficiency and performance result in its ability to dramatically outperform competing solutions, particularly at scale with large and complex datasets.” Have a look at the full study: Performance at Scale - Comparing AI/ML Performance of SAS Viya vs. Alternatives
Ultimately, the choice between SAS Viya and Databricks depends on your specific business requirements, existing infrastructure, and the preferred toolset for your analytics and machine learning tasks. I know from experience that many organizations that have used SAS for years have recently opted to run both platforms side by side, making the best of both worlds. I think SAS is very good at productionizing analytics with a factory-like DataOps/ModelOps approach. A key part of the SAS methodology is the Analytics Life Cycle, as depicted in the image below. It provides a framework for continuous development, governance, collaboration, and monitoring. This ensures that analytical models deliver reliable and accurate results throughout their lifecycle. In many organizations, I’ve seen data scientists on Databricks take a less structured approach: experimenting in analytics notebooks, writing code, visualizing results, and then sharing their data insights.
At SAS, we see the end-to-end governed data and analytics platform as the biggest differentiator. It starts with you needing answers to your analytical questions, always involves accessing and managing data, and often includes modeling and/or visualization to produce the results. Finally, deploying analytical models on your actual data provides you with the insights needed to make better decisions that will make a difference.
To learn more about how the SAS approach to governed analytics provides value for your business and ultimately for your customer, have a look at what S-Bank is saying about SAS Viya: “Working with SAS provides us a more structured way of working with analytics and has helped align our processes, methods and tools” https://www.sas.com/en_us/customers/s-bank.html
Great article - good to get a deeper understanding of how Databricks compares with SAS Viya - and that there are similarities and differences worth exploiting.
Thanks for the comment! I think it's ok as a normal topic, but I can check with the admins if it can be bumped into an article 🙂