Morning SAS Programmers, as of last night our new head of IT wants to drop SAS from our tool belt. We currently have SAS Viya 3.5 with AI/ML and Visual Analytics. I come from 25 years of using various versions of SAS from 8 to 9.4 and now mostly use SAS Studio. I love the SAS language and all the capabilities I've seen from SAS throughout the years. At every bank I've worked for I've been an advocate for bringing in SAS, and this bank is no different. I brought SAS in and we have one year left on our contract. My new boss shared an article: "Introduction to Databricks and PySpark for SAS Developers" (Databricks Blog). I would love the community's feedback, especially from anyone who has used PySpark/Databricks, on how it really compares to SAS's capabilities. How accurate is this article, and how would you counter it? Please share any articles you find. As always, I really appreciate this community's thoughts. Thank you.
A reply based only on reading that article as I have no experience with the products.
Near the top of the article is a statement: "The PySpark DataFrame API has most of those same capabilities." It then shows a trivial data step and apparently analogous code for PySpark. I would determine which capabilities SAS has that this brief example does not match, then see if any of those are critical to your current processes. Again, I have no experience with that product, but if you use a lot of SAS Merge, Update or Modify data steps (just as an example) and those happen not to be among those "same capabilities", or require lots of code to migrate, then those points would be worth addressing.
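One way to put numbers on that is to scan your existing SAS programs for the statements in question. The following is only a rough sketch in Python, not a real SAS parser: the keyword list, the function name count_statements and the sample code are all made up for illustration. It will also count an UPDATE inside PROC SQL and anything inside comments, so treat the result as a first estimate.

```python
import re
from collections import Counter

# Statement keywords to look for (illustrative; extend for your own code base).
KEYWORDS = ("merge", "update", "modify")

# A keyword counts only when it starts a statement: at the very beginning of
# the text or right after a semicolon, ignoring whitespace in between.
PATTERN = re.compile(r"(?:^|;)\s*(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def count_statements(sas_source: str) -> Counter:
    """Count statements that begin with one of KEYWORDS in a string of SAS code."""
    return Counter(m.group(1).lower() for m in PATTERN.finditer(sas_source))


# Hypothetical SAS code, used only to exercise the function.
sample = """
data combined;
  merge customers accounts;
  by cust_id;
run;
data master;
  update master trans;
  by cust_id;
run;
data all;
  merge a b c;
  by id;
run;
"""

counts = count_statements(sample)
for keyword in KEYWORDS:
    print(keyword, counts[keyword])
```

Run over a whole program library, a count like this gives a starting list of the steps whose migration effort you would want to check first.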
The example addressing SQL shows something done in SAS Proc SQL that would execute much faster as Proc Summary code. I would look for other examples where SQL is proposed in lieu of other procedures and the comparison may have a similar performance impact (this assumes you actually use Summary instead of SQL for similar simple sums).
The Venn diagram of features seems to pretty much ignore all of the SAS products outside of Foundation and Stat (if I understand the claim). So perhaps bring up the modules you use that are not addressed in that diagram.
There is a bit addressing "Column Oriented vs Business Logic" that compares some SAS IF/Then/Else statements with the PySpark equivalent(? anytime I see code with multiple """ in a single statement I cringe). I suspect whoever prepared that may not be familiar with the SAS Select/When, but that may be trivial.
There are sections about what SAS does "better", but I think they miss a few points in the bit about Formats. Not common, but when helpful, the MULTILABEL format supported by Summary/Means, Report and Tabulate can move a lot of logic out of other places. Couple that with PRELOADFMT, to report on things that appear only intermittently in a specific analysis set, and these can be quite powerful reporting tools. I didn't see anything about data-set-driven creation of formats/informats either.
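For anyone who has not used those two format features, here is a small plain-Python sketch of the idea, not of how SAS implements it. The age bands, the sample ages and the function names are hypothetical. One value can fall under several overlapping labels (the MULTILABEL idea), and every label appears in the report even when nothing falls into it (the PRELOADFMT idea).

```python
# Hypothetical overlapping age bands: (label, low, high), bounds inclusive.
# A single age can match more than one band.
AGE_BANDS = [
    ("Under 18", 0, 17),
    ("18-64", 18, 64),
    ("65 and over", 65, 200),
    ("All adults", 18, 200),  # overlaps the two adult bands above
]


def labels_for(age):
    """Return every label whose range contains the given age."""
    return [label for label, low, high in AGE_BANDS if low <= age <= high]


def tabulate(ages):
    """Count ages per label, keeping labels that have no matching ages."""
    # Start every label at zero so empty categories still show in the report.
    counts = {label: 0 for label, _, _ in AGE_BANDS}
    for age in ages:
        for label in labels_for(age):
            counts[label] += 1
    return counts


report = tabulate([25, 40, 70])
for label, n in report.items():
    print(f"{label}: {n}")
```

In SAS this logic lives in the format definition and the procedure options rather than in hand-written loops, which is the point being made about moving logic out of other places.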
I also would be interested in thoughts and experiences of comparative evaluations between Databricks and SAS. Databricks seems to be getting a lot of buzz recently, and that will likely only grow with their recent AI acquisition https://techcrunch.com/2023/06/26/databricks-picks-up-mosaicml-an-openai-competitor-for-1-3b/ .
Please also remember that when a company wants to stop using SAS, it is almost always driven by cost. Whether the article is accurate or not is pretty much irrelevant. In the minds of some managers, the low-cost solution wins!
I'm with @PaigeMiller on this one. Any time a company or its management talks about replacing SAS with some other product, it is all about the software cost and not about the capabilities. The article you shared has a very narrow focus and really just covers a few technical features. I don't regard it as a basis for choosing Databricks over SAS.
However if you are already a longtime SAS site, how many SAS applications do you have and what would be the cost of converting these to another product? This is where many managers become short-sighted. The software costs are peanuts compared with the cost of software conversion. In the case of the SAS applications I look after, the dollar cost of conversion would be into 8 digits, many times the software licensing costs. And that's quite apart from the challenge of resourcing it...
I don't know that companies are considering replacing SAS purely for cost reasons.
I am not an AI/ML person. But my understanding from some of the AI/ML people in my organization is that new models/algorithms are often available in Python long before they would be available in SAS, and this is a reason they are happy with Databricks (PySpark). (And yes, I know that SAS and Viya are Python-friendly.) Ten years ago, there may have been some naive people thinking "let's replace PC SAS with Python/R because they're free", but I don't think that's necessarily the case today. Databricks is not cheap.
On a more basic level, I knew a group that developed DI studio ETL jobs in SAS 9. They used DI studio jobs, not code. When they attempted to migrate to Viya (I can't remember which version), they discovered that many of the point-and-click DI Studio transformations they used were not available yet in Viya, and that they would be migrated as user-written code rather than transformation objects. This led them to evaluate other non-code ETL products rather than move on with Viya.
I'm a happy SAS user/advocate/etc. So I'm definitely not arguing for Databricks. But as an advocate for SAS, I don't think we should assume that when companies consider replacing SAS, they are driven only by mistaken cost assumptions.
I'm curious, @Reeza, since your company is implementing Databricks now. Do you think the company sees Databricks as providing value alongside Viya, or is it seen as an alternative to Viya?
@Quentin wrote:
I don't know that companies are considering replacing SAS purely for cost reasons.
It happened to me on a previous job. I left rather than start over learning some other programming language.
FYI SAS Viya has a "bill by user numbers" charging model too, different to SAS 9.4's by product and number of cores charging.
The trend these days seems to be companies only hiring experienced developers, and not training in new software. So if you didn't learn it at university or some other means then tough...not just a SAS issue really.
I've seen Databricks' marketing strategy firsthand - touting themselves as the "SAS replacement". As with any marketing strategy, use your own judgement.
@Reeza wrote:
Sadly haven't worked in a SAS shop for about 4 years now. Cost isn't the issue.
It's about 10x harder to find a trained SAS person than it is to find an R/Python programmer these days though and that's a bigger issue for us....
In a way, this is a cost issue - the indirect cost of finding the SAS programmers.
@mkeintz wrote:
@Reeza wrote:
Sadly haven't worked in a SAS shop for about 4 years now. Cost isn't the issue.
It's about 10x harder to find a trained SAS person than it is to find an R/Python programmer these days though and that's a bigger issue for us....
In a way, this is a cost issue - the indirect cost of finding the SAS programmers.
In my case, when the small company I worked for was purchased by a larger company, I was the only SAS programmer they needed and we had one SAS license. So finding a SAS programmer wasn't really an issue. It was simply that the $X thousand we were paying to SAS each year could be replaced by paying zero dollars for R. The fact that I would have had to spend 6-12 months learning R and then writing code to replace the existing SAS code, while at the same time doing nothing else valuable, didn't seem to make a difference to the new company; they were cutting costs! Of course, every company and every person has a different story.