☑ This topic is solved.
jimbobob
Quartz | Level 8

Morning SAS programmers. As of last night, our new head of IT wants to drop SAS from our tool belt. We currently have SAS Viya 3.5 with AI/ML and Visual Analytics. I come from 25 years of using various versions of SAS, from 8 to 9.4, and now mostly use SAS Studio. I love the SAS language and all the capabilities I've seen from SAS throughout the years. At every bank I've worked for I've been an advocate for bringing in SAS, and this bank is no different: I brought SAS in, and we have one year left on our contract. My new boss shared an article: Introduction to Databricks and PySpark for SAS Developers | Databricks Blog. I would love the community's feedback, especially from anyone who has used PySpark/Databricks, on how it really compares to SAS's capabilities. How accurate is this article, and how might I counter it? Please share any articles you find. As always, I really appreciate this community's thoughts. Thank you.

Accepted Solutions
ballardw
Super User

A reply based only on reading that article as I have no experience with the products.

 

Near the top of the article is the statement "The PySpark DataFrame API has most of those same capabilities.", followed by a trivial data step and apparently analogous PySpark code. I would determine which capabilities SAS has that this brief example does not match, then see whether any of those are critical to your current processes. Again, I have no experience with that product, but if you use a lot of SAS Merge, Update or Modify data steps (just as an example), and those happen not to be among those "same capabilities" or require lots of code to migrate, then those points would be worth addressing.
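One rough way to find out how much existing code leans on those statements is to count them across your SAS programs. The sketch below is only an illustration in Python: the sample program text, the data set names and the keyword list are made up, and the pattern is a heuristic rather than a SAS parser (it would also count keywords inside comments, and the UPDATE statement in Proc SQL).

```python
import re
from collections import Counter

# Statements named in the reply above; extend the tuple for your own code base.
STATEMENTS = ("merge", "update", "modify")

# Match a keyword that starts a statement: at the very beginning of the text
# or right after a semicolon. This is a heuristic, not a SAS parser.
PATTERN = re.compile(
    r"(?:^|;)\s*(" + "|".join(STATEMENTS) + r")\b",
    re.IGNORECASE,
)


def count_statements(sas_source: str) -> Counter:
    """Count how often each keyword in STATEMENTS starts a statement."""
    return Counter(m.group(1).lower() for m in PATTERN.finditer(sas_source))


# Hypothetical SAS program text, used only to exercise the function.
sample = """
data combined;
  merge accounts(in=a) balances(in=b);
  by account_id;
  if a and b;
run;

data master;
  update master transactions;
  by account_id;
run;
"""

counts = count_statements(sample)
for keyword in STATEMENTS:
    print(keyword, counts[keyword])
```

Running something like this over a whole program library (reading each file and summing the counters) would give a first estimate of where a migration is likely to need the most rework.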

 

The section on SQL shows an example written in SAS Proc SQL that would execute much faster as Proc Summary code. I would look for other examples where SQL is proposed in lieu of other procedures, since they may have a similar performance impact (this assumes you actually use Summary instead of SQL for similar simple sums).

 

The Venn diagram of features seems to pretty much ignore all of the SAS products outside of Foundation and Stat (if I understand the claim). So perhaps bring up the modules you use that are not addressed in that diagram.

 

There is a bit addressing "Column Oriented vs Business Logic" that compares some SAS IF/Then/Else statements with the PySpark equivalent (? anytime I see code with multiple """ in a single statement I cringe). I suspect whoever prepared that may not be familiar with the SAS Select/When, but that may be trivial.

 

There are sections about what SAS does "better", but I think the bit about Formats misses a few points. It is not common, but when helpful, the MULTILABEL format supported by Summary/Means, Report and Tabulate can move a lot of logic out of other places. Couple that with PRELOADFMT, to report on categories that appear only intermittently in a specific analysis set, and you have quite powerful reporting tools. I didn't see anything about data-set-driven creation of formats/informats either.
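For readers who have not met these format features, here is a small Python sketch of the ideas only, not of how SAS implements them. The age bands, their labels and the sample ages are hypothetical. The bands come from a list of rows (loosely like building a format from a data set), one value can fall under several overlapping labels (the MULTILABEL idea), and every label is reported even when nothing falls into it (the PRELOADFMT idea).

```python
# Hypothetical band definitions held as data rows: (label, low, high), inclusive.
BAND_ROWS = [
    ("Under 18", 0, 17),
    ("18-64", 18, 64),
    ("65+", 65, 120),
    ("All adults", 18, 120),  # overlaps "18-64" and "65+" on purpose
]


def summarize(ages):
    """Count ages per label; one age may count toward several labels."""
    # Start every label at zero so that empty categories still show up.
    counts = {label: 0 for label, _, _ in BAND_ROWS}
    for age in ages:
        for label, low, high in BAND_ROWS:
            if low <= age <= high:
                counts[label] += 1
    return counts


report = summarize([25, 40, 70, 33])
for label, n in report.items():
    print(f"{label:<12}{n}")
```

Changing the report categories here means editing the rows, not the counting logic, which is the point being made about moving logic out of other places.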

Quentin
Super User

I also would be interested in thoughts and experiences of comparative evaluations between Databricks and SAS.  Databricks seems to be getting a lot of buzz recently, and that will likely only grow with their recent AI acquisition https://techcrunch.com/2023/06/26/databricks-picks-up-mosaicml-an-openai-competitor-for-1-3b/  .

The Boston Area SAS Users Group is hosting free webinars!
Next up: Bart Jablonski and I present 53 (+3) ways to do a table lookup on Wednesday Sep 18.
Register now at https://www.basug.org/events.
PaigeMiller
Diamond | Level 26

Please also remember that when a company wants to stop using SAS, it is almost always driven by cost. Whether the article is accurate or not is pretty much irrelevant; in the minds of some managers, the low-cost solution wins!

--
Paige Miller
Reeza
Super User
Are all of your users programmers, or do some use the GUI (EG/Studio task interface)? Databricks handles the AI/ML part well. It also does delta lakes and data warehousing very well, which is not something SAS typically does. I don't think Visual Analytics and Databricks visualizations are on the same level, but I haven't used Databricks extensively; we're just implementing it now. The part that Databricks is missing, though, is self-serve data exploration and manipulation without code.
jimbobob
Quartz | Level 8
Half the team uses the GUI in SAS Studio; the other half are straight coders. Once you get it up and running, I would be curious to hear more detail on the visualizations, since we have many built in SAS, and how hard it would be if we're forced to rebuild them in Databricks.
Reeza
Super User
Your Visual Analytics visualizations? We have no plans to rebuild those in Databricks, we'll be using PowerBI for those instead.
SASKiwi
PROC Star

I'm with @PaigeMiller on this one. Any time a company or its management talks about replacing SAS with some other product, it is all about the software cost and not about the capabilities. That article you shared has a very narrow focus and really just covers a few technical features. I don't regard it as a basis for deciding to choose Databricks over SAS.

 

However if you are already a longtime SAS site, how many SAS applications do you have and what would be the cost of converting these to another product? This is where many managers become short-sighted. The software costs are peanuts compared with the cost of software conversion. In the case of the SAS applications I look after, the dollar cost of conversion would be into 8 digits, many times the software licensing costs. And that's quite apart from the challenge of resourcing it...

Quentin
Super User

I don't know that companies are considering replacing SAS purely for cost reasons.

 

I am not an AI/ML person. But my understanding from some of the AI/ML people in my organization is that new models/algorithms are often available in Python long before they would be available in SAS, and this is a reason they are happy with Databricks (PySpark). (And yes, I know that SAS and Viya are Python-friendly.) Ten years ago, there may have been some naive people thinking "let's replace PC SAS with Python/R because they're free", but I don't think that's necessarily the case today. Databricks is not cheap.

 

On a more basic level, I knew a group that developed DI studio ETL jobs in SAS 9.  They used DI studio jobs, not code.  When they attempted to migrate to Viya (I can't remember which version), they discovered that many of the point-and-click DI Studio transformations they used were not available yet in Viya, and that they would be migrated as user-written code rather than transformation objects.  This led them to evaluate other non-code ETL products rather than move on with Viya.

 

I'm a happy SAS user/advocate/etc.  So I'm definitely not arguing for Databricks.  But as an advocate for SAS, I don't think we should assume that when companies consider replacing SAS, they are driven only by mistaken cost assumptions.

I'm curious, @Reeza, since your company is implementing Databricks now. Do you think the company sees Databricks as providing value alongside Viya, or is it seen as an alternative to Viya?

PaigeMiller
Diamond | Level 26

@Quentin wrote:

I don't know that companies are considering replacing SAS purely for cost reasons.


It happened to me on a previous job. I left rather than start over learning some other programming language.

--
Paige Miller
Reeza
Super User
Sadly haven't worked in a SAS shop for about 4 years now. Cost isn't the issue.

It's about 10x harder to find a trained SAS person than it is to find an R/Python programmer these days, though, and that's a bigger issue for us. Newer algorithms aren't as important in our world, but the feature set in Python/R and the ability to find answers on SO or get assistance more easily do matter. Databricks also bills as used, so companies like that feature as well. I personally prefer Dataiku, so I'm trying to push toward that platform instead.
SASKiwi
PROC Star

FYI, SAS Viya has a "bill by user numbers" charging model too, unlike SAS 9.4's charging by product and number of cores.

 

The trend these days seems to be companies only hiring experienced developers, and not training in new software. So if you didn't learn it at university or some other means then tough...not just a SAS issue really.

 

I've seen Databricks' marketing strategy firsthand: touting themselves as the "SAS replacement". As with any marketing strategy, use your own judgement.

mkeintz
PROC Star

@Reeza wrote:
Sadly haven't worked in a SAS shop for about 4 years now. Cost isn't the issue.

It's about 10x harder to find a trained SAS person than it is to find a R/Python programmer these days though and that's a bigger issue for us.

...

In a way, this is a cost issue - the indirect cost of finding the SAS programmers.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PaigeMiller
Diamond | Level 26

@mkeintz wrote:

@Reeza wrote:
Sadly haven't worked in a SAS shop for about 4 years now. Cost isn't the issue.

It's about 10x harder to find a trained SAS person than it is to find a R/Python programmer these days though and that's a bigger issue for us.

...

In a way, this is a cost issue - the indirect cost of finding the SAS programmers.  


In my case, when the small company I worked for was purchased by a larger company, I was the only SAS programmer they needed and we had one SAS license. So finding a SAS programmer wasn't really an issue. It was simply that the $X thousand we were paying to SAS each year could be replaced by paying zero dollars for R. The fact that I would have had to spend 6-12 months learning R and then writing code to replace the existing SAS code, while doing nothing else valuable in the meantime, didn't seem to make a difference to the new company. They were cutting costs! Of course, every company and every person has a different story.

--
Paige Miller
Reeza
Super User
In our company we run Posit/RStudio Server so there are still R costs for us. Things like package management are important when you have bigger teams and production jobs.

Discussion stats
  • 17 replies
  • 4513 views
  • 19 likes
  • 7 in conversation