CAS is Fast
Some people need time to warm up to new things. As a member of the Global Enablement and Learning team, I see this every day. While "late adopters" are generally quite willing to learn the latest twist or tweak on something they already know well, they balk at the truly novel and at complete paradigm shifts. So it is with CAS. Many SAS users are taking a "long" approach.
One group that might be reluctant to adopt CAS is, surprisingly, SAS's most ardent supporters. These users have invested a lot of time in their SAS knowledge. With so much invested, they can be reluctant to work in another construct where they aren't as confident or skilled.
So, for you SAS true believers, let's look at how to get the best performance in CAS and compare it to SAS (standard disclaimers apply: these tests were run on non-optimal, virtual hardware). Hopefully, once you know how to get vastly better performance from CAS than from single-engine (but still awesome!) Base SAS, you'll be more comfortable giving it a try.
1. Group BY Aggregation -- Low Cardinality
When choosing data processing techniques in base SAS, you usually only have DATA Step or PROCs. You have these in CAS as well, but, when looking for optimum performance, the best place to start is with the CAS Actions, and aggregation is no exception.
Thanks to Nicolas Robert, we already know which CAS action aggregates the fastest (at least in his scenario): simple.summary. So, let's compare its performance on some big data against Base SAS on the same data.
Test Parameters
Test Parameter | Value |
Input Rows | 160 million |
Distinct BY Groups (Cardinality) | 8 |
CAS Code -- Simple.Summary
proc cas;
   simple.summary result=r status=s /
      inputs={"revenue","expenses"},
      subSet={"SUM"},
      table={
         name="mega_corp",
         caslib="visual",
         groupBy={"facilityType","productline"}
      },
      casout={name="summaryMC", replace=True, replication=0};
quit;
Base SAS Code -- PROC MEANS
proc means data=mega_corp noprint;
   var revenue expenses;
   class facilityType productline;
   output out=summaryMC sum(revenue)=sumRevenue sum(expenses)=sumExpenses;
run;
Results
Engine | Method | Real Time (seconds, or min:sec) |
CAS | Simple.Summary | 7.39 |
SAS | PROC MEANS | 2:32.44 |
2. Group BY Aggregation -- High Cardinality
There has been some talk that CAS does not perform well with high cardinality operations. Let's take a look by increasing the number of BY-Groups. We'll use the same code as above but replace the GroupBy and CLASS variables with productID, date, and unit. This gives us approximately 88,000 distinct groups.
Test Parameters
Test Parameter | Value |
Input Rows | 160 million |
Distinct BY Groups (Cardinality) | 88,000 |
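The exact code for this run is not shown. A sketch of the substitution described above, assuming everything except the grouping variables stays identical to the low-cardinality test, might look like this:

```sas
/* CAS: same simple.summary call, with the three high-cardinality
   grouping variables swapped in (assumed otherwise unchanged) */
proc cas;
   simple.summary result=r status=s /
      inputs={"revenue","expenses"},
      subSet={"SUM"},
      table={
         name="mega_corp",
         caslib="visual",
         groupBy={"productID","date","unit"}
      },
      casout={name="summaryMC", replace=True, replication=0};
quit;

/* Base SAS: same PROC MEANS step with the new CLASS variables */
proc means data=mega_corp noprint;
   var revenue expenses;
   class productID date unit;
   output out=summaryMC sum(revenue)=sumRevenue sum(expenses)=sumExpenses;
run;
```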
Results
Engine | Method | Real Time (seconds, or min:sec) |
CAS | Simple.Summary | 18.22 |
SAS | PROC MEANS | 2:31.65 |
3. De-Duplication
As with aggregation, picking the right technique is key. Thankfully, Nicolas Robert has again already shown us which method to use for de-duplication: the simple.GroupBy CAS action.
So, let's compare simple.GroupBy with PROC SORT.
Test Parameters
Test Parameter | Value |
Input Rows | 160 million |
Unique Keys | 88,000 |
CAS Code -- Simple.GroupBy
proc cas;
   simple.groupBy result=r status=rc /
      inputs={"productID", "date", "unit"}
      table={caslib="casuser",name="mega_corp"}
      casOut={name="dedupMC",replace=true,replication=0};
run;
quit;
Base SAS Code -- PROC SORT
proc sort data=mega_corp nodupkey out=dedupMC;
   by productID date unit;
run;
Results
Engine | Method | Real Time (seconds, or min:sec) |
CAS | Simple.GroupBy | 12.57 |
SAS | PROC SORT | 3:09.29 |
Discussion
So, there you have it. CAS is fast. It is plowing through some decent-sized data here in seconds, on a few (5) relatively small (4-way) virtual servers.
If you want performance like this, however, you need to know which techniques to use. Luckily, some of the hard work has already been done for you. In particular, check out these posts:
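To put the three results side by side, a small DATA step can turn the timings into speedup ratios. This is only a sketch: the data set and variable names are made up for illustration, and it assumes the Base SAS times in the results tables are in min:sec (so 2:32.44 is entered as 152.44 seconds).

```sas
/* Hypothetical helper: compare CAS and Base SAS real times (in seconds) */
data speedup;
   length test $32;
   infile datalines dsd;
   input test $ cas_sec sas_sec;
   speedup = sas_sec / cas_sec;   /* how many times faster CAS ran */
   format speedup 6.1;
   datalines;
Aggregation (low cardinality),7.39,152.44
Aggregation (high cardinality),18.22,151.65
De-duplication,12.57,189.29
;

proc print data=speedup noobs;
run;
```

On these numbers, CAS comes out roughly 21, 8, and 15 times faster, respectively.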
- CAS answers to 4 common data manipulation tasks – Part 1 – APPEND
- CAS answers to 4 common data manipulation tasks – Part 2 – SORT
- CAS answers to 4 common data manipulation tasks – Part 3 – DE-DUPLICATE
- CAS answers to 4 common data manipulation tasks – Part 4 – AGGREGATE
You'll also need to know more about CAS Actions. In particular, you'll need to know how to enhance them so they do exactly what you want. This post should help with that:
You're absolutely right, many are reluctant to adopt the CAS language.
At first I was confused and even thought about giving up.
Now I LOVE CAS actions, they are so efficient!
I wanted to try out CAS, but found it surprisingly hard to find information on how to set up a Personal CAS Server. Any pointers anyone might have would be highly appreciated.
Comment to section 3. De-Duplication:
Nicolas Robert's article was written for Viya 3.3. Since then, the specialized action deduplication.deduplicate has been added.
When both the source and target tables are in CAS, PROC SORT NODUPKEY translates to the deduplication.deduplicate action. What this means: you can continue to use the good old, simple PROC SORT NODUPKEY syntax, and it will be mapped to the specialized action where possible.
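As a rough illustration of that point, the PROC SORT step from section 3 could be pointed at CAS tables through a CAS engine libref. This is a sketch under assumptions: the session name mysess and the libref mycas are hypothetical, mega_corp is assumed to be already loaded in the casuser caslib, and whether the step is actually handed to the specialized action depends on the Viya release.

```sas
/* Start a CAS session and assign a CAS engine libref
   (mysess and mycas are illustrative names) */
cas mysess;
libname mycas cas caslib=casuser;

/* Both input and output tables are in CAS, so the sort can be
   processed in CAS rather than by the Base SAS engine */
proc sort data=mycas.mega_corp out=mycas.dedupMC nodupkey;
   by productID date unit;
run;
```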
SAS Viya is a cloud-based deployment, so there is no way to "set up a Personal CAS Server" per se. As far as I know, your only options for "personal" access might be:
- SAS Viya for Learners, if you have an academic email address
Hope this helps,
Ahmed
It's still mind-boggling to me that SAS hasn't made it easy for learners without an academic email address to get access to a sandbox Viya environment to play around and try it out. Maybe if you call SAS sales and ask for a 30 day trial demo they will give you access to a Viya environment so you can try using CAS.
As @AhmedAl_Attar mentioned, the other option is to try pay-as-you-go Viya. I did that last year, since MS gives you some free credits for the first month. But then I shut it down, because I didn't want to worry about monitoring the expenses for a personal playground.
I forgot this additional option:
- Experience SAS® Viya® for yourself (you register for a free 14-day trial)
Thanks @AhmedAl_Attar . I forgot about that too! 14 days is a good start. Maybe they'll let you extend or do multiple trials if you want more time.
Thanks @AhmedAl_Attar and @Quentin
I had read/heard that it is cloud only as well, but then I came across this which raised my interest:
This explicitly mentions a "Personal CAS Server" as a Sandbox for playing around. But like I asked originally, how do you get that up and running?
Btw, I am on a Foundation 9.4M8 client
Cheers,
JB
@JB1_DK if you purchase Viya, you can choose whether to install it in the cloud or on-prem. I think that part of the docs is saying that if you have a paid Viya instance (probably regardless of whether it is in the cloud or on-prem), you can configure it to have one or more personal CAS servers for your users.
From reading the docs link you included, it sounds to me like a Personal CAS server is defined/enabled by a setting that the Viya installer would turn on. It would live within the same Docker/Kubernetes infrastructure, not on an on-prem/personal machine or server.
This is my personal understanding, but I think you can get the definitive answer from SAS Tech Support.
Yeah, I might reach out to Tech support for this one. Thanks @Quentin , @AhmedAl_Attar