
Myth Busters: CAS Tables Hog All the Memory (They don’t)

Started ‎07-29-2022 by
Modified ‎03-01-2023 by
Views 539

Back in the LASR days, you could choose whether to back LASR in-memory tables with Hadoop. The Hadoop backing store primarily served to extend LASR's table space beyond the LASR servers' available memory via HDFS SASHDAT file memory mapping. Without Hadoop, total LASR table size was limited to available server memory (RAM). This was particularly limiting for SMP (single-server) LASR, since Hadoop integration was not available there.

 

CAS evolved beyond LASR's memory-mapping capabilities in that it can memory map directly from files on local file systems without the need for Hadoop. And even when it can't memory map a source table directly, it can load the table into a local memory-mapped file location, CAS_DISK_CACHE.

 

With these file memory mapping strategies, CAS was designed from the beginning to store data beyond its servers' available RAM. However, all this time later, I still hear objections that CAS and its in-memory tables are limited by available memory.

 

Let's Test It

 

Well, let's take a look at a live CAS system. Ours is a three-worker CAS environment with approximately 150GB of available memory and 186GB of total memory (used or otherwise) according to the Linux free command.
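The figures above come straight from free -m. As a self-contained sketch, here's how the per-worker "available" column can be summed; the Mem: lines below are illustrative stand-ins for three workers' free -m output (roughly 50GB available each), not captured values:

```shell
# Sum the "available" column (field 7 of each "Mem:" row) across workers.
# These sample rows are illustrative, not real captured output.
printf '%s\n' \
  'Mem:  62000  20000  30000  500  12000  50000' \
  'Mem:  62000  20000  30000  500  12000  50000' \
  'Mem:  62000  20000  30000  500  12000  50000' |
awk '{avail += $7} END {print avail " MB available"}'
```

On a modern procps free -m, field 7 of the Mem: row is "available"; the same awk filter can be piped onto the kubectl exec commands shown later in this post.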

 

[Image: sf_1_available-RAM-2.png]


 

Now, let's load more than 186GB of CAS tables and see what happens. Remember, 186GB is all the memory the three workers have, used or otherwise.

 

To get us above 186GB, we'll load 1300 copies of the orderFactMem table, each approximately 150MB.
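A quick sanity check on the arithmetic (integer math in the shell):

```shell
# 1300 copies x ~150 MB each, converted to GB (1024 MB per GB).
echo "$(( 1300 * 150 / 1024 )) GB"   # prints "190 GB" -- above the 186 GB of total RAM
```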

 

[Image: sf_2_load1300OrderFact.png]

 

What does the server memory look like after the load? Have we run out of memory?

 

[Image: sf_3_available-RAM-after-1.png]

 

No, we haven't run out of memory. In fact, we still have about 100GB immediately available.

 

What's going on? Well, as discussed above, the CAS tables were loaded to the CAS Disk Cache location, which is mapped to memory but decidedly on disk. Want to see? Here is a view of the data size of the /cas/cache (CAS_DISK_CACHE) directory before and after the load.

 

Before

 

[Image: sf_4_beforeLoad.png]

 

After

 

[Image: sf_5_afterLoad-1.png]

 

The CAS Disk Cache directories are much more full after the load than before. They are full of memory mapped files.
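The same before/after comparison can be reproduced with du on any directory. This sketch uses a temporary scratch directory as a stand-in for /cas/cache, since the real path lives inside the CAS pods:

```shell
# Simulate the before/after check on a scratch directory standing in
# for /cas/cache (which we can only reach from inside a CAS pod).
cache=$(mktemp -d)
du -sk "$cache"                                   # before: essentially empty
dd if=/dev/zero of="$cache/blk" bs=1M count=8 status=none
du -sk "$cache"                                   # after: ~8192 KB of backing data
rm -rf "$cache"
```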

 

If you want to see them, you'll need the lsof command, which lets us list the open memory-mapped cache files.

[Image: sf_6_lsof.png]

 

See for yourself.
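Why lsof +L1 and not plain ls? The cache files are unlinked after creation, so they have a link count of 0 and never show up in a directory listing; lsof +L1 filters for exactly those deleted-but-open files. Here's a minimal sketch of the same effect on Linux, using a throwaway temp file in place of a real cache file:

```shell
# A file held open after being unlinked: invisible to ls, space still in use.
tmp=$(mktemp)
exec 3<> "$tmp"          # keep a file descriptor open, like a mapped cache file
rm "$tmp"                # unlink it: link count drops to 0
ls -l /proc/$$/fd/3      # the symlink target is marked "(deleted)"
exec 3<&-                # closing the descriptor finally releases the space
```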

 

If you'd like to perform this analysis yourself or expand it, I'm including the Linux commands and SAS program below. As usual, these scripts aren't meant to run blindly. You'll need to supervise them and analyze the results.

 

Linux commands:

 

kubectl get pods --all-namespaces

kubectl describe pod sas-cas-server-default-controller -n gelenv

kubectl exec -it sas-cas-server-default-controller -c cas -n gelenv -- /bin/bash -c "free -m"
kubectl exec -it sas-cas-server-default-worker-0 -c cas -n gelenv -- /bin/bash -c "free -m"
kubectl exec -it sas-cas-server-default-worker-1 -c cas -n gelenv -- /bin/bash -c "free -m"
kubectl exec -it sas-cas-server-default-worker-2 -c cas -n gelenv -- /bin/bash -c "free -m"

# Copy lsof to the pods from the local server
kubectl cp /usr/sbin/lsof gelenv/sas-cas-server-default-controller:/tmp
kubectl cp /usr/sbin/lsof gelenv/sas-cas-server-default-worker-0:/tmp
kubectl cp /usr/sbin/lsof gelenv/sas-cas-server-default-worker-1:/tmp
kubectl cp /usr/sbin/lsof gelenv/sas-cas-server-default-worker-2:/tmp

kubectl exec -it sas-cas-server-default-worker-0 -c cas -n gelenv -- /bin/bash -c "df -m /cas/cache"
kubectl exec -it sas-cas-server-default-worker-1 -c cas -n gelenv -- /bin/bash -c "df -m /cas/cache"
kubectl exec -it sas-cas-server-default-worker-2 -c cas -n gelenv -- /bin/bash -c "df -m /cas/cache"

kubectl exec -it sas-cas-server-default-worker-0 -c cas -n gelenv -- /bin/bash -c "/tmp/lsof -a +L1"
kubectl exec -it sas-cas-server-default-worker-1 -c cas -n gelenv -- /bin/bash -c "/tmp/lsof -a +L1"
kubectl exec -it sas-cas-server-default-worker-2 -c cas -n gelenv -- /bin/bash -c "/tmp/lsof -a +L1"

 

SAS Code:

 

/* Create a CSV file for the memory test */
/* Need a non-hdat source to remove any possibility of memory mapping */

proc casutil incaslib="dm" outcaslib="dm";
load casdata="order_fact.sashdat" casout="orderFactMem";
save casdata="orderFactMem" casout="orderFactMem.csv";
droptable casdata="orderFactMem" incaslib="dm";
quit ;

/* Load the table multiple times */

%macro load;
%do i=1 %to 1300;
   proc casutil incaslib="dm" outcaslib="dm";
   load casdata="orderFactMem.csv" casout="orderFactMem&i" copies=0;
   quit ;
%end;
%mend;

%load;

proc cas;
table.tableinfo caslib="dm" name="orderFactMem1" ;
table.tabledetails caslib="dm" name="orderFactMem1" level="sum";
table.tabledetails caslib="dm" name="orderFactMem1" level="node";
run;

proc cas; 
  accessControl.assumeRole /
       adminRole="SUPERUSER";

  builtins.getCacheInfo;
run;
quit;


cas mysession terminate;

 

Conclusion

 

So don't be afraid of running out of memory with CAS. Use it for your data processing needs. It's fast.


Want More?

 

If you want specifics on CAS tables and memory, see this post.

 

Find more articles from SAS Global Enablement and Learning here.

Comments

What will happen when you fill up both RAM and CAS DISK Cache?

I assume you would get out-of-disk-space and/or memory errors in your Viya logs. Just like SAS 9, I assume you size your CAS disk cache (akin to SAS WORK space) so it never completely fills, and you have housekeeping processes to drop data no longer used.

