
Myth Busters: CAS Tables Hog All the Memory (They don’t)

Started ‎07-29-2022 by
Modified ‎03-01-2023 by
Views 539

Back in the LASR days, you could choose whether to back LASR in-memory tables with Hadoop. The Hadoop backing store primarily served to extend LASR's table space beyond the LASR servers' available memory via HDFS SASHDAT file memory mapping. Without Hadoop, total LASR table size was limited to available server memory (RAM). This was particularly limiting for SMP (single-server) LASR, since Hadoop integration was not available there.

 

CAS evolved beyond LASR's memory-mapping capabilities in that it can memory map directly from files on local file systems without the need for Hadoop. And even when it can't memory map a source table directly, it can load the table into a local memory-mapped file location, CAS_DISK_CACHE.

 

With these file memory mapping strategies, CAS was designed from the beginning to store data beyond its servers' available RAM. However, all this time later, I still hear objections that CAS and its in-memory tables are limited by available memory.

 

Let's Test It

 

Well, let's take a look at a live CAS system. Ours is a three-worker CAS environment with approximately 150GB of available memory and 186GB of total memory (used or otherwise) according to the Linux free command.
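The figures above come straight from free -m. As a self-contained sketch, here's how the per-worker "available" column can be summed; the Mem: lines below are illustrative stand-ins for three workers' free -m output (roughly 50GB available each), not captured values:

```shell
# Sum the "available" column (field 7 of each "Mem:" row) across workers.
# These sample rows are illustrative, not real captured output.
printf '%s\n' \
  'Mem:  62000  20000  30000  500  12000  50000' \
  'Mem:  62000  20000  30000  500  12000  50000' \
  'Mem:  62000  20000  30000  500  12000  50000' |
awk '{avail += $7} END {print avail " MB available"}'
```

On a modern procps free -m, field 7 of the Mem: row is "available"; the same awk filter can be piped onto the kubectl exec commands shown later in this post.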

 

[Image: sf_1_available-RAM-2.png]


 

Now, let's load more than 186GB of CAS tables and see what happens. Remember, 186GB is all the memory the three workers have, used or otherwise.

 

To get us above 186GB, we'll load 1300 copies of the orderFactMem table, each approximately 150MB.
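A quick sanity check on the arithmetic (integer math in the shell):

```shell
# 1300 copies x ~150 MB each, converted to GB (1024 MB per GB).
echo "$(( 1300 * 150 / 1024 )) GB"   # prints "190 GB" -- above the 186 GB of total RAM
```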

 

[Image: sf_2_load1300OrderFact.png]

 

What does the server memory look like after the load? Have we run out of memory?

 

[Image: sf_3_available-RAM-after-1.png]

 

No, we haven't run out of memory. In fact, we still have about 100GB immediately available.

 

What's going on? Well, as discussed above, the CAS tables were loaded to the CAS Disk Cache location, which is mapped to memory but decidedly on disk. Want to see? Here is a view of the data size of the /cas/cache (CAS_DISK_CACHE) directory before and after the load.

 

Before

 

[Image: sf_4_beforeLoad.png]

 

After

 

[Image: sf_5_afterLoad-1.png]

 

The CAS Disk Cache directories are much more full after the load than before. They are full of memory mapped files.
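The same before/after comparison can be reproduced with du on any directory. This sketch uses a temporary scratch directory as a stand-in for /cas/cache, since the real path lives inside the CAS pods:

```shell
# Simulate the before/after check on a scratch directory standing in
# for /cas/cache (which we can only reach from inside a CAS pod).
cache=$(mktemp -d)
du -sk "$cache"                                   # before: essentially empty
dd if=/dev/zero of="$cache/blk" bs=1M count=8 status=none
du -sk "$cache"                                   # after: ~8192 KB of backing data
rm -rf "$cache"
```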

 

If you want to see them, you'll need the lsof command, which lets us list the open memory-mapped cache files.

[Image: sf_6_lsof.png]

 

See for yourself.
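Why lsof +L1 and not plain ls? The cache files are unlinked after creation, so they have a link count of 0 and never show up in a directory listing; lsof +L1 filters for exactly those deleted-but-open files. Here's a minimal sketch of the same effect on Linux, using a throwaway temp file in place of a real cache file:

```shell
# A file held open after being unlinked: invisible to ls, space still in use.
tmp=$(mktemp)
exec 3<> "$tmp"          # keep a file descriptor open, like a mapped cache file
rm "$tmp"                # unlink it: link count drops to 0
ls -l /proc/$$/fd/3      # the symlink target is marked "(deleted)"
exec 3<&-                # closing the descriptor finally releases the space
```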

 

If you'd like to perform this analysis yourself or expand it, I'm including the Linux commands and SAS program below. As usual, these scripts aren't meant to run blindly. You'll need to supervise them and analyze the results.

 

Linux commands:

 

kubectl get pods --all-namespaces

kubectl describe pod sas-cas-server-default-controller -n gelenv

kubectl exec -it sas-cas-server-default-controller -c cas -n gelenv -- /bin/bash -c "free -m"
kubectl exec -it sas-cas-server-default-worker-0 -c cas -n gelenv -- /bin/bash -c "free -m"
kubectl exec -it sas-cas-server-default-worker-1 -c cas -n gelenv -- /bin/bash -c "free -m"
kubectl exec -it sas-cas-server-default-worker-2 -c cas -n gelenv -- /bin/bash -c "free -m"

# Copy lsof to the pods from the local server
kubectl cp /usr/sbin/lsof gelenv/sas-cas-server-default-controller:/tmp
kubectl cp /usr/sbin/lsof gelenv/sas-cas-server-default-worker-0:/tmp
kubectl cp /usr/sbin/lsof gelenv/sas-cas-server-default-worker-1:/tmp
kubectl cp /usr/sbin/lsof gelenv/sas-cas-server-default-worker-2:/tmp

kubectl exec -it sas-cas-server-default-worker-0 -c cas -n gelenv -- /bin/bash -c "df -m /cas/cache"
kubectl exec -it sas-cas-server-default-worker-1 -c cas -n gelenv -- /bin/bash -c "df -m /cas/cache"
kubectl exec -it sas-cas-server-default-worker-2 -c cas -n gelenv -- /bin/bash -c "df -m /cas/cache"

kubectl exec -it sas-cas-server-default-worker-0 -c cas -n gelenv -- /bin/bash -c "/tmp/lsof -a +L1"
kubectl exec -it sas-cas-server-default-worker-1 -c cas -n gelenv -- /bin/bash -c "/tmp/lsof -a +L1"
kubectl exec -it sas-cas-server-default-worker-2 -c cas -n gelenv -- /bin/bash -c "/tmp/lsof -a +L1"

 

SAS Code:

 

/* Create a CSV file for the memory test */
/* Need a non-hdat source to remove any possibility of memory mapping */

proc casutil incaslib="dm" outcaslib="dm";
load casdata="order_fact.sashdat" casout="orderFactMem";
save casdata="orderFactMem" casout="orderFactMem.csv";
droptable casdata="orderFactMem" incaslib="dm";
quit ;

/* Load the table multiple times */

%macro load;
%do i=1 %to 1300;
   proc casutil incaslib="dm" outcaslib="dm";
   load casdata="orderFactMem.csv" casout="orderFactMem&i" copies=0;
   quit ;
%end;
%mend;

%load;

proc cas;
table.tableinfo caslib="dm" name="orderFactMem1" ;
table.tabledetails caslib="dm" name="orderFactMem1" level="sum";
table.tabledetails caslib="dm" name="orderFactMem1" level="node";
run;

proc cas; 
  accessControl.assumeRole /
       adminRole="SUPERUSER";

  builtins.getCacheInfo;
run;
quit;


cas mysession terminate;

 

Conclusion

 

So don't be afraid of running out of memory with CAS. Use it for your data processing needs. It's fast.


Want More?

 

If you want specifics on CAS tables and memory, see this post.

 

Find more articles from SAS Global Enablement and Learning here.

Comments

What will happen when you fill up both RAM and CAS DISK Cache?

I assume you would get out-of-disk-space and/or memory errors in your Viya logs. Just like SAS 9, I assume you size your CAS disk cache (akin to SAS WORK space) so it never completely fills, and you have housekeeping processes to drop data no longer used.

