
What’s CASDATADIR used for?

Started 03-30-2020. Modified 04-08-2020.

 

If you have read the SAS Viya 3.5 for Linux: Deployment Guide and deployed Viya, you will have come across references to CASDATADIR. Interestingly, it is not mentioned much elsewhere, which may seem odd given its purpose. If you are a technical architect, or you work with customers on performance factors beyond the CAS Disk Cache, networking and optimum data loading techniques, then read on.

 

Let's start with some content from SAS Viya 3.5 for Linux: Deployment Guide:

 

By default, product caslibs are written to /opt/sas/viya/config/data/cas/default, which is often hosted on a single hard disk drive with limited storage. To ensure proper performance of your SAS solutions, SAS recommends that the CASDATADIR option be configured to point to a high-performance storage platform. Examples of high-performance storage platforms include SAN, NVMe, and multiple drive disk arrays. ... Changing the CAS data directory is especially useful for solutions that can be resource-intensive, such as SAS Visual Forecasting, SAS Visual Data Mining and Machine Learning, SAS Visual Text Analytics, and SAS Analytics for IoT. Multiple predefined system caslibs and the Public caslib have a default location for persistent storage: /opt/sas/viya/config/data/cas/instance-name/name-of-Public-caslib. You can specify the instance name when you edit the vars.yml. If you anticipate that many users will use browsers to access the user interfaces and to import data from files, additional space for this file system will be required. SAS recommends monitoring disk usage at /opt/sas/viya/config/data/cas (assuming this is the value of CASDATADIR).

 

Acknowledgements. Before going any further, I want to call out that this post reflects conversations and information that my colleagues here at SAS have shared. My appreciation to them for their input.

 

Useful data points

  • CASDATADIR - also termed CAS Data Directory
  • By default the directory for CASDATADIR is /opt/sas/viya/config/data/cas/default
  • The CASDATADIR is defined in the CAS_CONFIGURATION block of the vars.yml file
  • When installing and configuring an MPP CAS Server, the CASDATADIR is configured on the CAS Controllers (Primary and Secondary) as well as the CAS Workers
  • By default, applications such as SAS Visual Forecasting, SAS Visual Data Mining and Machine Learning, and SAS Visual Text Analytics write results tables to the sub-directories within CASDATADIR on the CAS Controller only, i.e. each file, small or big, is written as a single block (it is not split into blocks distributed across the CAS Workers)

 

Architecture considerations based on the data points

  • In customer environments with heavy analytical usage (many users and large volumes of data), the CAS Controller can get really busy
  • Ensuring sufficient bandwidth/throughput between the CAS Workers and the CAS Controller becomes really important
  • Writing data to CASDATADIR (and reading it back) as quickly as possible should be a goal, to minimise the impact on the CAS Controller's resources, i.e. I/O on the CAS Controller needs to match the usage patterns
  • Knowing what data is written to CASDATADIR, and when, can help inform customer teams about the performance requirements within the architecture design
  • Knowing how to prevent unnecessary writing to the CASDATADIR location might prove beneficial, e.g. having large output tables produced through modelling written to HDFS or DNFS instead
  • For some customers, backing up the CASDATADIR structure is likely to be a useful recommendation

 

What's in CASDATADIR?

Below is a quick directory listing from the Viya 3.5 environment we use for our team's testing here at SAS, with the root directory being /opt/sas/viya/config/data/cas/default. Note that 'default' refers to the name of the CAS Server. If an environment has multiple CAS Servers, then something other than 'default' is likely to be listed as that directory. Some additional information regarding the predefined caslibs can be found in the SAS Viya 3.5 Administration: Data document. Sub-directories for individual users can be found under 'casuserlibraries' and sub-directories for individual projects can be found under the 'projects' directory.

 

 drwxr-xr-x.   2 cas sas 4096 Dec  4 17:48 appData
 drwxr-xr-x.  55 cas sas 4096 Dec  4 17:48 casuserlibraries
 drwxrwxrwx.   2 cas sas   64 Feb 10 02:30 formats
 drwxrwxr-x.   2 cas sas    6 Jun 26  2018 modelMonitorLibrary
 drwxrwxrwx.   2 cas sas 4096 Dec  2 16:34 models
 drwxr-xr-x.   2 cas sas    6 Jun 26  2018 modelStore
 drwxr-x---.  16 cas sas 4096 Nov 26 14:13 projects
 drwxrwxrwx.   2 cas sas 4096 Dec  4 17:48 public
 drwxr-xr-x.   2 cas sas    6 May 30  2019 qasMartStore
 drwxr-xr-x.   2 cas sas 4096 Dec  4 17:48 referenceData
 drwxr-xr-x.   2 cas sas 4096 Jun 26  2018 samples
 drwxr-xr-x.   2 cas sas  103 Feb  7 15:54 search
 drwxr-xr-x.   2 cas sas    6 Jun 26  2018 sysData
 drwxrwxrwx.   2 cas sas    6 Jun 26  2018 vamodels
 

CASUSER libraries - to write or not to write to CASDATADIR

Users of visual applications can choose to write to the CASUSER caslib from those applications. Whether the CASUSER libraries are written to the CASDATADIR directory structure depends on:

  • the type of interface being used
  • whether the user is a member of the CASHostAccountRequired custom group.

Table 1 below, from the SAS® Viya® 3.5 Administration: Identity Management document, explains this clearly.

 

| User Scenario | CASUSER Path Location | Session Information |
|---|---|---|
| User starts CAS sessions from visual interfaces (includes all SAS Viya interfaces except SAS Studio 4 and Base SAS or SPRE sessions), and user is not a member of the CASHostAccountRequired custom group. This is the default behavior. | /opt/sas/viya/config/data/cas/default/casuserlibraries/username | Sessions run under the CAS server user (cas). The directory and all files within it are owned by the cas user. |
| User starts CAS sessions from visual interfaces (includes all SAS Viya interfaces except SAS Studio 4 and Base SAS or SPRE sessions), and user is a member of the CASHostAccountRequired custom group. | $HOME/casuser | Sessions run under the user’s host account. |
| User starts CAS sessions from SAS Studio 4, Base SAS, or SPRE, regardless of whether the user is a member of the CASHostAccountRequired custom group. | $HOME/casuser | SAS Studio 4, Base SAS, and SPRE sessions always run under the user’s host account, and use the $HOME/casuser CASUSER path location. |
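The three scenarios in the table can be read as a small decision rule. The Python sketch below is only a restatement of the table for illustration: the function name, its parameters and the interface labels are hypothetical, and it is not how CAS itself resolves the path.

```python
# Interfaces that, per the table, always run under the user's host account.
HOST_ACCOUNT_INTERFACES = {"sas studio 4", "base sas", "spre"}


def casuser_path(interface, username, in_host_account_group, home_dir,
                 casdatadir="/opt/sas/viya/config/data/cas/default"):
    """Return (CASUSER path, account the session runs under) following the table.

    in_host_account_group: True if the user is a member of CASHostAccountRequired.
    casdatadir: assumed to be the default location; pass your own value if it differs.
    """
    if interface.lower() in HOST_ACCOUNT_INTERFACES or in_host_account_group:
        return f"{home_dir}/casuser", username
    return f"{casdatadir}/casuserlibraries/{username}", "cas"


print(casuser_path("SAS Visual Analytics", "alice", False, "/home/alice"))
print(casuser_path("SAS Visual Analytics", "alice", True, "/home/alice"))
print(casuser_path("SPRE", "bob", False, "/home/bob"))
```

Only the first call lands inside CASDATADIR, which is the case that matters when sizing that file system.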

 

One point worth mentioning: when a user is a member of the CASHostAccountRequired custom group and uses the visual interfaces to store data in the CASUSER caslib, the data lands in $HOME/casuser. This can cause out-of-space issues, because some customer IT teams restrict the size of each user's $HOME directory.

 

Large input tables & CASDATADIR

Users of VDMML (Model Studio interface) may be familiar with the fact that, on the first run of the Data node, the source data is copied to the CASDATADIR directory structure, e.g. /opt/sas/viya/config/data/cas/default/projects/datamining-0abc3abf-9ad2-477a-89b5-989f1e4cfe9a. The good news is that a method to prevent this was very recently documented in the latest version of the SAS® Visual Data Mining and Machine Learning 8.5: Advanced Topics document for Model Studio 8.5. Here is the text:

 

Model Studio copies the data source when the first Data node is run. This can cause performance issues and can cause you to run out of disk space. The amount of space that is required depends on the number of saved projects and on the size of the data source.

To prevent Model Studio from automatically creating copies of your data, ensure that the following conditions are met:

  1. A Key variable exists in your data. This can be either a variable named _INDEX_ or a variable that is assigned the role Key.
  2. A Partition variable exists in your data. This can be either a variable named _PARTIND_ or a variable that is assigned the role Partition.
  3. The data must be persistent on the disk.

For condition 3, here is some additional clarification. The table must be loaded directly from the caslib's own source. You cannot use proc casutil to load a table into a caslib whose source path differs from that of the table.
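As a quick pre-flight check, the three conditions can be expressed as a small function. This Python sketch only restates the documented conditions: the function name and the shape of its inputs (a list of column names, a mapping of column name to assigned role, and a flag for persistence) are assumptions made for the example, and it does not query Model Studio or CAS.

```python
def unmet_copy_conditions(columns, roles=None, persisted_on_disk=False):
    """Return the conditions that are NOT met; an empty list means no copy is expected.

    columns: column names in the table.
    roles: hypothetical mapping of column name -> assigned role (e.g. "Key", "Partition").
    persisted_on_disk: True if the table was loaded directly from the caslib's source.
    """
    names = {c.upper() for c in columns}  # SAS variable names are case-insensitive
    assigned = {r.lower() for r in (roles or {}).values()}
    unmet = []
    if "_INDEX_" not in names and "key" not in assigned:
        unmet.append("key variable")
    if "_PARTIND_" not in names and "partition" not in assigned:
        unmet.append("partition variable")
    if not persisted_on_disk:
        unmet.append("data persisted on disk")
    return unmet


print(unmet_copy_conditions(["cust_id", "target"], {"cust_id": "Key"}, True))
```

In this call the table has a Key role but nothing that acts as a partition variable, so the function reports that one condition as unmet.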

 

 

Large output tables & CASDATADIR

Depending on the type of analytics being done, the original data may need to be written in full to an output table. The output table will contain additional appended columns, e.g. predicted or forecasted values and the delta between the actual and the predicted/forecasted values. Consider output from SAS Visual Forecasting, where the input tables are massive due to the nature of what is being forecasted, e.g. groceries, values of stocks and shares, etc. Since the forecasted values will be used further down the business process, the table will likely need to be converted into another format, e.g. CSV, for other applications to leverage.

Therefore, when working with large output tables, the customer team needs to consider when and where to place the output tables, and when to convert them into another format. If a customer team has a mixture of VA users, data scientists and forecasting specialists all using one CAS Server, it may be preferable to initially write the output table into HDFS or DNFS as a SASHDAT file, then later in the day (outside of normal office hours) write the table out as a CSV file and make it available to the business users outside of the Viya environment. Writing very large output tables (think 10 GB and upwards) to the CASDATADIR location during normal office hours may impact the user experience for multiple user groups (assuming they all share one CAS Server).

 

Summary

Knowing the purpose of the CASDATADIR directory and its sub-directories will help technical architects, users and administrators alike. It will hopefully limit unwanted end-user experiences and contribute to a performant SAS Viya environment. As always, comments are welcome, especially if you think something could add value or needs clarifying/correcting.

 

Thanks, Simon

Comments

Hi, Simon!

Before installing SAS Viya 3.5, I changed CASDATADIR in vars.yml to /opt/sas/data/cas (a separate disk).

I see all the directories (/opt/sas/data/cas/appData, ...) except casuserlibraries.

In SAS Studio 5.2 I saved a table to the CASUSER library.

Why is the table saved to

/opt/sas/viya/config/data/cas/default/casuserlibraries/<myUserName>/tmp.sashdat

instead of

/opt/sas/data/cas/casuserlibraries/<myUserName>/tmp.sashdat ?

 

Code to run:

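/* Start a CAS session and assign librefs for all caslibs */
cas;
caslib _all_ assign;

/* Create a one-row in-memory table in the CASUSER caslib */
data casuser.tmp;
	a=10;
run;

/* Save the in-memory table to the caslib's data source as tmp.sashdat */
proc casutil;
	save
		incaslib='casuser' casdata='tmp'
		outcaslib='casuser' casout='tmp';
quit;

/* List the files in the CASUSER caslib's data source */
proc casutil;
	list files incaslib='casuser';
quit;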

 

Thanks

Hi Alexandr,

 

I'm currently out of the office but I hope to give you a more definitive answer by the 26th June.

The behaviour you are seeing is not what I expected, so I will check some things when I am back in the office.

 

In the meantime if you want to change the location of CASUSERLIBRARIES then take a look at:

 

env.CASMAKEHOMEDIR

env.CASHOMEDIRLOC

https://go.documentation.sas.com/?docsetId=calserverscas&docsetTarget=n08000viyaservers000000admin.h...

 

You will need to set env.CASMAKEHOMEDIR (set this to CONTROLLER) for env.CASHOMEDIRLOC to work.

 

Cheers, Simon

 

I changed casconfig_usermods.lua on the controller:

env.CASMAKEHOMEDIR='CONTROLLER'
env.CASHOMEDIRLOC='/opt/sas/data/cas/casuser'

and restarted the controller.

After running the program, I see that the folder /opt/sas/data/cas/casuser/cas/casuser has been created, but tmp.sashdat was still saved to /opt/sas/viya/config/data/cas/default/casuserlibraries/<myUserName>/tmp.sashdat

Thanks for the update Alexandr.

 

Based on the information you have provided, I would have expected that to work in Viya 3.5. That said, perhaps there are one or more configuration options that we need to adjust.

At this juncture, please contact Technical Support, open a ticket regarding this issue and share the link to this thread on SAS Communities. I am not in the office today (22nd June), but when I return tomorrow I will discuss the issue with the Technical Support team.

 

--Simon

 

@Alexandr two more questions for you whilst I have time.

Do you have the CASHostAccountRequired custom group configured and, if so, is the user you are running the test under a member of that group?

 

Also are you using SAS Studio (Enterprise) or SAS Studio (Basic): https://go.documentation.sas.com/?docsetId=calconfig&docsetTarget=n05011sasconfiguration0admin.htm&d...

 

 

@SimonWilliams Thank you for participating in this issue.

Track 7613112665 created.

I have a full deployment of SAS Viya 3.5 (RHEL 7) and authenticate using LDAP.

I don't use the CASHostAccountRequired custom group.

As a test, I created this group and added my account.

Now my program fails at this step:

85 data casuser.tmp;
86 a=10;
87 run;
NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step has no input data set and will run in a single thread.
ERROR: Cloud Analytic Services failed writing to system disk space. Please contact your administrator.

But I can see that the path of the CASUSER library is: /opt/sas/data/cas/casuser/<MyAccount>/casuser

Permissions for /opt/sas/data/cas/casuser: owner cas:sas, mode 777 (rwxrwxrwx)

My thanks to @Alexandr, who took the time to ask the questions, for his patience, and for working with some of us at SAS to discuss this further.

 

When CASDATADIR is switched to a new directory location on the OS, additional options have to be set specifically for the CASUSER caslib to ensure that it is placed in the correct location.

The cas.USERLOC option must be set in casconfig_usermods.lua, and the line export SASUSERLOCDIR=path/%USER must be added to /opt/sas/viya/config/etc/sysconfig/cas/default/sas-cas-usermods. Please take a look at this part of the documentation for more details: https://go.documentation.sas.com/?cdcId=calcdc&cdcVersion=3.5&docsetId=calserverscas&docsetTarget=n0...

 

 

cas.USERLOC='%HOME' | 'pathname/%USER'

specifies that the CAS server creates a personal caslib for each user at session start-up time in the specified location.

'%HOME' equates to the user’s operating system $HOME directory.

'pathname/%USER' refers to a directory named for the user’s user ID under the specified file system path. Make sure that %USER is always placed at the end of the path that is specified for the option value.

Enclose pathname in single quotation marks.

 

 

IMPORTANT You must update SASUSERLOCDIR in /opt/sas/viya/config/etc/sysconfig/cas/default/sas-cas-usermods. Add the line: export SASUSERLOCDIR=path/%USER.

Valid in: casconfig_usermods.lua file

Category: Caslib

Restriction: Applies to Linux only.

Examples

In this example, the personal caslib directory is the user’s operating system $HOME directory:

cas.userloc='%HOME'

In this example, the user’s personal caslib directory is named for his or her user ID and is located under /local:

cas.userloc='/local/%USER'
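To make the two accepted forms of the option value concrete, here is a small Python sketch that expands a cas.USERLOC value for a given user and rejects values where %USER is not at the end of the path. It mirrors only the rules quoted above. The function name is made up for this example, it is not how CAS implements the option, and CAS may create further sub-directories beneath the resolved location.

```python
def resolve_userloc(userloc, user, home_dir):
    """Expand a cas.USERLOC value ('%HOME' or 'pathname/%USER') for one user.

    home_dir stands in for the user's operating system $HOME directory.
    """
    if userloc == "%HOME":
        return home_dir
    if not userloc.endswith("/%USER"):
        # The documentation requires %USER to be placed at the end of the path.
        raise ValueError("%USER must be the last component of the path")
    return userloc[: -len("%USER")] + user


print(resolve_userloc("%HOME", "alice", "/home/alice"))
print(resolve_userloc("/local/%USER", "alice", "/home/alice"))
```

A value such as '/local/%USER/data' raises ValueError here, which is the kind of mistake worth catching before restarting the CAS Controller.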

 

Hi Simon, when some system tables go missing for whatever reason (CAS, CAS_NODE and CAS_SYSTEM, for example), the small charts below the Dashboard do not populate. Is there a manual way (running some code) to bring the missing system tables back up in CAS? The only way right now seems to be a full reboot of the SAS Viya system. Thanks for any help.

Hi @RGarrido ,

 

Sorry, but I'm only now seeing your question.

Is your question in reference to the Environment Manager dashboards in Viya 3.5?

If so, I will do some quick research to see if there is any relevant content. If it is Viya 3.5, which software ship event is your Viya environment based on?

--Simon

 

Thanks for your response Simon. Yes, I am referring to the Environment Manager Dashboard in SAS Viya 3.5. The small charts at the bottom (reports) depend on various system data tables that are automatically populated, updated and loaded by SAS Viya. Sometimes these system tables go missing, and when they do, the small charts do not work anymore. Tables in the system data folder include CAS, CAS_NODE and CAS_SYSTEM, and also the AUDIT tables. Here is the same problem: https://communities.sas.com/t5/SAS-Viya/Can-t-see-CAS-Activity-report-in-SAS-Viya/td-p/538780

 

Thank you for looking into this.  Might just discover a quick solution rather than having to restart the whole server.  Cheers.

Hi @RGarrido ,

OK, so if this works sometimes but not all the time, then perhaps something is affecting the network traffic or network addressing. Before proceeding, remember that SAS has an excellent Technical Support team, and that issues like this are probably best dealt with by speaking with the TS team.

 

It could be that the CAS Controller host machine is known by a different hostname depending on how the hostname is resolved. For example, using nslookup (which uses DNS) for a given IP address may return a different hostname from the one listed within the Viya inventory.ini file. If there is a discrepancy, you could update the /etc/hosts file on the CAS Controller so that the hostname(s) and IP address by which the CAS Controller is known match the hostname for the CAS Controller in the inventory.ini file.
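One way to spot such a discrepancy is to compare a reverse lookup of the CAS Controller's IP address with the name used in inventory.ini. The Python sketch below is an illustration only: the function name, the IP address and the hostnames are made up, and it treats a short name and its fully qualified form as a match, which is an assumption. Note that socket.gethostbyaddr uses the system resolver (so /etc/hosts is typically consulted as well as DNS), whereas nslookup queries DNS only.

```python
import socket


def hostname_mismatches(ip_address, inventory_hostname, reverse_lookup=socket.gethostbyaddr):
    """Return the names resolved for ip_address if none matches inventory_hostname.

    An empty list means the inventory name agrees with the reverse lookup.
    reverse_lookup is injectable so the check can be tried without a network.
    """
    primary, aliases, _addresses = reverse_lookup(ip_address)
    resolved = {primary.lower(), *(alias.lower() for alias in aliases)}
    short_names = {name.split(".")[0] for name in resolved}
    wanted = inventory_hostname.lower()
    if wanted in resolved or wanted in short_names:
        return []
    return sorted(resolved)


# Example with a stand-in resolver; on a real host, omit the third argument.
def fake_lookup(ip):
    return ("node-17.internal", [], [ip])


print(hostname_mismatches("10.0.0.5", "viyacas1", fake_lookup))
```

A non-empty result lists the names the resolver actually returned, which are the candidates to reconcile in /etc/hosts or in inventory.ini.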

You should also check that within the inventory.ini file you see the CAS Controller host is listed under the CommandLine group (I'm guessing it is because you note that this is an intermittent problem) e.g.

 

# The CommandLine host group contains command line interfaces for remote interaction with services.
# It should include every host in the deployment.
[CommandLine]
viyaspre1
viyaconfig1
viyamicro1
viyacas1

 

You may also want to check out the hotfixes for Viya 3.5 that are focused on CAS, e.g. https://support.sas.com/kb/68/026.html

 

Hope that helps.

 

--Simon

 

