Watch this Ask the Expert session to learn about SAS Viya architecture and programming methods for data access and management.
This webinar is perfect for experienced SAS programmers interested in learning more about SAS Viya and Viya programming.
The questions from the Q&A segment held at the end of the webinar are listed below and the slides from the webinar are attached along with the demo code I showed during the presentation.
Can you read Oracle tables or other database tables in SAS Viya like we can in SAS 9?
Yes. You can use SAS/ACCESS LIBNAME statements and SQL pass-through that execute on the SAS Compute Server in SAS Viya to connect to databases, and they work the same way they do in SAS 9. Additionally, you can use CASLIBs to establish a CAS server connection to databases. CASLIBs are the best way to move data from database tables into CAS memory as CAS tables and to save CAS tables on disk as database tables. For details on using CASLIBs with databases, see the CAS Data Connector documentation.
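As a rough illustration of the two routes described above, here is a minimal sketch. The librefs, host, service name, schema, table, and credentials are hypothetical placeholders, and the exact connection options depend on your database and SAS/ACCESS product, so check them against the Data Connector documentation.

```sas
/* Route 1: SAS Compute Server, same SAS/ACCESS syntax as SAS 9.   */
/* All connection values below are placeholders (assumptions).     */
libname oralib oracle path="//dbhost:1521/orclpdb"
        user="myuser" password="XXXXXXXX" schema="sales";

/* Explicit SQL pass-through that reuses the libref's connection   */
proc sql;
   connect using oralib;
   create table work.big_orders as
      select * from connection to oralib
         (select order_id, amount from orders where amount > 1000);
   disconnect from oralib;
quit;

/* Route 2: CAS server connection through a CASLIB                 */
cas mysess;
caslib oracas datasource=(srctype="oracle",
                          path="//dbhost:1521/orclpdb",
                          username="myuser",
                          password="XXXXXXXX",
                          schema="sales");

/* List the database tables that the CASLIB can see                */
proc casutil;
   list files incaslib="oracas";
quit;
```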
Why is there both PROC SQL and PROC FEDSQL? What is the difference?
PROC FEDSQL was created more recently than PROC SQL. The FEDSQL syntax has fewer SAS-specific elements, making it more likely that your SQL code can be translated into native SQL code and pushed down into the database for execution when using database tables, rather than pulling data from database tables back to SAS for processing. PROC SQL can also do this in many cases, but it depends more on how you write the SQL queries. FEDSQL also supports more of the ANSI data types, including high-precision numeric and varchar data types. PROC SQL converts such data types to the SAS numeric or character data types for processing. Further comparison of PROC FEDSQL and PROC SQL can be found in a recent Ask the Expert talk available on demand.
Can I use SAS datasets from SAS 9 in SAS Viya?
Yes.
Can I use Enterprise Guide to develop in Viya?
Yes. If you have SAS 9 maintenance release 5 (M5) or later, you can use Enterprise Guide rather than SAS Studio. When you do, the SAS 9 session that Enterprise Guide uses takes the place of the SAS Compute Server in SAS Viya.
Running in CAS requires memory. What happens when you have hundreds of users and you run out of memory? Or am I misunderstanding?
You are correct. You need to think about scalability for CAS and figure out how much infrastructure you will need to support the data volume and number of users. The cloud offers ease of scaling up and down. Also keep in mind you do not need to retain CAS tables in memory when they are not being used. They can be loaded from the disk data sources as needed and unloaded from memory when not in use. New in-memory tables you create can be saved to disk and then unloaded from memory as well.
Can you run batch SAS jobs on the compute server?
Yes. See the documentation for the batch plug-in for the SAS Viya CLI and for the SAS Job Execution Web Application.
Can you fire up the old "display manager" from the compute server?
No, but you CAN start the display manager (also known as the SAS windowing environment) and use its SAS 9 session in place of the SAS Compute Server to interact with CAS. This works if you have SAS 9 maintenance release 5 (M5) or later.
If CASLIB is an in-memory process, what happens if the server node goes down? Will all my work be gone?
Yes, if you don’t save your in-memory tables to disk. Every CASLIB has a disk data source location where you can permanently store tables and an in-memory location where the procedures access the data.
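A minimal sketch of saving an in-memory table to the CASLIB's disk location, unloading it, and reloading it later might look like this. The session name and the table name cars are illustrative assumptions, and CASUSER is used only because most sites provide it.

```sas
cas mysess;

proc casutil;
   /* Put a small sample table in memory to work with               */
   load data=sashelp.cars outcaslib="casuser" casout="cars" replace;

   /* Write the in-memory table to the CASLIB's disk data source    */
   save casdata="cars" incaslib="casuser"
        outcaslib="casuser" casout="cars.sashdat" replace;

   /* Free the memory; the saved file stays on disk                 */
   droptable casdata="cars" incaslib="casuser" quiet;

   /* Later, or in another session: reload from the saved file      */
   load casdata="cars.sashdat" incaslib="casuser"
        outcaslib="casuser" casout="cars";
quit;
```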
To run an end user session, do we always need to do via the compute server, or can we also create a CAS session directly?
SAS Studio is the application or interface we provide in SAS Viya for SAS programmers. Users of other SAS products, including SAS Visual Analytics, SAS Model Studio, and SAS Data Prep, use their own client interface to interact with CAS directly. We also provide technology that allows Python, R, Java, and Lua programmers to interact with CAS directly from their native programming interfaces.
For the load data = option, is the data regular SAS data set or a new format SAS data?
It is a regular SAS dataset. It can be any data read via a SAS library, but if it is a Base SAS library then it is a regular SAS dataset. The file format for SAS datasets is the same in SAS Viya as it is in SAS 9.
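For example, a dataset in an existing Base SAS library could be loaded roughly as follows. The path, libref, and dataset name are hypothetical, and the directory must be readable by the SAS Compute Server.

```sas
/* Hypothetical Base SAS library that holds datasets created in SAS 9 */
libname mydata "/path/to/sas9/data";

cas mysess;

proc casutil;
   /* LOAD DATA= reads through the libref on the Compute Server and  */
   /* sends the rows to CAS as an in-memory table                    */
   load data=mydata.customers outcaslib="casuser" casout="customers" replace;

   /* Confirm that the table is now in memory                        */
   list tables incaslib="casuser";
quit;
```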
Is there an option to only load 1 copy of a dataset into CAS and not load multiple copies into memory?
When you load data into CAS it is distributed across all the CAS worker nodes, but it does not create a copy of the entire dataset in each node. It splits the data up into chunks and puts different chunks on each node. It still processes that distributed table as a single table. All the CAS nodes work together as a single system to process the data in parallel on each node. As you load the data it is possible to limit the number of nodes that it writes the data to, but by default it will distribute the data across all the CAS worker nodes.
Are there any options or session operations that we would want to set or turn off when connecting to a CAS Viya session?
When you start a CAS session with a CAS statement there are many options that you can set for the session. These include the session encoding (latin1 versus Unicode for instance), language localization options, options for how long to allow the session to run before it times out and more. These options are documented here: CAS session options
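A short sketch of setting a few of these options when starting a session is shown below. The session name and the values are arbitrary examples, not recommendations.

```sas
/* Start a session with a 30-minute timeout (in seconds), a US      */
/* English locale, and performance metrics written to the log       */
cas mysess sessopts=(caslib=casuser timeout=1800 locale="en_US" metrics=true);

/* Write the current value of every session option to the log       */
cas mysess listsessopts;
```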
Do we need to specifically write in our code to distribute the data into CAS servers? OR, is it done automatically and how do we know up to how much extent the data has been distributed?
It’s done automatically. Whenever you load data into CAS, it will distribute it across all the worker nodes. You have options to limit that if you don’t want to use all the worker nodes.
Is load file like PROC Import?
Yes, they are similar. LOAD FILE= supports the same file types as PROC IMPORT. And you can, in fact, use PROC IMPORT to load data into CAS. When you name the output table you are creating with PROC IMPORT, you can point to a CASLIB.
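The two approaches might be sketched as follows. The CSV path and table names are hypothetical, and the file must be readable by the SAS Compute Server.

```sas
cas mysess;

/* Option 1: PROC CASUTIL LOAD FILE=                                 */
proc casutil;
   load file="/path/to/sales.csv"
        outcaslib="casuser" casout="sales"
        importoptions=(filetype="csv" getnames="true") replace;
quit;

/* Option 2: PROC IMPORT writing to a CAS engine libref              */
libname mycas cas caslib=casuser;

proc import datafile="/path/to/sales.csv"
            out=mycas.sales_imp dbms=csv replace;
   getnames=yes;
run;
```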
Can we get copies of the sample code?
Yes. It’s attached to this post.
What will be the best way to run SQL from a DB to load into CAS? Typically, our SQLs are several lines long.
The best way to do this is to define a CASLIB to connect to the database. We supply what we call SAS Data Connectors that allow you to do this. They require a SAS/ACCESS license for the specific database or databases you need to connect to. The Data Connector documentation describes how to define a CASLIB using these connectors. Once you have defined the CASLIB, you can use PROC FEDSQL to write a query that reads from your database CASLIB to create an in-memory table. PROC FEDSQL, like PROC SQL, supports both implicit and explicit SQL pass-through that will execute in-database and load your database data source into CAS memory. Or, if you simply want to load the entire database table into CAS, you can use PROC CASUTIL and a LOAD CASDATA= statement.
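The options above could look roughly like this. It is a sketch under assumptions: the CASLIB name oracas, the connection values, and the orders table are placeholders, and the CONNECTION TO form shown for explicit pass-through should be verified against the FedSQL documentation for your release.

```sas
cas mysess;

/* Database CASLIB; every connection value is a placeholder          */
caslib oracas datasource=(srctype="oracle",
                          path="//dbhost:1521/orclpdb",
                          username="myuser",
                          password="XXXXXXXX",
                          schema="sales");

proc fedsql sessref=mysess;
   /* Implicit pass-through: FedSQL syntax, pushed to the database   */
   /* when the query can be translated                               */
   create table casuser.big_orders as
      select order_id, customer_id, amount
      from oracas.orders
      where amount > 1000;

   /* Explicit pass-through: the inner query is native database SQL, */
   /* so a long existing query can be pasted between the parentheses */
   create table casuser.order_totals as
      select * from connection to oracas
         (select customer_id, sum(amount) as total_amount
          from orders
          group by customer_id);
quit;

/* Whole-table load without writing any SQL                          */
proc casutil;
   load casdata="ORDERS" incaslib="oracas"
        outcaslib="casuser" casout="orders" replace;
quit;
```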
Can the CAS library be a substitute for Work and all Work datasets exist in-memory? Perhaps with the USER= option?
The SAS WORK library datasets are not in-memory tables. They are stored on disk like all datasets in SAS libraries. They are loaded into memory from disk in each DATA or PROC step in which you use them. I would not recommend storing your SAS Compute Server WORK tables in CAS memory as a general practice. The reasons to put tables in CAS are to process volumes of data that are too large to process on the Compute Server, to join that data with an existing CAS table, or to make it available to users of other SAS Viya applications like SAS Visual Analytics, which use the CAS server.
What's the RAM requirement for each "worker" machine? Say I have 5 datasets in 100GB. Another user may be working on another project with a few large datasets.
There are many considerations that go into sizing such requirements. For the most recent release of SAS Viya, here is some documentation you can start with as you go down the path of determining your system requirements: System Requirements. From that documentation note the following statement:
These guidelines do not attempt to account for all ordering scenarios, but instead are intended to illustrate typical software orders. SAS strongly recommends that you consult with a sizing expert to obtain an official hardware recommendation that is based on your requirements. To request sizing expertise, contact your SAS account representative. If you need assistance in determining your SAS account representative, send an email to contactcenter@sas.com.
What is the purpose of Promoting? Given it is already loaded in CAS memory?
Promoting makes it available for you to use in another session. It also makes it available to other users on the shared CAS server. You can always re-load the data in each session, but it may be more convenient and quicker to not have to. Also, data administrators within the organization may be managing the data that is made available in memory for other users who may not have the knowledge or permissions to load CAS tables. These may include users of other SAS Viya applications like SAS Visual Analytics.
While promoting a table, what if the same table name exists across the sessions, will it replace the existing table while promoting or will it throw an error?
You will get an error if you try to promote a table when a global table with the same name already exists. You must first drop the existing table from memory explicitly. There is a DROPTABLE statement in PROC CASUTIL to do this.
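One way to replace a promoted table, sketched with an illustrative table name and the CASUSER caslib, is to drop it and then load and promote the new copy:

```sas
cas mysess;

proc casutil;
   /* Remove the existing global table; QUIET avoids an error when   */
   /* the table is not there                                         */
   droptable casdata="cars" incaslib="casuser" quiet;

   /* Load a fresh copy as a session-scope table                     */
   load data=sashelp.cars outcaslib="casuser" casout="cars";

   /* Promote it to global scope so that other sessions can use it   */
   promote casdata="cars" incaslib="casuser" outcaslib="casuser";
quit;
```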
How should one work if data is located in a relational database and this data changes regularly? What are best practices for updating CAS Library for data in a relational database?
I would recommend defining a CASLIB to connect to the database. The Data Connector documentation describes how to define such CASLIBs. Then use PROC CASUTIL and a LOAD CASDATA= statement to load that table. Load it as an unpromoted session table so that you need to load it in each session that uses it. Every time it is loaded, you will be loading the current copy of that table from the database. If you are maintaining a long-term CAS session, perhaps for many users, you may need to schedule a script that runs daily to drop the in-memory table and reload it from the database into the session.
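A refresh step of the kind described might be sketched as below. It assumes a database CASLIB named oracas has already been defined in the session; the CASLIB and table names are placeholders.

```sas
/* Assumes: cas mysess; and a database CASLIB named oracas exist     */
proc casutil;
   /* Drop yesterday's in-memory copy, if there is one               */
   droptable casdata="orders" incaslib="casuser" quiet;

   /* Reload the current contents of the database table              */
   load casdata="ORDERS" incaslib="oracas"
        outcaslib="casuser" casout="orders";
quit;
```

Running this at the start of each session, or from a scheduled batch job for a long-running session, keeps the in-memory copy in step with the database.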
How do you ingest live data into CAS?
Is this a streaming data source? You may be able to use SAS Event Stream Processing software. SAS Event Stream Processing can read from a large variety of common streaming sources and emit output events to an in-memory CAS table. It lets you process the event stream data in a variety of ways before emitting output events, which can be important if the raw event streams are too large to store. Instead, you can filter, summarize, capture certain patterns in the events, or perform other operations to pre-process the event streams before sending them to a CAS table.
What is the difference between Server Controller and Session Controller?
The distinction between Server and Session in the diagrams used in this presentation is that users launch individual sessions. The servers (both Server Controller and Server Worker) represent the computing hardware, installed software, and general services that support those sessions.
Is there a way of keeping your own version of a table in memory for you to come back to another day if you have closed your CAS session, or does it have to be promoted to a global area for everyone to see for it to persist in memory?
The way to do this is to use the CASUSER library. Each individual user has their own global library called CASUSER that only they have access to. So, any table you put in CASUSER is only visible to you even when you promote it.
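A small sketch of that pattern, with illustrative session and table names: promote a table in CASUSER, end the session, and find the table again from a new session.

```sas
cas mysess;

proc casutil;
   /* Load into the personal CASUSER caslib and promote in one step  */
   load data=sashelp.class outcaslib="casuser" casout="class" promote;
quit;

cas mysess terminate;

/* Later, in a new session: the promoted table is still in memory,   */
/* provided the CAS server has not been restarted in the meantime    */
cas newsess;

proc casutil;
   list tables incaslib="casuser";
quit;
```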
Can one user open more than one session at the same time?
Yes. When you use the CAS statement you give your session a label. You could create another CAS session with a different label.
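For instance, two sessions with different names (the names here are arbitrary) can be started and addressed separately:

```sas
/* Start two independent CAS sessions                                */
cas sess1;
cas sess2;

/* Write the sessions you currently have to the log                  */
cas _all_ list;

/* Direct a procedure to a specific session with SESSREF=            */
proc casutil sessref=sess2;
   list tables incaslib="casuser";
quit;

/* End one session; the other keeps running                          */
cas sess2 terminate;
```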
For the use case of LOAD CASDATA -> do some easy feature engineering (e.g. Var1/Var2*100) -> SAVE CASDATA: is PROC FEDSQL the best way to do that? Or are there any commands/options directly within PROC CASUTIL for creating new variables/columns and executing that entire scenario all in one step (i.e. within a single PROC CASUTIL)?
PROC CASUTIL is a utility procedure focused on managing CAS tables and the disk datasets and files for those tables: loading, unloading, saving, and the like at the table level. You would not use it to compute columns or to query data. PROC FEDSQL is a good way to do that, but you can also execute the DATA step and PROC DS2 in CAS once tables are loaded into memory. There are many other CAS procedures for data management and analysis as well. The CASUTIL overview in the documentation describes the functionality of PROC CASUTIL in somewhat more detail.
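The load, compute, and save scenario from the question could be sketched like this, using a DATA step that runs in CAS between two PROC CASUTIL steps. The table and column names come from SASHELP.CARS and are only stand-ins for Var1 and Var2.

```sas
cas mysess;

/* CAS engine libref so that the DATA step can address CAS tables    */
libname mycas cas caslib=casuser;

/* 1. Load                                                           */
proc casutil;
   load data=sashelp.cars outcaslib="casuser" casout="cars" replace;
quit;

/* 2. Feature engineering: input and output are both CAS tables, so  */
/*    the DATA step can run in CAS                                   */
data mycas.cars_fe;
   set mycas.cars;
   mpg_ratio = mpg_city / mpg_highway * 100;
run;

/* 3. Save the result to the CASLIB's disk location                  */
proc casutil;
   save casdata="cars_fe" incaslib="casuser"
        outcaslib="casuser" casout="cars_fe.sashdat" replace;
quit;
```

The same computed column could be written as a CREATE TABLE ... AS SELECT query in PROC FEDSQL instead of the DATA step.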
Recommended Resources:
Programming for SAS Viya training course
Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow up Q&A, slides and recordings from other SAS Ask the Expert webinars.