About CarleighJoC


Great blog, just what I needed!

CarleighJoC · 12-19-2024

Welcome back to my "Simplifying SAS Viya" series. In Part 1, we discussed the differences between the Compute Server and the CAS Server, exploring when to use each and what happens behind the scenes. In this post, we’ll focus on what caslibs are, how to use them, and how they compare to traditional SAS libraries.

The examples in this post use a SAS Viya LTS 2024.09 environment. For demonstration purposes, I have created a folder, OrdersData, containing data about customers and their orders from a fictitious company. This folder includes several file types.

Traditional SAS Libraries

In SAS 9, libraries are created with the LIBNAME statement and connect to data sources such as databases, cloud data, or folder paths. A library reference, or “libref”, acts as an alias for the physical data source. In code, data is referenced as libref.tablename. Libraries function similarly in SAS Viya when working with data on the Compute Server.

Here is an example that creates a libref named ordLib for the SAS tables in the OrdersData folder. The FREQ procedure then uses the orders table in ordLib to display the number of orders placed in each country.

LIBNAME ordLib base "/home/student/Courses/PGVY/OrdersData";
PROC FREQ data=ordLib.orders;
    TABLES Country /nocum nopercent;
RUN;

The library ordLib is visible in the Libraries pane, containing three SAS tables.

This code executes on the Compute Server. As discussed in my previous post, running code on the Compute Server works well unless you're working with a large dataset (over 50GB), programs that require multiple reads of the same data, or computationally intensive tasks. In those cases, the CAS Server is beneficial because of its high-speed, in-memory processing.

Working with Data in CAS

What changes when working with data on the CAS Server?
One of the benefits of working in CAS is that data is held in memory, reducing I/O. To process data in memory, we must first create a CAS session. The session encapsulates or “keeps a record” of our work and connects us to the CAS Server. Use a CAS statement to create a session. For example:

CAS CJsSession;

Now we’re connected to the CAS Server through a session named CJsSession, and the CAS Server is ready to process in-memory data. After establishing a session, connect data to the CAS Server using caslibs. Caslibs connect to a variety of data sources such as data in the cloud, databases, folder paths, and streaming data.

Creating a Caslib

In SAS Viya, predefined caslibs are available, and users can create their own caslibs with permission from the SAS administrator. Use the CASLIB statement to create a caslib:

CASLIB caslibName PATH="/filepath/" LIBREF=libref;

- caslibName: Up to 256 characters, cannot start with a number, and contains only numbers, letters and underscores.
- PATH= option: Specifies the location of the physical data files. Unlike traditional SAS libraries, caslibs can contain multiple file types.
- LIBREF= option: Maps a libref to the caslib. The libref must follow standard naming conventions. It's common to make the libref and caslib names the same for clarity.

Why Map a Libref to a Caslib?

Mapping a libref to a caslib serves two critical purposes:

- The mapped libref is visible in the Libraries pane, showing the caslib and any available in-memory tables.
- The libref allows referencing in-memory tables in traditional SAS DATA steps and procedures as libref.tablename.

For example, let's create a caslib for the data source files in the OrdersData folder. Without the LIBREF= option, the caslib ordersCaslib is created successfully, but it is not visible in the Libraries pane.

CASLIB ordersCaslib PATH="/home/student/Courses/PGVY/OrdersData";

Adding the LIBREF= option displays the libref connected to the caslib in the Libraries pane.
You can recognize a libref mapped to a caslib by its cloud icon, which replaces the file cabinet icon. The example below creates the ordCas caslib and assigns the libref ordCas.

CASLIB ordCas PATH="/home/student/Courses/PGVY/OrdersData" LIBREF=ordCas;

Mapping a Libref to a Predefined Caslib

Predefined caslibs are often set up by a SAS administrator. To use a predefined caslib that is not visible in the Libraries pane, map a libref to the caslib with the LIBNAME statement.

LIBNAME libref CAS CASLIB=caslibName;

- Libref: Must follow SAS naming conventions and should be the same as, or representative of, the caslibName.
- Engine: Always CAS for caslibs.
- CASLIB= option: Specifies the existing caslib.

Caslib names can be longer than librefs and may contain spaces. If the caslibName contains a space, add quotation marks around it:

LIBNAME libref CAS CASLIB='caslib Name With Spaces';

To view all predefined caslibs, run:

CASLIB _all_ list;

One predefined caslib in my environment is ModelPerformanceData. To create a libref named mpd for that caslib, run:

LIBNAME mpd CAS CASLIB=ModelPerformanceData;

The libref mpd, mapped to the caslib ModelPerformanceData, now appears in the Libraries pane.

Caslib Attributes

Caslibs have attributes that describe their data connection and user access. The three main attributes are Local, Active and Personal.

Local

Local=Yes: A local caslib is “session scope”. If the caslib is created in SAS Studio, it is not visible in another application like SAS Visual Analytics. When the CAS session ends, the caslib is deleted. Session scope is useful when working with data that is not shared across sessions. Traditional SAS comparison: Traditional SAS librefs are cleared at the end of a SAS session (the underlying data remains). You must run a LIBNAME statement to work with your data again.

Local=No: If a caslib is not local, it is “global scope”. The caslib is available to anyone with permission to access it, and the caslib is visible across applications.
When the CAS session ends, the caslib is not deleted. Global scope is useful when sharing data across sessions or with other users, and when working with large data that you do not want to load and unload from memory often. Traditional SAS comparison: Traditional SAS has predefined libraries such as SASHELP that persist across SAS sessions.

Personal

Personal=Yes: The caslib is only available to you.

Personal=No: The caslib is available to other users.

Active

In traditional SAS, the work library is the default when a library is not specified in our code. The CAS version of the work library is the “active” caslib. The active caslib can change. It is typically casuser by default; however, if you have just created a caslib, that new caslib becomes the active caslib. It is best practice to specify the caslib you are working with.

Active=Yes: The caslib is the current default caslib if no other caslib is specified.

Active=No: The caslib is available to use, but the caslib name must be specified to use it.

To view the casuser caslib attributes, use the CASLIB statement.

CASLIB casuser list;

Casuser is a global scope caslib. It is currently the active (default) caslib when no other caslib is specified, and it is personal, meaning it is available only to me.

Creating the new caslib ordCas changes the active caslib to ordCas. The new caslib is session scope, and it is not a personal caslib.

CASLIB ordCas PATH="/home/student/Courses/PGVY/OrdersData" LIBREF=ordCas;
CASLIB ordCas list;

Run the following again. Notice that casuser now shows Active = No.

CASLIB casuser list;

Files vs. Tables in Caslibs

So far, we have learned that when working with our data in CAS, we must start a CAS session. Then, we create a caslib and assign a library reference so it is visible in the Libraries pane and can be referenced in a program as libref.tablename. Finally, we explored the Local, Personal and Active attributes of caslibs. Let’s talk more about data connections in a caslib.
Caslibs can connect to a variety of data sources such as data in the cloud, databases, folder paths and streaming data. Traditional SAS libraries have one main component, the connection to the data source file. Caslibs have three main components:

- The connection to the data source files.
- The in-memory portion for data that has been loaded into memory.
- Access controls to define permissions to a specific caslib.

Use the CASUTIL procedure to view data source files and in-memory tables in a caslib.

PROC CASUTIL <INCASLIB="caslib-name">;
    LIST FILES | TABLES;
QUIT;

INCASLIB= is optional; however, if a caslib is not specified, the active caslib is assumed. This is one instance where I recommend being explicit with the caslib name since the active caslib can change.

Let’s look at an example. Remember, we mapped a libref to the caslib ordCas. The folder OrdersData has six files of various types.

CASLIB ordCas PATH="/home/student/Courses/PGVY/OrdersData" LIBREF=ordCas;

When looking at the ordCas libref in the Libraries pane, the library appears empty. This is because only in-memory tables are visible in a library mapped to a caslib in the Libraries pane. The following step displays the data source files available in the ordCas caslib:

PROC CASUTIL INCASLIB="ordCas";
    LIST files;
QUIT;

The caslib attributes and the six data source files in the caslib are listed. To see in-memory tables in the caslib, run the following:

PROC CASUTIL INCASLIB="ordCas";
    LIST tables;
QUIT;

Tables have not been loaded into memory, so only the caslib attributes are visible. The log contains a message to the same effect.

The next step is to load data source files into memory. There are multiple methods for loading data into memory.
In this post I am going to keep my explanation simple and use the familiar CASUTIL procedure to load a data source file into memory:

PROC CASUTIL;
    LOAD CASDATA="orders.csv" INCASLIB="ordCas" CASOUT="Orders" OUTCASLIB="ordCas";
QUIT;

- CASDATA="orders.csv" defines the data source file to load into memory.
- INCASLIB="ordCas" defines the caslib the data source file is currently in.
- CASOUT="Orders" defines the output CAS table name.
- OUTCASLIB="ordCas" defines the caslib the in-memory table will be in.

After running the step, the log confirms the load. In the Libraries pane, the in-memory table Orders in the ordCas caslib is visible. Notice that in-memory tables are marked with a lightning bolt.

The following step lists tables for the ordCas caslib. Notice Orders is listed:

PROC CASUTIL INCASLIB="ordCas";
    LIST tables;
QUIT;

I can use libref.tablename in my code to work with this in-memory table.

DATA ordCas.ordersAustralia;
    SET ordCas.orders;
    WHERE Country="Australia";
RUN;

This DATA step created an in-memory table in ordCas named ordersAustralia, using the input table Orders in the ordCas caslib. I filtered for orders placed by customers in Australia. The DATA step ran in CAS and returned 60,320 observations.

This DATA step ran in CAS because both tables were in a caslib, and everything in the DATA step was “CAS enabled” (certain syntax is not allowed in CAS; this DATA step contained only valid syntax). When the program executed, the Compute Server saw valid syntax for CAS and sent it over to the CAS Server. The data was divided among multiple worker nodes for processing. As they finished processing, data was returned to the controller node, where the table was reassembled. Then it was presented to us back in SAS Studio to view.

In Summary

To work with data on the CAS Server:

- Start a CAS session.
- Create caslibs and map librefs to view them in the Libraries pane and reference them in code.
- Use the CASUTIL procedure to list files or tables in a caslib and load data into memory.

Stay tuned for the next post in this series as we continue to simplify SAS Viya together!

Simplifying SAS Viya Part 1: Choose Your Server

Find more articles from SAS Global Enablement and Learning here.
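The workflow summarized in this post can be sketched end to end as follows. This is a minimal sketch that reuses the session name, path, and caslib from the post. The REPLACE option and the final TERMINATE step are my additions for rerunnability and cleanup, not steps from the post, and the path will differ in other environments.

```sas
/* 1. Start a CAS session (session name taken from the post). */
CAS CJsSession;

/* 2. Create a caslib and map a libref so it appears in the Libraries pane. */
CASLIB ordCas PATH="/home/student/Courses/PGVY/OrdersData" LIBREF=ordCas;

/* 3. List data source files, load one into memory, then list in-memory tables. */
PROC CASUTIL;
    LIST FILES INCASLIB="ordCas";
    /* REPLACE is an addition: it lets the step be rerun when Orders already exists. */
    LOAD CASDATA="orders.csv" INCASLIB="ordCas"
         CASOUT="Orders" OUTCASLIB="ordCas" REPLACE;
    LIST TABLES INCASLIB="ordCas";
QUIT;

/* 4. Reference the in-memory table as libref.tablename. */
PROC FREQ DATA=ordCas.orders;
    TABLES Country / NOCUM NOPERCENT;
RUN;

/* 5. Optional addition: end the session. Session-scope caslibs and their
      in-memory tables go away with it. */
CAS CJsSession TERMINATE;
```

Naming the caslib on every CASUTIL statement follows the post's advice to be explicit, since the active caslib can change during a session.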

CarleighJoC · 11-13-2024

Great blog!

CarleighJoC · 10-28-2024

Welcome to the first entry in my “Simplifying SAS Viya” series. In this post, we’ll focus on the servers available in SAS Viya, breaking down their core capabilities and explaining how they differ from traditional SAS 9.

What Is SAS Viya?

SAS Viya is a modern, cloud-enabled analytics platform that offers high-speed processing and scalability. It’s often described with terms like “in-memory,” “massively parallel processing,” and “cloud-native architecture.” Throughout this post I will explain these terms.

Traditional SAS 9: Background

Traditionally, SAS was installed on a single system with dedicated disk, RAM, and CPUs. Let’s walk through an example of what happens when a program runs in SAS 9:

DATA work.cars_SUV;
    SET sashelp.cars;
    WHERE Type= "SUV";
RUN;
PROC MEANS data= work.cars_SUV;
    VAR MPG_Highway;
RUN;
PROC FREQ data= work.cars_SUV;
    TABLES Make;
RUN;

When this program runs in SAS 9:

- The DATA step loads data from disk into memory, processes it, and writes it back to disk.
- The PROC MEANS step loads the data from disk again, generates results, and clears the memory.
- Finally, the PROC FREQ step also loads the data from disk, generates results, and clears the memory.

In this small program, the data is loaded and unloaded from memory three times, leading to repeated input/output (I/O) overhead. This is manageable for small programs but becomes time-consuming when working with large datasets or complex processes.

The SAS Viya Difference: A Two-Server Model

In SAS Viya, there are two primary servers, giving programmers options depending on the processing demands of their programs. The two servers are:

- SAS Compute Server
- CAS (Cloud Analytic Services) Server

SAS Compute Server

The Compute Server works similarly to the traditional SAS 9 server, which you may know as PC SAS or the SAS Workspace Server. You can run your existing SAS programs with normal SAS code in SAS Viya, and they will execute on the Compute Server.
It’s a great option for smaller datasets or if you want to transition existing code into SAS Viya with minimal modifications. On the Compute Server, data is read into memory and cleared from memory with each step (as explained in Traditional SAS 9: Background). Some processes are single-threaded (completing one task at a time), while others are multi-threaded (tasks are divided and worked on at the same time).

Threads Explained

When we talk about threads, we are working with a CPU. CPUs have cores, which are the physical processing units that execute instructions independently. Threads are the virtual sequences of instructions that can run on a single core.

Single-threaded processes execute instructions one at a time. This can be less efficient for tasks that can be parallelized or that require waiting for I/O operations.

Multi-threaded processes allow multiple parts of a program to run concurrently, so different threads can execute different tasks simultaneously, which is more efficient. Threads share memory and files, and a multi-threaded program can remain responsive even if part of it is blocked or performing a lengthy operation.

CAS Server

The CAS Server is fully multi-threaded, which allows programs to run faster because tasks are distributed. The real power of SAS Viya comes from the CAS Server, which offers several advantages over traditional SAS processing:

- Massively Parallel Processing (MPP) Environment: CAS divides large datasets into chunks and processes them across multiple nodes simultaneously. This is like having several security lanes open at an airport, speeding up the entire process.
- In-Memory Analytics Engine: Instead of loading and unloading data multiple times like in SAS 9, CAS allows you to load a dataset into memory once and keep it there until your work is finished. This greatly reduces I/O, making processing faster and more efficient.

Why Does Server Choice in SAS Viya Matter?
Understanding SAS Viya’s servers is key to knowing when and how to leverage the capabilities of each processing environment. The CAS Server shines when you’re working with:

- Large datasets (over 50GB)
- Programs that require multiple reads of the same data
- Long-running or computationally intensive tasks

By understanding when to use the Compute Server versus the CAS Server, you can optimize your code for better performance. Remember, the choice to enhance your programs to run in CAS is yours. You can still run traditional SAS code as-is using the Compute Server.

Quick Intro to Kubernetes

SAS Viya is deployed to a Kubernetes cluster. Typically, this is something only an admin needs to know about. However, as we learn how the CAS Server works, we refer to controller and worker nodes, which are part of a Kubernetes cluster, so here is my very simplified explanation.

A Kubernetes cluster is a set of node machines for running containerized applications. It consists of a control plane, which is the “brain” of the cluster, and nodes, which are the worker machines. Among other things, the control plane or “controller node” can send tasks to the worker nodes.

How CAS Works

The CAS Server is configured to run on one or more machines, each with multiple nodes. Typically, you will have one controller node and several worker nodes. When using SAS Studio in SAS Viya, the Compute Server acts as the client to the CAS Server.

For data to be processed in CAS, it must be loaded into memory from a physical location. We use caslibs to connect to data source files and load tables into memory for processing in CAS. When a program is executed in CAS, the controller node distributes in-memory data in blocks to the worker nodes. The workers execute the same actions at the same time on different blocks of data, which is called parallel processing.
As the worker nodes finish processing, they send the data back to the controller, and we see the results back in SAS Studio as expected.

Data that is in memory will stay in memory until it is explicitly dropped or the CAS session ends. While your data is in memory, the same table can be accessed by multiple users, avoiding additional I/O. If you want to save changes made to in-memory tables, you must explicitly save them back to a physical storage location.

Conclusion

In summary, SAS Viya allows you to submit traditional SAS code on the Compute Server, just as you always have. It also introduces a more powerful, flexible architecture than traditional SAS 9 through its CAS Server and in-memory, massively parallel processing capabilities. While the Compute Server offers compatibility with your existing SAS programs, the CAS Server is the game-changer for handling larger data and more demanding computations.

In Part 2 of the series, we’ll explore how to connect and work with your data in SAS Viya using caslibs.
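To make the load-once idea concrete, the three-step SAS 9 program from this post could be adapted for CAS roughly as follows. This is a sketch under assumptions: the session name mySess, the libref mycas, and the use of the casuser caslib are my choices for illustration, and whether each procedure step executes fully in CAS depends on the SAS Viya release.

```sas
/* Start a CAS session and map a libref to the personal casuser caslib.
   mySess and mycas are hypothetical names. */
CAS mySess;
LIBNAME mycas CAS CASLIB=casuser;

/* Load the table into memory once. */
PROC CASUTIL;
    LOAD DATA=sashelp.cars OUTCASLIB="casuser" CASOUT="cars" REPLACE;
QUIT;

/* Each step now reads the in-memory table instead of rereading it from disk. */
DATA mycas.cars_SUV;
    SET mycas.cars;
    WHERE Type="SUV";
RUN;

PROC MEANS DATA=mycas.cars_SUV;
    VAR MPG_Highway;
RUN;

PROC FREQ DATA=mycas.cars_SUV;
    TABLES Make;
RUN;

/* In-memory changes are not permanent: save explicitly, then drop when done. */
PROC CASUTIL INCASLIB="casuser" OUTCASLIB="casuser";
    SAVE CASDATA="cars_SUV" CASOUT="cars_SUV" REPLACE;
    DROPTABLE CASDATA="cars_SUV" QUIET;
    DROPTABLE CASDATA="cars" QUIET;
QUIT;

CAS mySess TERMINATE;
```

The SAVE step writes the in-memory table back to the caslib's data source, which is the explicit save the post describes; the DROPTABLE steps release the memory before the session ends.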

CarleighJoC · 09-16-2024

In SAS, column length is critical. SAS numeric columns have a default length of 8 bytes, allowing the storage of up to 16 digits. In contrast, SAS character columns can vary from 1 to 32,767 bytes, with 1 byte typically equating to 1 character.

*Note: This applies to single-byte encoding. With UTF-8 encoding, a single character can require up to 4 bytes. Multi-byte characters are common in non-English text.

While numeric columns generally don't need length adjustments, character columns often do to prevent truncation of values. The usual approach to altering column length involves the LENGTH statement in the DATA step. However, this method can disrupt the order of your columns. Why does this happen, and how can we fix it while maintaining the original column order? The issue stems from the Program Data Vector (PDV), but don't worry: we can resolve it using PROC SQL.

What is the PDV?

The PDV (Program Data Vector) is a memory area that includes each column referenced in the DATA step along with its attributes, such as name, type, and length. The PDV is created during the compilation phase of DATA step processing, which I like to call the "column" phase. During this phase, SAS scans your DATA step and assigns columns and their attributes. SAS also establishes "rules" for the PDV based on the statements and options used, like the WHERE statement (which determines which rows to read) or the DROP statement (which excludes specific columns after processing).

Columns and their attributes are brought into the PDV in the order they appear in the DATA step code. When a SET statement is encountered, columns from the input table are added to the PDV in the same order.

Adjusting Column Length

Let's consider an example using a subset of the SASHELP.CARS table.
Below is the code used to create the WORK.SH_CARS table and to display its column attributes:

data work.sh_cars;
    set sashelp.cars;
    drop DriveTrain--Length;
    where Make in ("GMC" "Lexus");
run;
proc contents data=work.sh_cars varnum;
run;

*Note: By default, the CONTENTS procedure lists variables alphabetically. VARNUM prints a list of the variable names in the order of their logical position in the table.

Suppose you want to write out the value "SUV" in the Type column as "Sport Utility Vehicle". Currently, Type has a length of 8 characters. What happens when you use an IF/THEN statement to update this value?

data work.sh_cars_truncated;
    set work.sh_cars;
    if Type="SUV" then Type="Sport Utility Vehicle";
run;

You'll notice the value is truncated after 8 characters, resulting in "Sport Ut". This occurs because the column length remains 8.

Your first thought might be to add a LENGTH statement to correct this. However, placing the LENGTH statement after the SET statement will not prevent truncation and will produce a warning in the log:

data work.sh_cars_afterSet;
    set work.sh_cars;
    length Type $ 25;
    if Type="SUV" then Type="Sport Utility Vehicle";
run;

This happens because SAS first sees the SET statement, which brings the columns and their attributes into the PDV. Since Type is initially assigned a length of 8, the subsequent LENGTH statement can't alter it. SAS suggests moving the LENGTH statement before the SET statement:

data work.sh_cars_beforeSet;
    length Type $ 25;
    set work.sh_cars;
    if Type="SUV" then Type="Sport Utility Vehicle";
run;
proc contents data=work.sh_cars_beforeSet varnum;
run;

Now, Type correctly has a length of 25 characters, but it appears first in the table, disrupting the original column order.

The Big Question: How Can We Maintain Column Order While Changing Column Length?

To maintain column order, we'll use PROC SQL dictionary tables and macro variables.
Dictionary tables are special read-only PROC SQL tables or views that provide information about all SAS libraries, tables, system options, and external files associated with the current SAS session. In our example, we will explore the DICTIONARY.COLUMNS table, which includes details like column names, types, lengths, and formats for all tables known to the SAS session.

The DESCRIBE TABLE statement writes the column definitions of DICTIONARY.COLUMNS, including column names and labels, to the SAS log. The SELECT clause selects all columns from DICTIONARY.COLUMNS. We'll limit the results to column information for the WORK.SH_CARS table using the WHERE clause. The LIBNAME and MEMNAME values are case-sensitive and must be written in all capital letters. The following code describes DICTIONARY.COLUMNS in the log and displays the column information for WORK.SH_CARS:

proc sql;
    describe table dictionary.columns;
    select *
        from dictionary.columns
        where libname="WORK" and memname="SH_CARS";
quit;

Of these results, we're particularly interested in the Name column. By selecting only Name, we obtain a list of column names in the original order:

proc sql;
    select name
        from dictionary.columns
        where libname="WORK" and memname="SH_CARS";
quit;

Using the INTO clause, we can store the values in the Name column in a macro variable colNames, separating them with a single space. The %PUT statement prints the macro variable text in the log:

proc sql;
    select name
        into :colNames separated by " "
        from dictionary.columns
        where libname="WORK" and memname="SH_CARS";
quit;
%put &colNames;

Now, we have a macro variable containing our column names in the correct order. In the DATA step, we can use the RETAIN statement with the macro variable first, so the columns enter the PDV in their original order. SAS will then process the LENGTH statement to assign the type as character and the length as 25 bytes for the Type column. Then SAS will bring in the rest of the column attributes from the input table on the SET statement.
data work.sh_cars_orderPreserved;
    retain &colNames.;
    length Type $ 25;
    set work.sh_cars;
    if Type="SUV" then Type="Sport Utility Vehicle";
run;
proc contents data=work.sh_cars_orderPreserved varnum;
run;

This approach allows us to change the length of a column while maintaining the original order of columns.

Example: The PDV

To visualize the PDV, we’ll use the SASHELP.FISH table. We’ll modify the table using the following DATA step:

data work.fishLength;
    set sashelp.fish;
    where Species = "Parkki";
    drop Length1--Width;
    AvgLength=round(mean(of Length1-Length3), .1);
run;

When this DATA step is executed, SAS creates the PDV for the WORK.FISHLENGTH table. All columns and attributes from the input table specified in the SET statement are brought into the PDV. Additionally, SAS adds the AvgLength column that we're creating. The column type and length for AvgLength are determined based on the numeric columns used in its calculation, which default to a length of 8.

SAS also processes the WHERE and DROP statements as rules. The WHERE statement ensures that only rows where Species is equal to "Parkki" are included, and after all data manipulations, the DROP statement removes the columns from Length1 to Width.

In a mock-up of the PDV, each column appears with its attributes and rules, with its value initialized to missing. After the DATA step is executed, the output table contains only the Parkki rows, without the columns Length1 through Width and with the new AvgLength column.
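One possible refinement of the order-preserving approach, offered as a sketch and not as the author's method: adding ORDER BY varnum makes the column order stored in the macro variable explicit, so it does not depend on the default row order of DICTIONARY.COLUMNS. The sketch assumes WORK.SH_CARS exists as created in the post; the output table name sh_cars_checked is hypothetical.

```sas
proc sql noprint;
    /* varnum is the column's logical position in the table. */
    select name
        into :colNames separated by " "
        from dictionary.columns
        where libname="WORK" and memname="SH_CARS"
        order by varnum;
quit;
%put &=colNames;

data work.sh_cars_checked;
    retain &colNames.;      /* fix the column order first          */
    length Type $ 25;       /* then set the new length for Type    */
    set work.sh_cars;       /* remaining attributes come from here */
    if Type="SUV" then Type="Sport Utility Vehicle";
run;

/* Confirm that the order is unchanged and that Type now has the new length. */
proc sql;
    select varnum, name, type, length
        from dictionary.columns
        where libname="WORK" and memname="SH_CARS_CHECKED"
        order by varnum;
quit;
```

The final query plays the same role as PROC CONTENTS with VARNUM in the post: it lists the columns by position so the order and the length of Type can be checked side by side.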


Re: Data uploads in SAS Viya for Learners 4: Let me count the ways!

Simplifying SAS Viya Part 2: What are Caslibs?

Re: SAS Viya High Throughput Batch Processing: Part 1 – Reusable Batch...

Simplifying SAS Viya Part 1: Choose Your Server

Maintain Column Order While Changing Column Length in SAS

Detailing your Data in SAS Studio Part 1: Identifying the Issues

Data uploads in SAS Viya for Learners 4: Let me count the ways!

SAS Viya High Throughput Batch Processing: Part 1 – Reusable Batch Ser...

Tricks for SAS Visual Analytics Report Builders: Step 3 - Focus on Wha...

Simplifying SAS Viya Part 2: What are Caslibs?

Simplifying SAS Viya Part 1: Choose Your Server

Maintain Column Order While Changing Column Length in SAS
