BASE SAS is the foundational programming language and environment for SAS software. It's been around for decades, and it's a real workhorse when it comes to data processing, statistical analysis, and reporting.
CASL (Cloud Analytic Services Language), by contrast, is a newer, cloud-native programming language designed specifically for the SAS Viya platform. It offers a modern approach to data analytics.
In this article, we compare the two languages, highlighting their strengths, capabilities, and suitability for various data analytics tasks through simple, practical examples.
BASE SAS
It's the fundamental software in the SAS suite, providing essential tools for managing, analysing, and reporting data. With BASE SAS, users can manipulate and transform data, conduct statistical analyses, and generate reports to extract valuable insights using its rich set of BASE SAS procedures.
Although BASE SAS is a robust and versatile tool, it may have limitations when dealing with large datasets or real-time processing requirements.
CASL (Cloud Analytic Services Language)
CASL, on the other hand, represents a modern approach to data analytics. It relies on the parallel, distributed, in-memory processing of CAS to handle large datasets quickly and deliver results in near real time.
It complements traditional analytics tools like BASE SAS and enables programmers to handle big data and optimise complex analytics workflows.
CASL is certainly not the only way to instruct CAS, but it's definitely a powerful option. Stephen Foerster aptly described CASL as being "like BASE SAS with the MACRO built-in."
BASE SAS vs CASL: Comparative Analysis
This comparison walks through the main differences between BASE SAS and CASL so that you can choose the right toolset for your analytical needs, infrastructure, and business objectives.
Language Structure:
BASE SAS: Combines DATA step with SAS Procedures.
CASL: Statement-based scripting language that is case insensitive and executes CAS actions.
Processing Engine:
BASE SAS: Runs on traditional SAS server.
CASL: Interacts with SAS Cloud Analytic Services (CAS), enabling distributed computing.
Data Handling:
BASE SAS: Processes data sequentially.
CASL: Supports in-memory processing, allowing for faster handling of big data.
Conditional Logic:
BASE SAS: Relies on the SAS macro language for conditional logic that controls which steps or procedures run.
CASL: Has built-in conditional logic capabilities, similar to having MACRO functionality integrated.
Procedure Execution:
BASE SAS: Uses PROC statements directly.
CASL: Uses CAS actions instead of procedures, though these actions often correspond to CAS-enabled PROCs.
Data Access:
BASE SAS: Primarily works with local data and data stored in SAS datasets on the server. It is well-suited for traditional data processing and analysis tasks.
CASL: Designed to access and manipulate data in CAS tables, which allows for distributed data processing.
Scalability:
BASE SAS: Limited by single-machine processing.
CASL: Designed for scalable, distributed computing environments.
Language Integration:
BASE SAS: Primarily SAS-centric.
CASL: Runs through PROC CAS, while the CAS actions it calls can also be reached from other interfaces, including Python, R, Java, and REST APIs.
Analytics Lifecycle Support:
BASE SAS: Supports traditional analytics workflows.
CASL: Designed to support the entire analytical lifecycle, including data management, analytics, and scoring.
Performance:
BASE SAS: Efficient for traditional data processing tasks.
CASL: Optimized for high-performance analytics, especially with large datasets.
Code Generation:
BASE SAS: Often requires macro programming for dynamic code generation.
CASL: Offers more flexible options for dynamic code generation and execution.
In summary, while BASE SAS and CASL share similarities in basic syntax, CASL is specifically designed for the cloud-based, distributed computing environment of SAS Viya. It offers enhanced performance for large datasets, built-in conditional logic, and greater flexibility in terms of language integration and analytics lifecycle support.
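As a rough illustration of the built-in conditional logic and looping described above, the following sketch loops over three table names and reports the row count only for tables that are loaded. It assumes a CAS session named casauto (the name used later in this article) and that the tables live in the active caslib; the variable names tables, t, and r are illustrative.

```sas
/* sketch: loop and branch in CASL without any macro code */
proc cas;
   session casauto;                       /* assumed session name */
   tables = {"class", "cars", "iris"};    /* illustrative list of table names */
   do t over tables;
      /* tableExists returns exists=0 when the table is not found */
      table.tableExists result=r / name=t;
      if (r.exists > 0) then do;
         table.recordCount / table=t;
      end;
      else do;
         print "Table " t " is not loaded in the active caslib.";
      end;
   end;
run;
quit;
```

In BASE SAS, the same pattern would typically need a macro with %DO and %IF around the procedure calls.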
BASE SAS vs CASL: Code Comparison and Transition
This side-by-side code comparison highlights the differences in functionality and usage between BASE SAS and CASL, showcasing their unique strengths and applications. The given examples demonstrate both approaches to performing the same task but in different ways.
We've covered a total of six different examples to showcase the differences. I highly recommend reviewing all the examples and their outputs, but feel free to focus on the one that interests you the most.
Here's a quick overview:
SAS vs. CASL #1: Import External Files
Loading a CSV File in BASE SAS
Loading a CSV File in CASL
SAS vs. CASL #2: Load Datasets
Loading SAS Datasets into the SAS Work Library
Loading SAS Datasets into the CAS Library
SAS vs. CASL #3: Print Sample Data Values
Printing Sample Data Values in BASE SAS
Printing Sample Data Values in CASL
SAS vs. CASL #4: Display Dataset Summary
Displaying a Summary of Dataset Contents in BASE SAS
Displaying a Summary of Dataset Contents in CASL
SAS vs. CASL #5: Data Handling (Filtering, Grouping, and Sorting)
Filtering, Grouping, and Sorting Variables in BASE SAS
Filtering, Grouping, and Sorting Variables in CASL
SAS vs. CASL #6: Generate Descriptive Statistics
Calculating Descriptive Statistics in BASE SAS
Calculating Descriptive Statistics in CASL
SAS vs. CASL #1: Import External Files
Loading a CSV File in BASE SAS
The following code demonstrates how to load a CSV file directly into SAS. It begins by defining a file reference (reffile) pointing to the CSV file located at the specified path. The proc import procedure is then used to read the CSV file.
The datafile= option specifies the file to be imported, dbms=csv indicates that the file format is CSV, and out=work.hmeq_imported specifies the output dataset's name and location in the work library. The getnames=Yes statement tells SAS to use the first row of the CSV file as the variable names.
/* load file directly into sas */
filename reffile "/pb/Users/MayurJadhav/Files/hmeq.csv";
proc import datafile=reffile
      dbms=csv
      out=work.hmeq_imported;
   getnames=Yes;   /* first row holds the variable names */
run;
Output Log:
NOTE: The infile REFFILE is:
Filename=/pb/Users/MayurJadhav/Files/hmeq.csv,
Owner Name=UNKNOWN,Group Name=UNKNOWN,
Access Permission=-rw-r--r--,
Last Modified=05Jul2024:21:06:17,
File Size (bytes)=438194
NOTE: 5960 records were read from the infile REFFILE.
The minimum record length was 21.
The maximum record length was 83.
NOTE: The data set WORK.HMEQ_IMPORTED has 5960 observations and 13 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
5960 rows created in WORK.HMEQ_IMPORTED from REFFILE.
Loading a CSV File in CASL
The following code shows how to load the same CSV file into CAS (Cloud Analytic Services) in the SAS Viya environment. The session statement tells PROC CAS to use the CAS session named casauto, and the upload statement then sends the CSV file to the server.
The path= parameter specifies the location of the CSV file, and the importoptions parameter indicates that the file type is CSV and that the first row contains the variable names (getNames=True). The casout parameter specifies that the imported data should be stored in a CAS table named "hmeq_in_cas", and the replace=True option ensures that any existing table with the same name will be replaced. Finally, the table.tableInfo action statement is used to display information about the CAS table.
/* load file directly into CAS */
proc cas;
   session casauto;
   upload path="/pb/Users/MayurJadhav/Files/hmeq.csv"
      importoptions={filetype="CSV" getNames=True}
      casout={
         name="hmeq_in_cas"
         replace=True
      }
   ;
run;
   table.tableInfo; /* shows information about a table */
run;
Output Log:
NOTE: Active Session now casauto.
NOTE: Cloud Analytic Services made the uploaded file available as table HMEQ_IN_CAS in caslib CASUSER(MayurJadhav).
NOTE: The table HMEQ_IN_CAS has been created in caslib CASUSER(MayurJadhav) from binary data uploaded to Cloud Analytic
Services.
{caslib=CASUSER(MayurJadhav),tableName=HMEQ_IN_CAS}
91
92 table.tableInfo; /* shows information about a table */
93 run;
94
Results:
SAS vs. CASL #2: Load Datasets
Loading SAS Datasets into the SAS Work Library
The example below copies SAS datasets from the sashelp library into the work library for temporary use. This allows you to work with the copies without altering the original data.
We’ll continue to use the datasets stored in the WORK library throughout this article to demonstrate the distinctions between BASE SAS and CASL across various examples.
/* load sas datasets directly into sas work lib */
data work.class;
   set sashelp.class;
run;
data work.cars;
   set sashelp.cars;
run;
data work.iris;
   set sashelp.iris;
run;
Output Log:
80 /* load sas datasets directly into sas work lib */
81
82 data work.class; set sashelp.class;
83 run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
84 data work.cars; set sashelp.cars;
85 run;
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set WORK.CARS has 428 observations and 15 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
86 data work.iris; set sashelp.iris;
87 run;
NOTE: There were 150 observations read from the data set SASHELP.IRIS.
NOTE: The data set WORK.IRIS has 150 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Loading SAS Datasets into the CAS Library
Similar to the previous example, where copies of datasets are created in the work library, this code loads the same built-in datasets into the CAS (Cloud Analytic Services) environment for processing.
The cas casauto; statement refers to the CAS session named casauto; because that session is already running, the log shows a warning that it already exists. The three load statements then copy the datasets into CAS, replacing any existing versions. The tables keep their original names and are stored in the CASUSER caslib, where they are available for high-performance, in-memory processing.
/* load sas datasets directly into CAS lib */
proc casutil;
   cas casauto;
   load data=sashelp.class replace;
   load data=sashelp.cars replace;
   load data=sashelp.iris replace;
run;
Output Log:
80 /* load sas datasets directly into CAS lib */
81 proc casutil;
NOTE: The UUID '105b1d4a-836b-3448-91a7-a1b5d0da05c2' is connected using session CASAUTO.
82 cas casauto;
WARNING: A session with the name CASAUTO already exists.
83 load data=sashelp.class replace;
NOTE: SASHELP.CLASS was successfully added to the "CASUSER(MayurJadhav)" caslib as "CLASS".
84 load data=sashelp.cars replace;
NOTE: SASHELP.CARS was successfully added to the "CASUSER(MayurJadhav)" caslib as "CARS".
85 load data=sashelp.iris replace;
NOTE: SASHELP.IRIS was successfully added to the "CASUSER(MayurJadhav)" caslib as "IRIS".
86 run;
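The warning in the log appears because the CAS statement tries to start a session that already exists. One way to avoid it, sketched here on the assumption that casauto is already running, is to point PROC CASUTIL at the existing session with its SESSREF= option. The casout= names are included only to make the target table names explicit; they repeat the names shown in the log.

```sas
/* sketch: reuse the existing casauto session instead of starting it again */
proc casutil sessref=casauto;
   load data=sashelp.class casout="class" replace;
   load data=sashelp.cars  casout="cars"  replace;
   load data=sashelp.iris  casout="iris"  replace;
quit;
```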
SAS vs. CASL #3: Print Sample Data Values
Printing Sample Data Values in BASE SAS
This code prints the first 10 rows of the "class" dataset from the "work" library, which we loaded in the previous example. It shows the values for the columns listed in the VAR statement: name, sex, age, height, and weight.
/* print sample data values */
proc print data=work.class (obs=10);
   var name sex age height weight;
run;
Results:
Printing Sample Data Values in CASL
This code prints the first 10 rows of the "class" table stored in the CASUSER caslib in the CAS (Cloud Analytic Services) environment.
It uses the CAS session named casauto and calls the table.fetch action to retrieve the values for the columns name, sex, age, height, and weight. The format=true option applies the columns' formats to the values, and the to=10 option limits the output to the first 10 rows. The output appears on the Results tab.
/* print sample data values in CASL */
proc cas;
   session casauto;
   table.fetch /
      format=true,
      fetchvars = {"name", "sex", "age", "height", "weight"},
      table="class",
      to=10;
run;
quit;
Results:
SAS vs. CASL #4: Display Dataset Summary
Displaying a Summary of Dataset Contents in BASE SAS
The traditional BASE SAS proc contents procedure provides detailed structural and metadata information about a dataset named "class" located in the "work" library. It includes information such as variable names (columns), their types (numeric or character), and additional attributes like format and length.
This procedure provides an in-depth overview of the dataset's layout and characteristics, focusing solely on its structure and metadata, without displaying the actual data values contained within the dataset.
/* display table contents */
proc contents data=work.class;
run;
Results:
Displaying a Summary of Dataset Contents in CASL
CAS has no single equivalent of proc contents. To get the details of a CAS table, you combine several actions from the table action set, such as caslibInfo, columnInfo, recordCount, and tableDetails.
Retrieve CAS Table Information:
table.caslibInfo: Displays information about the caslibs available in the session.
table.columnInfo / table="class": Provides details about the columns (variables) in the "class" table, such as their names, types, and lengths.
table.recordCount / table="class": Shows the number of records (rows) in the "class" table.
table.tableDetails / table="class": Reports how the "class" table is stored, such as its row count and data size.
table.tableInfo / table="class": Offers general information about the "class" table, such as its name, row and column counts, and creation time.
/* display CAS table contents */
proc cas;
   session casauto;
   table.caslibInfo;
   table.columninfo / table="class";
   table.recordCount / table="class";
   table.tableDetails / table="class";
   table.tableInfo / table="class";
run;
quit;
Results:
SAS vs. CASL #5: Data Handling (Filtering, Grouping, and Sorting)
Filtering, Grouping, and Sorting Variables in BASE SAS
The following code shows basic data handling operations: filtering, grouping, and sorting data based on selected variables.
It retrieves specific columns (name, sex, age, height, weight) from the "class" dataset, filters the rows to include only females (sex="F"), and returns them in descending order by "name" and "age", limited to 10 rows by outobs=10. Note that the query contains no summary function such as COUNT or AVG, so PROC SQL treats the GROUP BY clause as an ORDER BY clause and says so in the log; the clause is shown here only to illustrate where grouping belongs in the query.
/* data handling: filtering, grouping, and sorting by variables */
proc sql outobs=10;
   select name, sex, age, height, weight from work.class
      where sex="F"
      group by name, age
      order by name desc, age desc;
quit;
Results:
Filtering, Grouping, and Sorting Variables in CASL
Where traditional SAS uses proc sql, in CAS you can use the table.fetch action for many row-level data handling tasks. This action retrieves data from a CAS table based on the criteria you specify: it can filter rows, select columns, sort the result, and limit the number of returned rows.
The code below uses the casauto session, defines the table and the filter condition, retrieves the rows where sex is female, sorts them by name and age in descending order, fetches the top 10 rows, and then describes and prints the fetched result.
The main elements of the code are:
classtbl.name ="class"; specifies that the table to be queried is named "class".
classtbl.where = "sex = 'F'"; sets a condition to retrieve only rows where the sex column equals 'F' (females).
fvars = {...}; lists the columns to return.
table.fetch result=r_var/ ... ; runs the fetch action and saves its result in the r_var variable.
sortby={...}, sorts the rows by name and then by age, both in descending order.
table=classtbl, specifies the table from which data is fetched.
to=10; limits the fetch to the first 10 rows.
describe r_var; shows the structure of the fetched result (r_var).
print r_var; prints the fetched data (r_var).
/* data handling in CASL: filtering, grouping, and sorting by variables */
proc cas;
   session casauto;
   classtbl.name ="class";
   classtbl.where = "sex = 'F'";
   fvars = {"name", "sex", "age", "height", "weight"};
   table.fetch result=r_var/ /* results of the fetch action are saved in the "r_var" variable */
      format=false,
      fetchvars = fvars,
      index=false,
      sortby={
         {name="name", order="descending"},
         {name="age", order="descending"}
      },
      table=classtbl,
      to=10;
   describe r_var;
   print r_var;
run;
quit;
Results:
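The fetch example above filters and sorts rows. When group-level summary values are also needed, one option is the fedSql.execDirect action, which accepts a SQL query. The sketch below assumes the class table is loaded in the CASUSER caslib, as in example #2; the column aliases n_students and avg_height are illustrative.

```sas
/* sketch: grouping with summary values through FedSQL in CAS */
proc cas;
   session casauto;
   loadactionset "fedSql";   /* make the FedSQL actions available */
   fedSql.execDirect /
      query="select sex, count(*) as n_students, avg(height) as avg_height
             from casuser.class
             group by sex
             order by sex";
run;
quit;
```

Because the query uses COUNT and AVG, the GROUP BY clause here produces one row per value of sex.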
SAS vs. CASL #6: Generate Descriptive Statistics
Calculating Descriptive Statistics in BASE SAS
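This code generates descriptive statistics for the "class" dataset and organizes the results by the "sex" variable. It first sorts the dataset by sex, as BY-group processing requires, and then calculates the statistics named in the PROC MEANS statement (maximum, mean, minimum, count, number of missing values, standard deviation, and standard error) for each group. The OUTPUT statement also saves statistics in a new dataset named work.summary_stats; because it names no statistics, that dataset holds the default set (N, MIN, MAX, MEAN, and STD).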
/* Generate descriptive statistics */
proc sort data=class out=classbysex;
   by sex;
run;
proc means data=classbysex max mean min n nmiss std stderr;
   by sex;
   output out=summary_stats;
run;
Results:
Calculating Descriptive Statistics in CASL
You can generate the same descriptive statistics from the CAS table using the simple.summary CAS action. It computes descriptive statistics for numeric variables, such as the sample mean, sample variance, sample size, sum of squares, and more.
The following code generates descriptive statistics for the "class" table in the CAS environment, grouping the results by the "sex" variable. The simple.summary action calculates the statistics listed in the subSet= option: maximum, mean, minimum, count, number of missing values, standard deviation, and standard error for each group.
/* Generate descriptive statistics in CASL */
proc cas;
   tbl1.name = "class";
   tbl1.groupBy = "sex";
   simple.summary /
      table = tbl1,
      subSet = {"MAX", "MEAN", "MIN", "N", "NMISS", "STD", "STDERR"};
run;
quit;
Results:
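To keep the statistics for later steps, much as the OUTPUT statement does in PROC MEANS, the summary action can write its results to a CAS table. This sketch assumes the casOut= parameter with an illustrative output table name, summary_stats, and lists the numeric columns explicitly in inputs=.

```sas
/* sketch: save the grouped statistics in a CAS table */
proc cas;
   session casauto;
   simple.summary /
      table={name="class", groupBy={"sex"}},
      inputs={"age", "height", "weight"},
      subSet={"MAX", "MEAN", "MIN", "N", "NMISS", "STD", "STDERR"},
      casOut={name="summary_stats", replace=true};
   /* look at the saved statistics */
   table.fetch / table="summary_stats";
run;
quit;
```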
Conclusion
In conclusion, this comparison between BASE SAS and CASL has explored their respective strengths and applications in data analytics. I hope this head-to-head comparison and its examples help you select the appropriate toolset for your analytical needs and infrastructure.
This article shows how to translate your BASE SAS code into CASL. To go further, see "When to CASL and not to CASL, SAS programming in SAS Viya".
Whether you adopt the advanced capabilities of CASL or retain some functionality in BASE SAS, the choice will affect how well you can understand and use your data to gain valuable insights.
References:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/caslpg/titlepage.htm
https://support.sas.com/resources/papers/proceedings20/4454-2020.pdf
https://www.lexjansen.com/phuse-us/2022/as/PRE_AS09.pdf
https://communities.sas.com/t5/SAS-Communities-Library/CASL-It-s-like-Base-SAS-with-the-MACRO-Built-In/ta-p/651202
https://www.pharmasug.org/proceedings/2018/AD/PharmaSUG-2018-AD23.pdf
https://learnsascode.com/base-sas-vs-casl-a-simple-comparison/
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proccas/n19orokqs3mwaen16qfcx986gmfm.htm
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/pgmdiff/p06ibhzb2bklaon1a86ili3wpil9.htm