The actions in CAS cover a wide variety of statistical analyses. While we can't cover all of them here, we'll at least get you started on some of the simpler ones.
First, we need to set up a CAS connection.
In[1]: import swat
In[2]: conn = swat.CAS(host, port, username, password)
simple Action Set
The basic statistics package in CAS is called simple and should already be loaded. If you are using IPython, you can see which actions are available by using the ? operator.
In[3]: conn.simple?
Type: Simple
String form: <swat.cas.actions.Simple object at 0x3cf0d90>
File: /u/kesmit/pp/swat/GTKLAXND/misc/python/swat/cas/actions.py
Definition: conn.simple(self, *args, **kwargs)
Docstring:
Analytics
Actions

simple.correlation : Generates a matrix of Pearson product-moment correlation coefficients
simple.crosstab : Performs one-way or two-way tabulations
simple.distinct : Computes the distinct number of values of the variables in the variable list
simple.freq : Generates a frequency distribution for one or more variables
simple.groupby : Builds BY groups in terms of the variable value combinations given the variables in the variable list
simple.mdsummary : Calculates multidimensional summaries of numeric variables
simple.numrows : Shows the number of rows in a Cloud Analytic Services table
simple.paracoord : Generates a parallel coordinates plot of the variables in the variable list
simple.regression : Performs a linear regression up to 3rd-order polynomials
simple.summary : Generates descriptive statistics of numeric variables such as the sample mean, sample variance, sample size, sum of squares, and so on
simple.topk : Returns the top-K and bottom-K distinct values of each variable included in the variable list based on a user-specified ranking order
You can also use Python's help function.
In[4]: help(conn.simple)
Help on Simple in module swat.cas.actions object:
class Simple(CASActionSet)
 Analytics

 Actions
 
 simple.correlation : Generates a matrix of Pearson product-moment correlation coefficients
 simple.crosstab : Performs one-way or two-way tabulations
 simple.distinct : Computes the distinct number of values of the variables in the variable list
 simple.freq : Generates a frequency distribution for one or more variables
 simple.groupby : Builds BY groups in terms of the variable value combinations given the variables in the variable list
 simple.mdsummary : Calculates multidimensional summaries of numeric variables
 simple.numrows : Shows the number of rows in a Cloud Analytic Services table
 simple.paracoord : Generates a parallel coordinates plot of the variables in the variable list
 simple.regression : Performs a linear regression up to 3rd-order polynomials
 simple.summary : Generates descriptive statistics of numeric variables such as the sample mean, sample variance, sample size, sum of squares, and so on
 simple.topk : Returns the top-K and bottom-K distinct values of each variable included in the variable list based on a user-specified ranking order

 Method resolution order:
 Simple
 CASActionSet
 builtins.object

 Data and other attributes defined here:

 actions = {'correlation': <class 'swat.cas.actions.simple.Correlation'...

 
 Methods inherited from CASActionSet:
... output truncated ...
Let's start with the summary action. We'll need some data, so we'll load a CSV file from a URL and then run the action on it.
In[5]: cars = conn.read_csv('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv')
In[6]: out = cars.summary()
In[7]: out
The result here is a CASResults object, which is a subclass of a Python dictionary. In this case, it has only one key, "Summary", and the value for that key is a DataFrame. We can store the DataFrame in a variable so that it's easier to work with, and then apply any of the standard Pandas DataFrame operations to it. Here we set the first column as the index of the DataFrame so that selecting data is easier later on.
In[8]: df = out['Summary']
In[9]: df.set_index(df.columns[0], inplace=True)
In[10]: df
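The same pattern can be tried without a CAS server. The sketch below is a local stand-in, not SWAT code: FakeResults is a hypothetical dict subclass playing the role of CASResults, the column name 'Column' is an assumption about what the first column of the summary output is called, and the two rows reuse the MSRP and Invoice minimums and maximums reported by cars.min() and cars.max() in this article.

```python
import pandas as pd

# Hypothetical stand-in for CASResults, which is a subclass of dict.
class FakeResults(dict):
    pass

# Two rows of summary-style output, built by hand for illustration.
summary = pd.DataFrame({
    'Column': ['MSRP', 'Invoice'],   # assumed name of the first column
    'Min': [10280.0, 9875.0],
    'Max': [192465.0, 173560.0],
})
out = FakeResults(Summary=summary)

# Pull the DataFrame out by key, then promote the first column to the index.
df = out['Summary']
df = df.set_index(df.columns[0])

print(list(out.keys()))
print(df.index.tolist())
```

Assigning the result of set_index to df, rather than passing inplace=True, leaves the DataFrame stored in the results object unchanged. Either form works for the selection that follows.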
Now that we have an index, we can use the loc property of the DataFrame to select rows based on index values as well as columns based on names.
In[11]: df.loc[['MSRP', 'Invoice'], ['Min', 'Mean', 'Max']]
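To show what loc does with row labels and column names, here is a minimal sketch on a hand-built DataFrame. It is plain Pandas rather than CAS output: the Min and Max values are the ones reported by cars.min() and cars.max() in this article, and the Mean column is omitted because this article does not show its values.

```python
import pandas as pd

# Hand-built frame indexed by variable name, mimicking the indexed summary.
df = pd.DataFrame(
    {'Min': [10280.0, 9875.0, 73.0], 'Max': [192465.0, 173560.0, 500.0]},
    index=['MSRP', 'Invoice', 'Horsepower'],
)

# A list of labels on each axis returns a smaller DataFrame.
subset = df.loc[['MSRP', 'Invoice'], ['Min', 'Max']]

# A single label on each axis returns a scalar.
msrp_max = df.loc['MSRP', 'Max']

# The selection is an ordinary DataFrame, so column arithmetic works on it.
price_range = subset['Max'] - subset['Min']
print(price_range)
```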
In the previous example, we called the summary action directly, which gave us a CASResults object containing a DataFrame with the result of the action. You can also use many of the Pandas DataFrame methods directly on the CASTable object, so in many ways the two are interchangeable. One of the most commonly used DataFrame methods is describe. On a CASTable, it reports statistics that you would otherwise get by running variations of the summary, distinct, topk, and percentile actions. This is all done for you, and the output is the same as what you would get from an actual Pandas DataFrame. The difference is that the CASTable version can handle much, much larger data sets.
In[12]: cars.describe()
Other examples of DataFrame methods that work on CASTable objects are min, max, std, etc. Each of these calls summary in the background, so if you want to use more than one, you might be better off just calling the describe method once to get all of them.
In[13]: cars.min()
Out[13]:
Make                Acura
Model          3.5 RL 4dr
Type               Hybrid
Origin               Asia
DriveTrain            All
MSRP                10280
Invoice              9875
EngineSize            1.3
Cylinders               3
Horsepower             73
MPG_City               10
MPG_Highway            12
Weight               1850
Wheelbase              89
Length                143
dtype: object
In[14]: cars.max()
Out[14]:
Make                             Volvo
Model          Z4 convertible 3.0i 2dr
Type                             Wagon
Origin                             USA
DriveTrain                        Rear
MSRP                            192465
Invoice                         173560
EngineSize                         8.3
Cylinders                           12
Horsepower                         500
MPG_City                            60
MPG_Highway                         66
Weight                            7190
Wheelbase                          144
Length                             238
dtype: object
In[15]: cars.std()
Out[15]:
MSRP           19431.716674
Invoice        17642.117750
EngineSize         1.108595
Cylinders          1.558443
Horsepower        71.836032
MPG_City           5.238218
MPG_Highway        5.741201
Weight           758.983215
Wheelbase          8.311813
Length            14.357991
dtype: float64
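The advice about calling describe once can be sketched with a local Pandas DataFrame, since the article notes that the CASTable output has the same shape. The data here is a toy table made up for illustration, not rows from cars.csv; on a CASTable the equivalent would presumably be cars.describe() followed by the same loc selection.

```python
import pandas as pd

# Toy data for illustration only; these are not rows from cars.csv.
toy = pd.DataFrame({
    'MSRP': [20000.0, 30000.0, 40000.0],
    'Horsepower': [100.0, 200.0, 300.0],
})

# One describe() call returns count, mean, std, min, quartiles, and max.
stats = toy.describe()

# Pick the rows of interest instead of calling min(), max(), and std()
# as three separate requests.
picked = stats.loc[['min', 'max', 'std']]
print(picked)
```

With a local DataFrame the saving is negligible, but against a CAS server each separate call is another round of work on the full table.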
Although we have barely scratched the surface, you should now be able to get some basic statistical results about your data. Whether you use the action API directly or the familiar Pandas DataFrame methods is up to you.
You can download the Jupyter notebook version of this article at https://github.com/sassoftware/sas-viya-programming/tree/master/communities.