Because of the nature of distributed data on a grid of computers, data isn't always organized in a way that is most useful. In many cases, you want the data to be displayed in an ordered form. To do this, you'll want to use the sortby= parameter of the fetch action. There is also a Pandas DataFrame-like way of fetching data in an ordered frorm.
We first need to start with a CAS connection.
In[1]: import swat
In[2]: conn = swat.CAS(host, port, username, password)
We need some data to work with, so we'll upload a small data set.
In[3]: cars = conn.read_csv('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv')
In[4]: cars
Out[4]: CASTable('_T_FBA50B50_7F787405E0C0', caslib='CASUSERHDFS(kesmit)')
Using the fetch action, we can bring down a sample of the data.
In[5]: cars.table.fetch(to=5)
To bring the data down in a sorted form, we use the sortby= parameter. This parameter can take a list of column names, or a list of dictionaries with the keys 'name', 'order', and 'formatted'. The 'name' key is required. The 'order' parameter can be set to 'ascending' or 'descending'. The 'formatted' parameter can be set to 'raw' or 'formatted' to sort the column based on the formatted value or the raw value; by default, it sorts by raw value. Let's start with just using column names.
In[6]: cars.fetch(to=5, sortby=['Cylinders', 'EngineSize'])
To reverse the order of the EngineSize column, we need to use a dictionary for that element.
In[7]: cars.fetch(to=5, sortby=['Cylinders', dict(name='EngineSize', order='descending')])
As with most elements of the CASTable object, you can also apply sorting behaviors just like you do with Pandas DataFrames. After you do this, each time the data is fetched from the server, it will automatically apply the sorting options for you. This gives the appearance of a sorted CASTable object.
The CASTable object has a sort_values method (it also supports the older sort method name). The sort_values method takes a list of column names as well as optional ascending= and inplace= options. The ascending= option takes a boolean (for all columns) or a list of booleans (one for each column) to indicate whether it should be sorted in ascending or descending order. The inplace= option indicates whether the sort options should be applied to the CASTable that sort_value is called on, or if it should return a copy of the CASTable object with the sort options applied. Note that no copying or sorting is done when the sort_value option is used. The sorting only occurs when data is being brought back to the client side either through table.fetch directly, or through any of the other methods that use fetch in the background (e.g., head, tail)
Let's apply some sort options to our cars object.
In[8]: sorted_cars = cars.sort_values(['Cylinders', 'EngineSize'])
In[9]: sorted_cars.fetch(to=5)
Now let's fetch some data using the head DataFrame method. You'll see that the data is still coming back in sorted order.
In[10]: sorted_cars.head()
Since we didn't use the inplace= option, the original CASTable object should still bring data back in an unsorted order.
In[11]: cars.head()
Now let's try applying a sort order along with the inplace=True option. This will modify the CASTable object directly rather than returning a copy.
In[12]: cars.sort_values(['Cylinders', 'EngineSize'], ascending=[True, False], inplace=True)
In[13]: cars.head()
In[14]: conn.close()
Dealing with distributed data can sometimes take some getting used to, but hopefully with these tips on how to sort your data when it is brought back to the client, you can retrieve your data in the form that you are looking for.
You can download the Jupyter notebook version of this article at https://github.com/sassoftware/sas-viya-programming/tree/master/communities.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.