BookmarkSubscribeRSS Feed

Fetching Sorted Data using Python

Started ‎04-13-2016 by
Modified ‎04-16-2019 by
Views 4,155

Fetching Sorted Data using Python

 

Because of the nature of distributed data on a grid of computers, data isn't always organized in a way that is most useful. In many cases, you want the data to be displayed in an ordered form. To do this, you'll want to use the sortby= parameter of the fetch action. There is also a Pandas DataFrame-like way of fetching data in an ordered frorm.

 

We first need to start with a CAS connection.

 

In[1]: import swat

In[2]: conn = swat.CAS(host, port, username, password)

We need some data to work with, so we'll upload a small data set.

 

In[3]: cars = conn.read_csv('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv')

In[4]: cars
Out[4]: CASTable('_T_FBA50B50_7F787405E0C0', caslib='CASUSERHDFS(kesmit)')

Using the fetch action, we can bring down a sample of the data.

 

In[5]: cars.table.fetch(to=5)

Screen Shot 2016-08-12 at 2.02.58 PM.png

 

To bring the data down in a sorted form, we use the sortby= parameter. This parameter can take a list of column names, or a list of dictionaries with the keys 'name', 'order', and 'formatted'. The 'name' key is required. The 'order' parameter can be set to 'ascending' or 'descending'. The 'formatted' parameter can be set to 'raw' or 'formatted' to sort the column based on the formatted value or the raw value; by default, it sorts by raw value. Let's start with just using column names.

 

In[6]: cars.fetch(to=5, sortby=['Cylinders', 'EngineSize'])

Screen Shot 2016-08-12 at 2.04.14 PM.png

 

To reverse the order of the EngineSize column, we need to use a dictionary for that element.

 

In[7]: cars.fetch(to=5, sortby=['Cylinders', dict(name='EngineSize', order='descending')])

Screen Shot 2016-08-12 at 2.05.06 PM.png

 

DataFrame-style Sorting

 

As with most elements of the CASTable object, you can also apply sorting behaviors just like you do with Pandas DataFrames. After you do this, each time the data is fetched from the server, it will automatically apply the sorting options for you. This gives the appearance of a sorted CASTable object.

 

The CASTable object has a sort_values method (it also supports the older sort method name). The sort_values method takes a list of column names as well as optional ascending= and inplace= options. The ascending= option takes a boolean (for all columns) or a list of booleans (one for each column) to indicate whether it should be sorted in ascending or descending order. The inplace= option indicates whether the sort options should be applied to the CASTable that sort_value is called on, or if it should return a copy of the CASTable object with the sort options applied. Note that no copying or sorting is done when the sort_value option is used. The sorting only occurs when data is being brought back to the client side either through table.fetch directly, or through any of the other methods that use fetch in the background (e.g., headtail)

 

Let's apply some sort options to our cars object.

 

In[8]: sorted_cars = cars.sort_values(['Cylinders', 'EngineSize'])
In[9]: sorted_cars.fetch(to=5)

Screen Shot 2016-08-12 at 2.06.10 PM.png

 

Now let's fetch some data using the head DataFrame method. You'll see that the data is still coming back in sorted order.

 

In[10]: sorted_cars.head()

Screen Shot 2016-08-12 at 2.06.48 PM.png

 

Since we didn't use the inplace= option, the original CASTable object should still bring data back in an unsorted order.

 

In[11]: cars.head()

Screen Shot 2016-08-12 at 2.07.31 PM.png

 

Now let's try applying a sort order along with the inplace=True option. This will modify the CASTable object directly rather than returning a copy.

 

In[12]: cars.sort_values(['Cylinders', 'EngineSize'], ascending=[True, False], inplace=True)
In[13]: cars.head()

Screen Shot 2016-08-12 at 2.08.11 PM.png

 

In[14]: conn.close()

 

Conclusion

 

Dealing with distributed data can sometimes take some getting used to, but hopefully with these tips on how to sort your data when it is brought back to the client, you can retrieve your data in the form that you are looking for.

 

Resources

 

You can download the Jupyter notebook version of this article at https://github.com/sassoftware/sas-viya-programming/tree/master/communities.

Version history
Last update:
‎04-16-2019 09:09 AM
Updated by:
Contributors

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags