
Unlock 2200+ datasets in SAS Viya for Learners 4 using the rdatasets Python Package

Started ‎06-13-2024 by
Modified ‎06-13-2024 by

In case you haven’t heard: all the cool kids are now using SAS Viya for Learners 4 (VFL4) for teaching and learning.

 

SAS Viya for Learners 3.5 had a great run as an integrated code/low/no code experience for academics. But SAS Viya for Learners 4 will provide academics with the latest-and-greatest version of our SAS Viya platform. And if you want a full list of all the reasons that I think you should switch, today, please see my earlier SAS Communities article, found here.

 

One exciting development mentioned in my previous article was found under the Jupyter Notebook section. And for those of you who didn’t memorize that article, the section read: Leverage updated Python + R packages, including the rdatasets package in Python – which provides 2000+ datasets for teaching and learning!

 

This SAS Communities article shows exactly how to leverage those data sets.

 

I’ll start with the “what”. What is rdatasets? Well, it’s a user-created package that compiles datasets from a large number of commonly used R packages into a single Python package. Yes, this is the beauty of open-source collaboration: someone spent a LOT of their free time creating something incredibly valuable for the broader learning community. And we thank you, Vincent! More details on the package can be found here: https://pypi.org/project/rdatasets/

 

Gratitude for community-built content aside, here is the GitHub repo that contains the Python notebook we’ll use in this demonstration. Our three-part quest:

  • Examine all the datasets available in rdatasets
  • Load an interesting dataset
  • Convert that Python dataframe to a CAS table

Let’s get started!

 

Part 1: Examine all the datasets available in rdatasets

 

  • Once you’ve landed in Jupyter in VFL4, you should see the following screen:
LGroves_0-1718297611692.png

 

  • Within the Launcher, open a new Python notebook, by clicking here:
LGroves_1-1718297611695.png

 

  • A blank Notebook awaits!
LGroves_2-1718297611697.png

 

  • Personally, I like to annotate a notebook, so I can figure out what the heck I did a few months from now.  So, I’ll provide a little bit of a preamble to help with the storytelling:
LGroves_3-1718297611703.png

 

  • You do you, when it comes to documentation.  I’ll also save the file… because it’s good practice. Know that this file is only saved temporarily in this impermanent Jupyter environment. More about that distinction, and why it is important, can be found in this post. And if you want to learn how to push files to GitHub, please see my other SAS Communities article, found here.
  • Like any good open-source journey, let’s start by importing packages in our Jupyter Notebook. We’ll use rdatasets and pandas. Type the following into a cell:
# Import the required packages.
import rdatasets
import pandas as pd
  • Next, let’s examine the list of datasets contained in rdatasets with this command:
rdatasets.summary()
  • A look at the submitted notebook thus far, with a little added storytelling:
LGroves_4-1718297611710.png
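
With 2200+ entries, the full listing is a lot to scroll through. Here is a minimal sketch of narrowing it down, assuming rdatasets.summary() returns a pandas dataframe with Package, Item, and Rows columns (check .columns in your own environment, since those names are an assumption here). The helper find_datasets is a hypothetical name, and the small hand-typed catalog is only a stand-in so the sketch runs without rdatasets installed:

```python
import pandas as pd


def find_datasets(catalog, package=None, min_rows=0):
    """Filter a dataset catalog by source package and minimum row count."""
    mask = catalog["Rows"] >= min_rows
    if package is not None:
        mask &= catalog["Package"] == package
    return catalog.loc[mask, ["Package", "Item", "Rows"]].reset_index(drop=True)


# Hand-typed stand-in for the real catalog (illustrative rows only).
# In the notebook you would pass rdatasets.summary() instead.
demo_catalog = pd.DataFrame({
    "Package": ["AER", "AER", "datasets"],
    "Item": ["Affairs", "CreditCard", "iris"],
    "Rows": [601, 1319, 150],
})

# Keep only AER datasets with at least 500 rows.
aer_only = find_datasets(demo_catalog, package="AER", min_rows=500)
print(aer_only)
```

In the notebook, the equivalent call would be something like find_datasets(rdatasets.summary(), package="AER"), provided the column names match.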

 

Part 2: Load an interesting dataset

  • Now it’s time to load an interesting dataset. Let’s access Affairs. Not because I’m into modeling that sort of thing, but simply because it comes first. And it will be easier to find in SAS Viya later, because it starts with A.
  • The code to prepare that data:
# Get the data ready to load
from rdatasets import data
  • Next let’s load the Affairs data:
# Load the "Affairs" dataset from the "AER" package
affairs_data = data(package='AER', item='Affairs')
  • Finally, let’s check out a sample of the data to ensure it’s what we want:
# Let's check out a sample of the data
print(affairs_data.head()) # Print the first few rows of the dataset
  • Some pretty, submitted code:
LGroves_5-1718297611714.png
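
While you are looking at that sample, it is worth a glance at the column names. Affairs is tidy, but many R datasets use dots in their names (iris has Sepal.Length, for example), which can be awkward to reference in SAS code later. The sketch below is one possible cleanup, not something the rdatasets or SWAT packages require. The helper name sas_friendly is hypothetical, and the 32-character cap reflects the classic SAS variable-name limit:

```python
import re


def sas_friendly(name, max_len=32):
    """Return a column name limited to letters, digits, and underscores."""
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", str(name).strip())
    # Names should not be empty or start with a digit, so prefix an underscore.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned[:max_len]


# Example names in the style found in R datasets (not taken from Affairs).
r_style_columns = ["Sepal.Length", "Petal.Width", "2nd score", "rating"]
renamed = [sas_friendly(col) for col in r_style_columns]
print(renamed)

# With a real dataframe, the same idea would be:
# affairs_data.columns = [sas_friendly(col) for col in affairs_data.columns]
```

Note that this simple version does not check for duplicate names after cleaning, so two columns that differ only in punctuation would collide.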

 

Part 3: Convert that Python dataframe to a CAS table

  • Moving right along! In Part 3, we’ll define the CAS connections, so that we can push this dataframe into the SAS Viya environment. We’ll start by importing some more packages:
# Load some SAS Packages so that we can access the CAS engine in SAS Viya
import os
import swat
  • Now let’s set up some of the access rules, because SAS Viya loves that sort of thing. These rules should work in all VFL4 environments:
# Set up the access rules
conn = swat.CAS(os.environ['CAS_CONTROLLER'], 5570, password=os.environ['ACCESS_TOKEN'])
  • Our next feat is to push the Affairs data to the Public Folder in CAS. This gives us access to it in useful tools like SAS Visual Analytics and SAS Model Studio:
# Push Affairs Data to Public CAS to use in SAS VA + SAS Model Studio
# (note: you can't overwrite the promoted table if it already exists)
cas_table = conn.upload_frame(affairs_data, casout=dict(name='affairs_data', caslib='public', promote=True))
  • Two more steps to go! The next item on the agenda is to save the file in CAS so that it persists after the session ends. That is, of course, if you’d like to see the file again in your CAS folder. And it can be done with these lines of code:
# Push Affairs Data to CAS in your CASUSER folder (to use in SAS Studio)
cas_table = conn.upload_frame(affairs_data, casout=dict(name='affairs_data', replace=True))
  • Finally, let’s just peek into our CASUSER folder, to ensure that everything worked as planned:
# Examine the tables in casuser... because why not?
conn.tableInfo(caslib = 'casuser')
  • And here is all of Part 3, code and output together:
LGroves_6-1718297611722.png
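
Because a promoted table can't be overwritten, running the Public upload a second time will fail. One way around that is to drop any existing copy first. This is a minimal sketch, assuming you have permission to drop tables in the Public caslib and that nobody else is relying on a table with the same name. The helper name promote_frame is hypothetical. It uses the CAS table.dropTable action, with quiet=True so a missing table is not treated as an error:

```python
def promote_frame(conn, frame, name, caslib="public"):
    """Drop any existing copy of the table, then upload and promote the frame.

    conn is expected to be a swat.CAS connection and frame a pandas dataframe.
    """
    # Remove the old table if it exists; quiet=True suppresses the error
    # that would otherwise be reported when there is nothing to drop.
    conn.table.dropTable(name=name, caslib=caslib, quiet=True)
    # Upload the dataframe and promote it so other applications can see it.
    return conn.upload_frame(
        frame,
        casout=dict(name=name, caslib=caslib, promote=True),
    )


# Usage in the notebook, reusing the connection and dataframe from above:
# cas_table = promote_frame(conn, affairs_data, "affairs_data")
```

The drop-then-promote order matters here: the upload step alone would stop with an error as soon as a promoted table of the same name exists.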

 

  • Yay – progress! The last step is to see if the data really is available in SAS Viya. Go back into your SAS Viya for Learners 4 environment and select Explore and Visualize from the Applications menu:
LGroves_7-1718297611726.png

 

  • This opens SAS Visual Analytics. Next, select New report – and watch a new report get created!
LGroves_8-1718297611731.png

 

  • Our last item is to Add data to this new report in SAS Visual Analytics:
LGroves_9-1718297611735.png

 

  • Within the Choose Data window that pops up, scroll just a little bit and you should see AFFAIRS_DATA. Click that file and then select Add to get it moved into your report:
LGroves_10-1718297611744.png

 

  • Now you’ve unlocked the full potential of SAS with 2200+ data sets for teaching and learning. Yay!
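
If AFFAIRS_DATA does not show up in the Choose Data window, you can check from the notebook whether the table actually made it into the Public caslib. The sketch below leans on the CAS table.tableExists action, which reports 0 when the table is not found, 1 when it exists only in the current session, and 2 when it has been promoted to global scope. The helper name table_scope is hypothetical:

```python
# Meaning of the "exists" value returned by the table.tableExists action.
SCOPES = {0: "missing", 1: "session", 2: "global"}


def table_scope(conn, name, caslib="public"):
    """Report whether a CAS table is missing, session-scoped, or global."""
    result = conn.table.tableExists(name=name, caslib=caslib)
    return SCOPES.get(int(result["exists"]), "unknown")


# Usage in the notebook, reusing the connection from Part 3:
# print(table_scope(conn, "affairs_data"))             # Public caslib
# print(table_scope(conn, "affairs_data", "casuser"))  # your CASUSER caslib
```

Only a table reported as global is visible to other applications such as SAS Visual Analytics, so that is the value to look for on the Public copy.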

 

Happy hackin’!
