Converting Python data frames to SAS datafiles | SAS Viya for Learners

Started ‎03-15-2023 by
Modified ‎03-23-2023 by

Get it now: A 2-1 Special on Python!

 

I’ve shown you how to convert R data frames to SAS data sets – see https://communities.sas.com/t5/SAS-Communities-Library/Converting-R-data-frames-to-SAS-datafiles-SAS... – so what about Python? Well, we’re all about supporting our Python friends too!

 

This article will follow the same format as my R installment – except that it’s a 2-1 rather than a 3-1 special. Unlike the R approach, the approach used in this article doesn’t go through a .txt file, so it can’t be used to read in files greater than 100 MB. For those, see my other hacks for handling large files in SAS Viya for Learners (links at end).

 

But, hey, 2-1 is still pretty good! And here is the article overview:

  • I’ll cover how to convert your Python data frame to a SAS data file in SAS Viya for Learners
  • I’ll show you how to access datasets stored in Python packages

 

And why is that latter point a big deal? Well, accessing data contained within Python packages significantly increases the availability of data for teaching and learning in SAS Viya for Learners. Yay! These datasets should be relevant – for example, data in pandas should be useful for statistical analysis, while data in seaborn should be great for visualizations – which means that public health professors no longer need to use banking data in their classes. That’s certainly something. And if you want to know which packages are available in our installation of Python, please see – again – my self-promotional links at the end of this article.

 

Sound interesting? Then let’s get started!

 

We’ll start by launching Jupyter from the main SAS Drive page:

 

LGroves_0-1678891292464.png

 

Once Jupyter launches, you’ll see a browser akin to the following:

 

LGroves_1-1678891292524.png

 

Now it’s time for Python exploration! Open a Python Notebook by clicking here:

 

LGroves_2-1678891292542.png

 

For my example, I’m going to use data from the seaborn library – which is used to create visualizations in Python. If you’re just here to learn, then please follow along. But if you’ve already got your own Python dataset, please upload it to VFL using the approach outlined in one of my earlier posts, Ways to Handle the 100mb Data Upload Restriction in SAS Viya for Learners. Again, see my references at the end.

 

To access seaborn, type the following in the first command line:

 

LGroves_3-1678891292545.png

 

And go ahead and submit that line. As a reminder, there are two ways to submit lines of code in Jupyter. The first is to highlight the cell and press the old-school play button:

 

LGroves_4-1678891292550.png

 

The second is to click in the cell and press Shift+Enter.

 

Now that seaborn is loaded, let’s examine which datasets are available in the package. In the second cell, type:

 

LGroves_5-1678891292552.png

 

And submit. Yup, it’s really that easy. Now examine all the data sets available:

 

LGroves_6-1678891292562.png

 

While not as expansive as my R example, that’s still a good number of datasets to start with in the seaborn library. Again, seaborn will focus on datasets useful for visualizations.

 

Now, let’s prepare the data for export. The process is a bit more involved than our R example, but we’ve got this. To begin, let’s create a function to glimpse the data in Jupyter:

 

LGroves_7-1678891292570.png

 

Write that code in cell 3 and submit.
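If you’d rather copy and paste than retype from a screenshot, here is a minimal sketch of what a glimpse helper could look like. It is a stand-in and not necessarily identical to the code pictured above: the parameter n and the column names dtype and first_values are my own choices, and the only assumption is that pandas is available in your notebook.

```python
import pandas as pd


def glimpse(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Print the size of df and return one summary row per column."""
    # Report the overall dimensions first.
    print(f"Rows: {df.shape[0]}, Columns: {df.shape[1]}")
    # For each column, show its data type and its first n values.
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "first_values": [df[col].head(n).tolist() for col in df.columns],
        }
    )
```

Because the function returns a data frame, ending a cell with glimpse(some_df) lets Jupyter render the summary as a table.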

 

To parallel the R example, let’s examine the mpg data set. In Python we’ll want to load mpg from the seaborn data and save it as a data frame. Here is the code, which you can enter in cell 4 and then submit:

 

LGroves_8-1678891292577.png
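For copy-and-paste purposes, the seaborn steps so far (import the library, list its datasets, load mpg) can be sketched as one small function. sns.get_dataset_names() and sns.load_dataset() are real seaborn calls, but the wrapper name load_mpg is hypothetical. seaborn fetches its example data from an online repository, so this assumes your session can reach it.

```python
def load_mpg():
    """Load seaborn's mpg example data as a pandas DataFrame."""
    # Imported inside the function so that defining it has no side effects.
    import seaborn as sns

    # Show the names of every example dataset seaborn can load.
    print(sns.get_dataset_names())
    # load_dataset returns a pandas DataFrame.
    return sns.load_dataset("mpg")


# Hypothetical usage in a notebook cell:
# mpg_df = load_mpg()
```

If you prefer to stay closer to the cell-by-cell flow, the three calls inside the function can also be typed directly into separate cells.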

 

We’re getting there! Before plunging ahead with the Python to SAS data conversion, let’s peek at mpg_df. Remember that glimpse function from cell 3? Let’s use it:

 

LGroves_9-1678891292580.png

 

Put this code in cell 5 and submit. Output should appear as the following:

 

LGroves_10-1678891292619.png

 

Nice! Fun fact – this is a different file from the mpg file examined in R. But, no worries, the conversion is the important part here – not necessarily the data.

 

Now let’s move on to the key chunk of code that will convert a Python data frame to a SAS data set. And the special sauce is the saspy package. Read more about saspy here: https://sassoftware.github.io/saspy/index.html.

 

Unlike the R example, we do not need to create a .txt file and then read it in via a SAS program. The upside of this approach is that we can import files from Python to SAS a LOT more efficiently than in R. But… the downside is that we can’t manipulate that .txt file to handle data larger than 100 MB (see my post on Converting R data frames to SAS datafiles | SAS Viya for Learners if that last sentence is confusing).

 

Four lines of code will get us across the finish line. The first three are all system setup calls, so let’s do them all together. We’ll start in cell 6:

 

LGroves_11-1678891292633.png

 

I’ll submit the code on my end and then walk through the output:

 

LGroves_12-1678891292685.png

 

Cell 6 loads the saspy package into Jupyter. Cell 7 establishes the SAS connection via VFL and tells Jupyter where to start communicating with SAS. Finally, Cell 8 will establish a SAS library in VFL – so that we can save the data in a particular location. For me, I’m sticking the file in my casuser folder. But, you could also place the new data in another folder within casuser.
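The three setup steps can be sketched in text as follows. saspy.SASsession() and the saslib() method are real saspy calls, but the wrapper name open_sas_library, the libref mylib and the folder path are hypothetical placeholders. The sketch also assumes that your environment’s default saspy configuration is enough to connect; yours may need extra arguments.

```python
def open_sas_library(libref, path, session=None):
    """Start (or reuse) a saspy session and assign a SAS library to a folder.

    libref and path are values you choose; nothing here is specific to VFL.
    """
    if session is None:
        import saspy  # cell 6: load the saspy package

        session = saspy.SASsession()  # cell 7: connect Jupyter to SAS
    session.saslib(libref, path=path)  # cell 8: assign the SAS library
    return session


# Hypothetical usage in a notebook cell; the path is a placeholder:
# sas = open_sas_library("mylib", "/path/to/your/casuser")
```

Passing an existing session as the third argument reuses that connection instead of opening a new one.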

 

Ready for the grand finale? It’s a form of this statement:

 

LGroves_13-1678891292692.png

 

While more exciting details about the dataframe2sasdata function can be found here https://sassoftware.github.io/saspy/getting-started.html#load-data-into-sas, I’ll share the essential pieces. The first argument df=mpg_df simply identifies the table to be transferred. table= specifies what the table will be called when it’s transferred into VFL. And, finally, libref= tells the function where to stick the data in VFL – which is the SAS Library we assigned in the previous cell.
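Here is a hedged sketch of that final call, wrapped in a tiny function so that the three arguments are easy to see. dataframe2sasdata with df=, table= and libref= is the saspy method described above; the wrapper name frame_to_sas, the table name mpg and the libref mylib are hypothetical.

```python
def frame_to_sas(session, frame, table, libref):
    """Copy a pandas DataFrame into a SAS library through an open saspy session."""
    # df     = the data frame to transfer
    # table  = the name the table will have in SAS
    # libref = the SAS library that receives it
    # The call returns a saspy SASdata object pointing at the new table.
    return session.dataframe2sasdata(df=frame, table=table, libref=libref)


# Hypothetical usage, with sas from the setup cells and mpg_df from cell 4:
# mpg_sas = frame_to_sas(sas, mpg_df, table="mpg", libref="mylib")
```

The libref passed here has to match the one assigned in the library setup cell, otherwise SAS won’t know where to put the table.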

 

Hurray – we’ve done it! And want proof that the file is now in VFL? For me, the file can be found under Explorer with the following clicks:

 

LGroves_14-1678891292715.png

 

Taking it all from the top, here is the code that I used to convert a dataframe in the seaborn library in Python to a SAS dataset:

 

LGroves_15-1678891292771.png

 

Boom. Mic drop.

 

Finally - to round things out, here are some helpful links to my earlier posts, in case some of the discussion above was unclear:

 

Helpful Links:

 

 

Good luck – and happy hacking!

Version history
Last update:
‎03-23-2023 09:25 AM
Updated by:
