Running Data Step from Python

Started ‎04-14-2016 by
Modified ‎04-16-2019 by

 

The datastep action set in CAS allows you to run data step code with the datastep.runcode action. There are a few ways to execute data step code from the Python client. We'll cover each of them here.

 

Let's get a CAS connection to work with first.

 

In[1]: import swat

In[2]: conn = swat.CAS(host, port, username, password)

Now we need to get some data into our session.

 

In[3]: cls = conn.read_csv('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/class.csv', 
                           casout=dict(name='class', caslib='casuser'))

In[4]: cls
Out[4]: CASTable('class', caslib='CASUSER(kesmit)')

 

The datastep.runcode Action

 

The most basic way to run data step code is to call the datastep.runcode action directly. This action runs much like a data step in SAS; you simply specify CAS tables rather than SAS data sets as your input and output data. In this example, we will compute the body mass index (BMI) of the students in the class data set. The output of the datastep.runcode action contains two keys: InputCasTables and OutputCasTables. Each of those keys points to a DataFrame of information about the input or output tables, including a CASTable object located in the last column.

 

In[5]: out = conn.datastep.runcode('''
   data bmi(caslib='casuser');
      set class(caslib='casuser');
      BMI = weight / (height**2) * 703;
   run;
''')

In[6]: out

[Screenshot: output of In[6], the datastep.runcode results]

 

We can pull the output CASTable object out of the results using the following line of code. The loc property is a DataFrame property that allows you to extract elements from a DataFrame by row and column label. In this case, we want the element in row zero, column name casTable. (Older versions of pandas also offered an ix property for this, but it was deprecated and then removed in pandas 1.0.)

In[7]: bmi = out.OutputCasTables.loc[0, 'casTable']
In[8]: bmi.head()

[Screenshot: output of In[8], the first rows of the bmi table]

 

As you can see, we have a new CAS table that now includes the BMI column.
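
To sanity-check the values in the BMI column, the same formula can be evaluated in plain Python. This is only a sketch: the function name bmi is hypothetical, the input values are illustrative, and it assumes that weight is in pounds and height is in inches, which is what the 703 conversion factor implies.

```python
def bmi(weight_lb, height_in):
    """Body mass index from imperial units, mirroring the data step formula."""
    # Same expression as the data step: weight / (height**2) * 703
    return weight_lb / (height_in ** 2) * 703


# Illustrative values: a student weighing 112.5 lb who is 69 in tall.
value = bmi(112.5, 69.0)
print(round(value, 2))  # prints 16.61
```

Comparing a few rows of bmi.head() against this function is a quick way to confirm that the data step ran the calculation you intended.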

 

The CASTable datastep Method

 

CASTable objects have a datastep method that does some of the work of wrapping your data step code with the appropriate input and output data sets. When using this method, you just give the body of the data step code. The output table name will be automatically generated. The output of the datastep method is a CASTable object that references the newly generated table, so you don't have to extract the CASTable from the underlying action results.

 

In[9]: bmi2 = cls.datastep('''BMI = weight / (height**2) * 703''')
In[10]: bmi2.head()

[Screenshot: output of In[10], the first rows of the bmi2 table]
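
To picture what "wrapping" means here, the following sketch builds a complete data step around a bare body. It is not SWAT's actual implementation: wrap_datastep is a hypothetical helper written for illustration, and it takes the output table name explicitly, whereas the datastep method generates one for you.

```python
def wrap_datastep(body, in_table, out_table, caslib='casuser'):
    """Build a full data step around a bare body (illustrative only)."""
    body = body.strip()
    # Make sure the body ends with a statement terminator.
    if not body.endswith(';'):
        body += ';'
    return (
        f"data {out_table}(caslib='{caslib}');\n"
        f"   set {in_table}(caslib='{caslib}');\n"
        f"   {body}\n"
        f"run;"
    )


code = wrap_datastep('BMI = weight / (height**2) * 703', 'class', 'bmi2')
print(code)
```

The resulting string has the same shape as the code passed to datastep.runcode earlier, which is why the method only needs the body from you.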

 

The casds IPython Magic Command

 

The third way of running data step code from Python is reserved for IPython users. IPython has commands called "magics". These commands start with % (for one-line commands) or %% (for cell commands) and allow extension developers to add functionality that isn't necessarily Python-based to your environment. Included in SWAT is a package called swat.cas.magics that can be loaded to surface the %%casds magic command. The %%casds magic gives you the ability to enter an entire IPython cell of data step code rather than Python code. This is especially useful in the IPython notebook interface.

 

Let's give the %%casds magic a try. First we have to load the swat.cas.magics extension.

 

In[11]: %load_ext swat.cas.magics

Now we can use the %%casds magic to enter an entire cell of data step code. The %%casds magic requires at least one argument, which names the CAS connection object where the action should run. In most cases, you'll also want to add the --output option, which specifies the name of a Python variable that will receive the output of the datastep.runcode action.

 

 

In[12]: %%casds --output out2 conn

data bmi3(caslib='casuser');
   set class(caslib='casuser');
   BMI = weight / (height**2) * 703;
run;

[Screenshot: output of In[12], the %%casds cell results]

 

Just as before, we can extract the output CASTable object from the returned DataFrames.

 

In[13]: bmi3 = out2.OutputCasTables.loc[0, 'casTable']
In[14]: bmi3.head()

[Screenshot: output of In[14], the first rows of the bmi3 table]
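
Because the same extraction step follows both datastep.runcode and %%casds, it can be folded into a small helper. This is a minimal sketch: first_output_table is a hypothetical name, not part of SWAT, and it assumes the results expose an OutputCasTables DataFrame with a casTable column, as in the examples above.

```python
def first_output_table(result, row=0):
    """Return the CASTable stored in the given row of result.OutputCasTables."""
    # loc does a label-based lookup: row label first, then column name.
    return result.OutputCasTables.loc[row, 'casTable']
```

With a live connection, bmi3 = first_output_table(out2) would be equivalent to the line in In[13].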

 

In[15]: conn.close()

 

Conclusion

 

If you are an existing SAS user, you may be relieved to find that you can still use data step in the CAS environment. Even better, you can run it from Python. This blend of languages and environments gives you an enormous number of possibilities for data analysis, and should make SAS programmers feel right at home in Python.

 

Resources

 

You can download the Jupyter notebook version of this article at https://github.com/sassoftware/sas-viya-programming/tree/master/communities.
