
Using Text Editors and Notebooks with SAS Visual Data Mining and Machine Learning

by SAS Employee BethEbersole on 06-28-2017 04:13 PM

SAS Visual Data Mining and Machine Learning now lets you call CAS actions from Python, Java, or Lua code! For some of us, this opens up a whole new world. This blog provides a general introduction to text editors and Jupyter notebooks, for those who may be new to using these tools to organize and run programming code, such as Python code.

Text editors

Programming code is often developed, stored, shared with others, and even executed via a text editor. There are many text editors to choose from, including Notepad++, Sublime 3, Eclipse, UltraEdit, Atom, and Kate. Everybody seems to have a personal favorite. Perhaps someday there will be an American Idol show for text editors.

Today’s text editors have features like syntax highlighting, code collapsing, floating tabs, autocompletion, minimap/document map, color coding, multicursors, default and customized hot keys, tab triggers, convenient document navigation, and the ability to execute code directly from the editor. Text editors are very useful for editing and storing large swaths of code.

Below are examples of Notepad++ and Sublime 3 with Python code to run a neural network in CAS. Notepad++ is free and is a Windows text editor. Sublime 3 is available for Windows, OS X, and Linux, and includes a free trial, but it is not free permanently. The color scheme of text editors can be easily changed, and helps to identify the role of the text string in the program. For example, see screen shots below of Notepad++ and Sublime 3 each set to its respective Monokai color scheme:

  • In Notepad++: Settings/Style Configurator/Select Theme/Monokai
  • In Sublime: Preferences/Color Scheme/Monokai

 

Notepad++.jpg

Notepad++

Sublime3.jpg

Sublime 3

 

To find the programming languages supported in Notepad++, go to Language and click on the first letter of the language you are interested in, such as P for Python, as shown below.

1.jpg


To find the languages supported in Sublime 3, go to View/Syntax.

2.jpg
 
You can run pure Python code directly from these text editors.  However, we are not interested in running pure Python code on its own here. Remember, we want the Python code to call CAS actions, so that we can take advantage of the speed and parallelization that the CAS engine provides.

So why use a notebook?

Text editors let you see, well, text. Notebooks combine live code, explanatory text, equations, URL links, output, tables, images, graphs, and other rich media, all conveniently displayed in one place. Think of an old-fashioned notebook, such as Galileo’s below, where you would commonly find images and equations interspersed with text. FYI, Galileo discovered the four largest moons of Jupiter.

3.jpg
 

 

Using a Jupyter notebook, you can clearly annotate your code in easily readable markdown language. This helps to make your work understandable and usable not only by others, but by yourself when you come back to look at it in three months. See my Jupyter notebook example below.

4.jpg
 


Jupyter notebooks can be configured to let you conveniently run your code from a web browser.  Also, keep in mind that we are not just running pure Python code, but we must have the SAS Scripting Wrapper for Analytics Transfer (SWAT) imported to execute code against CAS.
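As a minimal sketch of what "importing SWAT" looks like in practice, the function below opens a CAS session from Python and loads an action set. The host name, port, and credentials in the usage comment are placeholders (assumptions, not values from this blog), and the function names connect_to_cas and load_neural_net are hypothetical helpers made up for illustration. The import sits inside the function only so that the sketch can be defined on a machine where SWAT is not yet installed.

```python
def connect_to_cas(host, port, username=None, password=None):
    """Open a CAS session through SWAT and return the connection object."""
    # SWAT is the Python package that talks to CAS; it must be installed
    # separately (for example with pip) before this function is called.
    import swat

    return swat.CAS(host, port, username, password)


def load_neural_net(conn):
    """Load the neuralNet action set so its actions can be called."""
    return conn.loadactionset("neuralNet")


# Usage, with placeholder values you would replace with your own:
#   conn = connect_to_cas("cas.example.com", 5570, "my_user", "my_password")
#   load_neural_net(conn)
#   conn.close()
```

Once the connection object exists, CAS actions are called as methods on it, so the heavy computation runs in CAS rather than in the local Python process.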

More about Jupyter

Jupyter is a command shell for interactive computing in multiple programming languages, including Julia, Python, and R. It evolved from the IPython project as a set of open-source software tools for interactive and exploratory computing. IPython was created in 2001 by Dr. Fernando Pérez (University of California-Berkeley). Dr. Brian Granger of Tech-X Corporation joined the IPython project in 2004. Jupyter runs on Linux and other Unix-type operating systems, Apple OS X, and Microsoft Windows. It can be accessed on a local desktop or installed on a remote server and accessed through the internet.

5.jpg
 
Jupyter stores a session’s inputs and outputs into a pair of numbered tables called In and Out, as shown in a very simple example below.
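To make the idea of numbered In and Out tables concrete, here is a toy model written in plain Python. It is not Jupyter's implementation; the class name ToySession and its run method are invented for illustration. It only mimics the visible behavior: inputs are kept in a list numbered from 1, and outputs are kept in a dictionary keyed by the same number, with an entry only when the input produced a value.

```python
class ToySession:
    """Toy model of a numbered input/output history (not Jupyter's code)."""

    def __init__(self):
        self.In = [""]        # index 0 is a placeholder so numbering starts at 1
        self.Out = {}         # execution number -> value produced by that input
        self.namespace = {}   # variables shared by all inputs in the session

    def run(self, source):
        self.In.append(source)
        number = len(self.In) - 1
        try:
            # Expressions produce a value that is recorded in Out.
            value = eval(source, self.namespace)
        except SyntaxError:
            # Statements (such as assignments) produce no value.
            exec(source, self.namespace)
            value = None
        if value is not None:
            self.Out[number] = value
        return value


session = ToySession()
session.run("x = 2")       # input 1: an assignment, so nothing goes in Out
session.run("x * 21")      # input 2: an expression, so Out[2] is recorded
print(session.In[2], "->", session.Out[2])
```

In a real notebook you can inspect the history the same way, for example by evaluating In[2] or Out[2] in a cell.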

6.jpg
 


An open Jupyter notebook has exactly one interactive session connected to a kernel, which executes code sent by the user and sends the results back. The kernel is the notebook’s computational engine: it executes the code contained in the notebook. For example, the IPython kernel executes Python code, IRkernel executes R code, and IJulia executes Julia code.

The kernel remains active even if the web browser window is closed. If you reopen the same notebook, it will reconnect the web application to the same kernel. CAUTION: Jupyter Notebook is designed for a single user. Other clients can connect to the same underlying IPython kernel. If you have multiple users and want authentication, you will want to use JupyterHub to manage multiple instances of a single-user notebook.

You can save a session’s inputs and outputs to a log file. You can create aliases for common system tasks, navigate the file system with some of the common Linux commands such as cd and ls, and prefix any command with ! for direct execution by the underlying operating system.

A Few Jupyter Tips

Jupyter also offers a set of control commands called magic commands that improve Python’s usability in an interactive context. Three examples of magic commands are:

  • %matplotlib inline  By default, plots and graphs are not displayed within the notebook. To have them display within the notebook, simply execute the magic command %matplotlib inline prior to running the plotting code.

  • %run  Although typing code interactively with Jupyter is convenient, long programs are commonly written in text editors, as I mentioned above. The %run magic command lets you run any Python file, such as one written in a text editor, as if you had typed it into the Jupyter notebook.

  • %connect_info  The %connect_info magic command prints the connection information for the notebook’s kernel, which another client can use to connect to that same kernel with the --existing option. Without an ID, --existing will connect to the most recently started kernel.
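Magic commands such as %run only work inside IPython or Jupyter, where you would type something like %run my_script.py in a cell. As a rough plain-Python analogue (an assumption for illustration, not how %run is implemented), the standard library's runpy module can execute a script file and hand back the names it defined. The file name my_script.py and its contents are made up for this sketch.

```python
import os
import runpy
import tempfile

# A stand-in for a program you wrote in a text editor.
script_source = "numbers = [1, 2, 3]\ntotal = sum(numbers)\n"

with tempfile.TemporaryDirectory() as folder:
    script_path = os.path.join(folder, "my_script.py")
    with open(script_path, "w") as handle:
        handle.write(script_source)

    # run_path executes the file and returns the names it defined.
    script_globals = runpy.run_path(script_path)

# The script's variables are now available to the calling code.
total = script_globals["total"]
print(total)
```

The point in both cases is the same: the long program lives in a file maintained in your text editor, and the interactive session only runs it and works with the results.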


You can include images in your Jupyter notebook, as I did with the photo of the planet Jupiter screen-captured earlier in this blog. To do that, place the image in the same folder as your notebook, and then simply type the following into a Markdown cell:

7.jpg
 
You can view keyboard shortcuts in Jupyter by going to Help/Keyboard Shortcuts, as shown below:


8.jpg
 

9.jpg

 


 
Closing


Jupyter is an intuitive, interactive and exploratory computing tool that allows you to show step-by-step what your programming code is doing. If you are just entering the realm of using Python, Java, or Lua, I recommend that you familiarize yourself with the latest features of your favorite text editor and of Jupyter. A few resources to get you started are listed below.


Then you are on your way to running CAS actions via Jupyter using Python code! As demonstrated in Ryan Gillespie’s 4-minute video, you can take advantage of highly parallelized processes in the CAS analytic engine to run advanced machine learning algorithms by running Python code from Jupyter. And you can use all of the features of Jupyter to easily annotate your code so that you can explain it to and share it with colleagues, managers, and customers. Not to mention, when you come back to your project in four months, having forgotten most of what you did (in my case, in just four hours), your annotations will explain your code to you!

FOR MORE INFO