
Tip: How to execute a Python script in SAS® Enterprise Miner™

Started ‎04-09-2015 by
Modified ‎04-07-2016 by

Running Python scripts within SAS Enterprise Miner enables you to use open source packages alongside the statistical, data mining, and machine learning methodologies available in SAS. The underlying technique used to implement this tip is explained in detail in the "Open Source Integration Using the Base SAS Java Object" paper. This tip takes the idea in the paper a step further by incorporating it into a SAS Enterprise Miner flow, where models from a Python package can easily be assessed and compared with models from SAS Enterprise Miner or other data mining packages.

 

To begin, download the necessary files from https://github.com/sassoftware/enlighten-integration/tree/master/SAS_Base_OpenSrcIntegration and follow the steps outlined in the “Compiling the Provided Java Classes” and “Setting the Java Classpath” sections provided with the paper to compile the Java classes and set up the Java classpath in the SAS environment. The working directory where the files are downloaded is referred to as WORK_DIR.

 

The training data has 785 columns: the first column is the label (dependent variable), which takes the value 1 or 7, and the remaining 784 columns are predictors. This tip uses the Java class SASJavaExec.java and the digitsdata_17_train.csv file from the ZIP file, plus the Python script em_digitsdata_forest.py that is attached to this post. The Python script uses the Random Forest ensemble from the scikit-learn package to model the binary target in the training data. Make sure to copy em_digitsdata_forest.py to WORK_DIR.
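The attached em_digitsdata_forest.py is not reproduced in this tip. As a rough sketch only, a script of this kind could look like the following. It assumes that the training CSV has a header row with the label in the first column, and that the output file is named predict_train_py_forest_prob.csv and written without a header, as the SAS code in Step 3 expects. The function names and the forest settings (n_estimators, random_state) are illustrative choices, not taken from the attachment. Fitting the model requires scikit-learn.

```python
import csv
import os
import sys


def load_training_data(csv_path):
    """Read the training CSV: header row, label in column 1, predictors after it."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row (the SAS import uses getnames=yes)
        rows = [row for row in reader if row]
    labels = [int(float(row[0])) for row in rows]
    predictors = [[float(value) for value in row[1:]] for row in rows]
    return predictors, labels


def fit_forest_probabilities(predictors, labels):
    """Fit a random forest and return class probabilities for the training rows."""
    # Imported here so the CSV helpers also work where scikit-learn is absent.
    from sklearn.ensemble import RandomForestClassifier

    # Illustrative settings, not the ones used in the attached script.
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(predictors, labels)
    # Columns follow model.classes_, which is sorted: label 1 first, then label 7.
    return model.predict_proba(predictors).tolist()


def write_probabilities(csv_path, probabilities):
    """Write one row per observation, with no header row."""
    with open(csv_path, "w", newline="") as handle:
        csv.writer(handle).writerows(probabilities)


def main(argv):
    work_dir = argv[1]  # the SAS code passes the working directory as argument 1
    predictors, labels = load_training_data(
        os.path.join(work_dir, "digitsdata_17_train.csv"))
    probabilities = fit_forest_probabilities(predictors, labels)
    write_probabilities(
        os.path.join(work_dir, "predict_train_py_forest_prob.csv"), probabilities)
    return 0


# Runs only when a working directory is supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Writing the probabilities in the same row order as the training file matters, because the SAS code later pairs the two files row by row.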

 

Follow these 5 steps to execute the Python model and display its fit statistics in SAS Enterprise Miner:

 

STEP 1: SETUP

Create a new project in SAS Enterprise Miner and copy the start-up code below into the Project Start Code window. Update WORK_DIR (the working directory where the downloaded files are located) and PYTHON_EXEC_COMMAND (the location of the Python executable) for your system, and then click the Run Now button.

   *** WORKING DIRECTORY      (----- USER UPDATE NEEDED -----);

   %let WORK_DIR = C:\SGF2015\OpenSrcIntegration;

   *** SYSTEM PYTHON LOCATION (----- USER UPDATE NEEDED -----);

   %let PYTHON_EXEC_COMMAND = C:\Anaconda\python.exe;              

   *** JAVA LIBRARIES/CLASS FILES LOCATION;

   %let JAVA_BIN_DIR = &WORK_DIR.\bin;

   options linesize = MAX;
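Before running the start-up code, it can help to confirm that the working directory holds what the later steps expect. The following is a minimal sketch, not part of the original tip: the function name missing_items is hypothetical, and the list of expected items is inferred from the file names mentioned in this tip and from the JAVA_BIN_DIR setting above.

```python
import os


def missing_items(work_dir):
    """Return the expected files and folders that are absent from work_dir."""
    expected = [
        "em_digitsdata_forest.py",   # Python script attached to the tip
        "digitsdata_17_train.csv",   # training data from the ZIP file
        "bin",                       # JAVA_BIN_DIR, that is, WORK_DIR plus "bin"
    ]
    return [name for name in expected
            if not os.path.exists(os.path.join(work_dir, name))]


if __name__ == "__main__":
    # Example path taken from the start-up code; replace it with your own WORK_DIR.
    problems = missing_items(r"C:\SGF2015\OpenSrcIntegration")
    print("Missing:", problems if problems else "nothing")
```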

 

STEP 2: A SIMPLE DIAGRAM

Create a new diagram with the SAS Code node followed by the Metadata node and the Model Import node.

1_diagram.PNG                  

STEP 3: SAS CODE

Copy the following SAS code example into the SAS Code node and run it. If the Java classpath is not specified correctly, an error message is written to the log. Correct the problem before proceeding; refer to the paper for details on setting the Java classpath. The following SAS code:

  • Validates the Java classpath
  • Executes Python script using Java Object in a DATA step
  • Imports CSV files into SAS data sets and
  • Merges train data with predicted probabilities

 

   *** VALIDATE JAVA CLASSPATH;

   data _null_;

     length _x1 $32767;

     _x1 = sysget('CLASSPATH');

     _x2 = index(upcase(trim(_x1)), %upcase("&JAVA_BIN_DIR"));

     if _x2 = 0 then put "ERROR: Invalid Java Classpath.";

   run;

 

   /*** Part I: Python ***/

   data _null_;

     length rtn_val 8;

     *** Python program takes working directory as first argument;

     python_pgm = "&WORK_DIR.\em_digitsdata_forest.py";

     python_arg1 = "&WORK_DIR";    

     python_call = cat('"', trim(python_pgm), '" "', trim(python_arg1), '"');

 

     declare javaobj j("dev.SASJavaExec", "&PYTHON_EXEC_COMMAND", python_call);

     j.callIntMethod("executeProcess", rtn_val);

   run;

 

   *** Part II: Load CSV files into SAS data sets ****************;

   proc import

        out = predict_py

        datafile = "&WORK_DIR.\predict_train_py_forest_prob.csv"

        dbms = csv

        replace;

     getnames = no;

   run;

 

   proc import

        out = digitsdata_17_train

        datafile = "&WORK_DIR.\digitsdata_17_train.csv"

        dbms = csv

        replace;

    getnames = yes;

   run;

 

   data &EM_EXPORT_TRAIN;

     set  digitsdata_17_train;

     set predict_py (rename=(var1=p_label1 var2=p_label7));

   run;
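The final DATA step uses two SET statements, which combine the two data sets by position rather than by a key: row 1 of the training data is paired with row 1 of the predictions, and the step stops when either input runs out. As a rough illustration of that behavior only, here is a Python sketch with made-up rows; the function name one_to_one is hypothetical.

```python
def one_to_one(train_rows, prob_rows):
    """Pair rows by position, stopping at the end of the shorter input."""
    merged = []
    for train, probs in zip(train_rows, prob_rows):
        row = dict(train)                          # columns from digitsdata_17_train
        row["p_label1"], row["p_label7"] = probs   # VAR1 and VAR2 after the rename
        merged.append(row)
    return merged


train = [{"label": 1, "px1": 0}, {"label": 7, "px1": 3}]   # made-up training rows
probs = [(0.9, 0.1), (0.2, 0.8)]                            # made-up probabilities
for row in one_to_one(train, probs):
    print(row)
```

Because the pairing is positional, the Python script has to write its predictions in the same row order as the training file.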

 

STEP 4: UPDATE METADATA

Select the Metadata node and click the button next to the Train property under the Variables tab. For the variable label, change New Role to Target and New Level to Binary, as shown in the figure below. The purpose of this node is to add metadata to the output data set generated by the Python script.

2_metadata_var.PNG

 

STEP 5: FIT STATISTICS

Lastly, select the Model Import node, click the button next to Mapping Editor under the Predicted Variables tab, and make sure the mapping is similar to the figure below. Run all nodes and view the fit statistics in the Results window of the Model Import node.

3_modelimport_me.PNG

The Model Import node can also be connected to a Model Comparison node to compare the model from the Python script with other models built in SAS Enterprise Miner.

 

The input and output files exchanged between the Python script and SAS Enterprise Miner are in standard CSV format, which keeps the solution flexible and easy to use. This methodology is also not limited to Python scripts: it extends to any valid executable command and its command-line arguments.
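The idea of launching any executable with its arguments and collecting a return code can be sketched in Python as follows. This is an illustration of the concept only: the internals of SASJavaExec are not shown in this tip, so the assumption that it behaves comparably (start the process, wait for it, return an integer code) is mine, and the function name run_command is hypothetical.

```python
import subprocess
import sys


def run_command(executable, *arguments):
    """Launch an external program, wait for it, and return its exit code."""
    completed = subprocess.run([executable, *arguments])
    return completed.returncode


if __name__ == "__main__":
    # Stand-in command: the current Python interpreter running a one-line program.
    code = run_command(sys.executable, "-c", "import sys; sys.exit(0)")
    print("exit code:", code)
```

Passing the executable and each argument as separate list items avoids manual quoting of paths that contain spaces.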

 

Comments

I don't see the CSV file in the attachment. Can it be added?

Hi Robert,

The necessary CSV files for this project are available in the SAS_Base_OpenSrcIntegration.zip attachment under Tip: Open Source Integration Using the Base SAS Java Object.

Radhikha

Hi Radhikha,

I would like to know how to pass macro variables and parameters from SAS to Python. Could you help with this? Thanks.

Hi Scott,

The way to do it is through command line arguments to the Python script. In the example code above, &WORK_DIR is one such case - it is a macro variable and is passed to the Python script as a first argument. If you want to pass in additional arguments, feel free to concatenate them to the "python_call" variable:

python_call = cat('"', trim(python_pgm), '" "', trim(python_arg1), '" "', trim(python_arg2), '"');

Within the Python script, you can access the content of python_arg2 using sys.argv[2]. You can pass in as many arguments as you need using this mechanism assuming you don't hit the character limit on the command line.
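On the Python side, reading those arguments could look like the sketch below. It assumes exactly two arguments are passed, in the order used in python_call; the function name read_arguments is hypothetical.

```python
import sys


def read_arguments(argv):
    """Return (work_dir, extra) from a command line built like python_call."""
    if len(argv) < 3:
        raise ValueError("expected two arguments: working directory and one extra value")
    work_dir = argv[1]   # python_arg1 in the SAS code
    extra = argv[2]      # python_arg2 in the SAS code
    return work_dir, extra


# Runs only when both arguments are supplied on the command line.
if __name__ == "__main__" and len(sys.argv) >= 3:
    print(read_arguments(sys.argv))
```

Arguments always arrive as strings, so a numeric value passed from a SAS macro variable needs an explicit conversion such as int(extra) or float(extra).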

Hi Radhikha,

It works well, many thanks for your guidance!

Hi All, 

 

I am following the steps above to execute Python scripts in SAS Enterprise Miner, but I cannot locate the Java classpath. I am using a Linux server and cannot find the location where the Java classes/libraries are installed.

Any help on this is greatly appreciated. 

 

Thanks

Rama Kishore 

Hi @ramakishoredarl, sorry you're having trouble! You'll get a faster answer by posting a question in the SAS Data Mining and Machine Learning Community than by commenting on this article, which few members will see. Thank you for using our communities.

Version history
Last update:
‎04-07-2016 11:59 AM
Updated by:
