Tip: How to execute a Python script in SAS® Enterprise Miner™

by SAS Employee RadhikhaMyneni on 04-09-2015 01:34 PM - edited on 04-07-2016 11:59 AM by SAS Employee PatrickHall

Running Python scripts within SAS Enterprise Miner lets you use open source packages alongside the statistical, data mining, and machine learning methods available in SAS. The underlying technique is explained in detail in the paper "Open Source Integration Using the Base SAS Java Object". This tip takes the idea in the paper a step further by incorporating it into a SAS Enterprise Miner flow, where models from a Python package can easily be assessed and compared with those from SAS Enterprise Miner or other data mining packages.

 

To begin, download the necessary files from https://github.com/sassoftware/enlighten-integration/tree/master/SAS_Base_OpenSrcIntegration and follow the steps in the “Compiling the Provided Java Classes” and “Setting the Java Classpath” sections provided with the paper to compile the Java classes and set up the Java classpath in the SAS environment. The working directory where the files are downloaded is referred to as WORK_DIR.

 

The training data has 785 columns: the first column is the label (dependent variable), which takes the digit 1 or 7, and the remaining 784 columns are predictors. This tip uses the Java class SASJavaExec.java and the digitsdata_17_train.csv file from the ZIP file, along with the em_digitsdata_forest.py Python script that is attached to this post. The Python script uses the Random Forest ensemble from the scikit-learn package to model the binary target in the training data. Make sure to copy em_digitsdata_forest.py to WORK_DIR.
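The column layout described above can be sketched in Python. This is only an illustration of the layout, not the attached em_digitsdata_forest.py: the function name split_label_and_predictors, the pixel column names, and the tiny inline sample are hypothetical, and the sketch assumes a header row with the label in the first column.

```python
import csv
import io


def split_label_and_predictors(csv_text):
    """Split CSV text into labels (first column) and predictor rows (the rest)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # assumed: the first row holds the column names
    labels, predictors = [], []
    for row in reader:
        if not row:
            continue  # skip empty lines
        labels.append(int(row[0]))
        predictors.append([float(value) for value in row[1:]])
    return header, labels, predictors


# Hypothetical three-predictor sample; the real file has 784 predictors.
sample = "label,pixel0,pixel1,pixel2\n1,0,128,255\n7,12,0,64\n"
header, y, X = split_label_and_predictors(sample)
print(header[0], y, X)
```

In the real training file, each entry of X would hold 784 values and y would contain only the digits 1 and 7.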

 

Follow these 5 steps to execute the Python model and display its fit statistics in SAS Enterprise Miner:

 

STEP 1: SETUP

Create a new project in SAS Enterprise Miner and copy the start-up code below into the Project Start Code window. Update WORK_DIR (the working directory where the downloaded files are located) and PYTHON_EXEC_COMMAND (the location of the Python executable) for your system, then click the Run Now button.

   *** WORKING DIRECTORY      (----- USER UPDATE NEEDED -----);
   %let WORK_DIR = C:\SGF2015\OpenSrcIntegration;

   *** SYSTEM PYTHON LOCATION (----- USER UPDATE NEEDED -----);
   %let PYTHON_EXEC_COMMAND = C:\Anaconda\python.exe;

   *** JAVA LIBRARIES/CLASS FILES LOCATION;
   %let JAVA_BIN_DIR = &WORK_DIR.\bin;

   options linesize = MAX;
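Before moving on, it can help to confirm that WORK_DIR holds the files this tip relies on. The following Python sketch is one way to do that outside SAS. The helper name missing_files is hypothetical, and the location of the compiled class (bin\dev\SASJavaExec.class) is an assumption inferred from the class name dev.SASJavaExec and the JAVA_BIN_DIR setting; adjust it if your compiled classes live elsewhere.

```python
import os
import tempfile


def missing_files(work_dir, expected):
    """Return the expected relative paths that do not exist under work_dir."""
    return [rel for rel in expected
            if not os.path.exists(os.path.join(work_dir, rel))]


# Files named in this tip. The compiled class location is an assumption
# derived from the class name dev.SASJavaExec and JAVA_BIN_DIR = WORK_DIR\bin.
EXPECTED = [
    "em_digitsdata_forest.py",
    "digitsdata_17_train.csv",
    os.path.join("bin", "dev", "SASJavaExec.class"),
]

# Demonstration against a temporary directory that holds only the script.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "em_digitsdata_forest.py"), "w").close()
    demo_missing = missing_files(demo_dir, EXPECTED)
    print(demo_missing)
```

To check a real setup, call missing_files with your own WORK_DIR path; an empty list means every expected file was found.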

 

STEP 2: A SIMPLE DIAGRAM

Create a new diagram with the SAS Code node followed by the Metadata node and the Model Import node.

Figure: 1_diagram.PNG (diagram with the SAS Code, Metadata, and Model Import nodes)

STEP 3: SAS CODE

Copy the following SAS code into the SAS Code node and run it. If the Java classpath is not specified correctly, the code writes an error to the log. Correct the problem before proceeding; refer to the paper for details on setting the Java classpath. The SAS code:

  • Validates the Java classpath
  • Executes the Python script using the Java object in a DATA step
  • Imports the CSV files into SAS data sets
  • Merges the training data with the predicted probabilities

 

   *** VALIDATE JAVA CLASSPATH;
   data _null_;
     length _x1 $32767;
     _x1 = sysget('CLASSPATH');
     _x2 = index(upcase(trim(_x1)), %upcase("&JAVA_BIN_DIR"));
     if _x2 = 0 then put "ERROR: Invalid Java Classpath.";
   run;

 

   /*** Part I: Python ***/
   data _null_;
     length rtn_val 8;
     *** Python program takes working directory as first argument;
     python_pgm = "&WORK_DIR.\em_digitsdata_forest.py";
     python_arg1 = "&WORK_DIR";
     python_call = cat('"', trim(python_pgm), '" "', trim(python_arg1), '"');

     declare javaobj j("dev.SASJavaExec", "&PYTHON_EXEC_COMMAND", python_call);
     j.callIntMethod("executeProcess", rtn_val);
   run;

 

   *** Part II: Load CSV files into SAS datasets ****************;
   proc import
        out = predict_py
        datafile = "&WORK_DIR.\predict_train_py_forest_prob.csv"
        dbms = csv
        replace;
     getnames = no;
   run;

 

   proc import
        out = digitsdata_17_train
        datafile = "&WORK_DIR.\digitsdata_17_train.csv"
        dbms = csv
        replace;
     getnames = yes;
   run;

 

   *** Pair each training row with the prediction row at the same position;
   data &EM_EXPORT_TRAIN;
     set digitsdata_17_train;
     set predict_py (rename=(var1=p_label1 var2=p_label7));
   run;
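The final DATA step above uses two SET statements, which read the two data sets side by side: row 1 of the training data is paired with row 1 of the predictions, and so on, stopping when the shorter data set runs out. The pairing is purely positional, so it relies on the Python script writing its predictions in the same order as the training rows. A minimal Python sketch of the same idea follows; the function name merge_side_by_side and the sample values are hypothetical.

```python
def merge_side_by_side(train_rows, prob_rows):
    """Pair each training row with the probability row at the same position."""
    merged = []
    # zip stops at the shorter input, much like consecutive SET statements
    for train, probs in zip(train_rows, prob_rows):
        row = dict(train)
        row["p_label1"], row["p_label7"] = probs
        merged.append(row)
    return merged


# Hypothetical sample: two training rows and their predicted probabilities.
train = [{"label": 1, "pixel0": 0}, {"label": 7, "pixel0": 12}]
probs = [(0.9, 0.1), (0.2, 0.8)]
merged = merge_side_by_side(train, probs)
print(merged[0])
```

Because no key is compared, a prediction file that is sorted or filtered differently from the training file would be merged without any error, so it is worth checking that both files have the same number of rows.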

 

STEP 4: UPDATE METADATA

Select the Metadata node and click the button next to the Train property under the Variables tab. For the variable named label, change New Role to Target and New Level to Binary, as shown in the figure below. The purpose of this node is to add metadata to the output data set that holds the Python script's predictions.

Figure: 2_metadata_var.PNG (Metadata node variables, with label set to Target and Binary)

 

STEP 5: FIT STATISTICS

Lastly, select the Model Import node, click the button next to Mapping Editor under the Predicted Variables tab, and make sure the mapping is similar to the figure below. Run all nodes and view the fit statistics in the Results window of the Model Import node.

Figure: 3_modelimport_me.PNG (Mapping Editor of the Model Import node)

The Model Import node can also be connected to a Model Comparison node to compare the model in the Python script with other models built in SAS Enterprise Miner.
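To give a feel for what such fit statistics measure, here is a small Python sketch that computes a misclassification rate from the label and the two probability columns. It is an illustration under stated assumptions, not the Model Import node's own computation: the function name, the tie-breaking rule, and the sample values are all hypothetical.

```python
def misclassification_rate(labels, p_label1, p_label7):
    """Fraction of rows whose more probable class differs from the actual label."""
    wrong = 0
    for actual, p1, p7 in zip(labels, p_label1, p_label7):
        predicted = 1 if p1 >= p7 else 7  # assumed rule: ties go to digit 1
        if predicted != actual:
            wrong += 1
    return wrong / len(labels)


# Hypothetical sample: four rows, one of which is misclassified.
labels = [1, 7, 7, 1]
p1 = [0.9, 0.2, 0.6, 0.7]
p7 = [0.1, 0.8, 0.4, 0.3]
rate = misclassification_rate(labels, p1, p7)
print(rate)
```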

 

The input and output files exchanged between the Python script and SAS Enterprise Miner are in standard CSV format, which keeps this solution flexible and easy to use. This methodology is also not limited to Python scripts: it extends to any valid executable command and its command-line arguments.
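The general pattern, launching an executable with its command-line arguments and collecting an integer status, can be sketched in Python as follows. This does not reproduce the SASJavaExec class, whose internals are not shown in this tip; the function name run_command is hypothetical, and the demonstration uses the current Python interpreter as a stand-in for any executable.

```python
import subprocess
import sys


def run_command(executable, *args):
    """Run an executable with arguments and return its exit code."""
    completed = subprocess.run([executable, *args])
    return completed.returncode


# Demonstration: the current interpreter stands in for any executable.
exit_code = run_command(sys.executable, "-c", "import sys; sys.exit(3)")
print(exit_code)
```

A nonzero exit code conventionally signals that the external program failed, which is a useful thing to check before importing its output files.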

 

Comments
by Occasional Contributor bwasicak
on ‎05-05-2015 11:22 AM

I don't see the csv file in the attachment, can that be added?

by SAS Employee RadhikhaMyneni
on ‎05-05-2015 11:39 AM

Hi Robert,

The necessary CSV files for this project are available in the SAS_Base_OpenSrcIntegration.zip attachment under the tip "Open Source Integration Using the Base SAS Java Object".

Radhikha

by Scotts
on ‎05-09-2015 02:36 AM

Hi Radhikha,

I would like to know how to pass macro variables and parameters from SAS to Python. Could you help with this? Thanks.

by SAS Employee RadhikhaMyneni
on ‎05-09-2015 12:26 PM

Hi Scott,

The way to do it is through command-line arguments to the Python script. In the example code above, &WORK_DIR is one such case: it is a macro variable and is passed to the Python script as the first argument. If you want to pass in additional arguments, feel free to concatenate them to the "python_call" variable:

python_call = cat('"', trim(python_pgm), '" "', trim(python_arg1), '" "', trim(python_arg2), '"');

Within the Python script, you can access the content of python_arg2 using sys.argv[2]. You can pass in as many arguments as you need using this mechanism, assuming you don't hit the character limit on the command line.
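On the Python side, reading those arguments might look like the sketch below. The function name read_arguments and the extra argument value "100" are hypothetical; the demonstration passes a simulated argument list, whereas the real script would use sys.argv.

```python
import sys


def read_arguments(argv):
    """Return the working directory and any further arguments from argv."""
    if len(argv) < 2:
        raise SystemExit("usage: script.py WORK_DIR [EXTRA_ARG ...]")
    work_dir = argv[1]     # first argument: the working directory
    extra_args = argv[2:]  # argv[2] holds python_arg2, and so on
    return work_dir, extra_args


# Simulated command line; in the real script this would be sys.argv.
demo_argv = ["em_digitsdata_forest.py", r"C:\SGF2015\OpenSrcIntegration", "100"]
work_dir, extra_args = read_arguments(demo_argv)
print(work_dir, extra_args)
```

Note that every command-line argument arrives as a string, so numeric parameters need to be converted inside the script, for example with int() or float().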

by Scotts
on ‎05-10-2015 08:19 AM

Hi Radhikha,

It works well, many thanks for your guidance!
