So you want to score textual data? Let me count the ways.

A common question that comes into SAS Technical Support is how to take the output from a text mining model and apply that to a new set of data. There are multiple ways to do this.

Score new data within the SAS Enterprise Miner application.

When your score data set contains a manageable amount of data, the easiest way to score it is to add it to the SAS Enterprise Miner project. The data set must have a Role of Score, which is set in the Input Data Source property panel.

Connecting it to the Score node will automatically score the new observations when the Score node is run successfully.

Similarly, when the model comes from the HP Text Miner node, the new observations are scored automatically when the Score node is run successfully.

Scoring outside of the SAS Enterprise Miner application.

The more common approach to scoring new observations is to score outside of the SAS Enterprise Miner application, especially when the score data set is large. Commonly used applications to score new text observations are:

Base SAS
SAS Enterprise Guide
SAS Data Integration Studio
Batch code

Unlike standard SAS Enterprise Miner scoring, applying text mining score code requires two separate programs, both of which are output from the Score node:

prescore.sas, which contains the SAS Text Miner pre-score code
pathpublishscorecode.sas, which contains the SAS Text Miner scoring code

These files are found in the following location:

<project path>\<project name>\Workspaces\<Diagram ID>\<Score node ID>

Using these two files, scoring a new data set can be accomplished by running this code using any of the above applications:

%include "<file path to prescore.sas>\prescore.sas";

/* This creates a copy of the data set that I want to score. If you want to
   just add new variables to your score data set, you can skip this DATA step
   and name the score data set in the %let statement below. */
data a;
  set mylib.text_to_be_scored;
run;

%let em_score_output=a;

data &em_score_output;
  set &em_score_output;
  %include "<file path to pathpublishscorecode.sas>\pathpublishscorecode.sas";
run;
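
Before running the scoring code, it can help to confirm that both programs are where you expect them to be. The following is a minimal sketch and not part of the generated score code: the score_dir value and the check_score_files macro name are hypothetical, so substitute the location of your own Score node folder.

```sas
/* Hypothetical Score node folder; replace with your own project location */
%let score_dir = C:\EMProjects\MyProject\Workspaces\EMWS1\Score;

/* Hypothetical helper macro: writes an ERROR line to the log for each missing file */
%macro check_score_files;
  %if %sysfunc(fileexist(&score_dir\prescore.sas)) = 0 %then
    %put ERROR: prescore.sas was not found in &score_dir;
  %if %sysfunc(fileexist(&score_dir\pathpublishscorecode.sas)) = 0 %then
    %put ERROR: pathpublishscorecode.sas was not found in &score_dir;
%mend check_score_files;

%check_score_files
```

If the log shows no ERROR lines, both files exist and the %include statements should be able to find them.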

To score data using output from the HP Text Miner node, you can use essentially the same code as above, with two differences:

The HP Text Miner node does not generate the prescore.sas file, as that is part of the pathpublishscorecode.sas file.
The pathpublishscorecode.sas file contains data step code and procedure calls, so the code used to score new observations looks like this:

data a;
  set mylib.text_to_be_scored;
run;

%let em_score_output=a;

%include "<file path to pathpublishscorecode.sas>\pathpublishscorecode.sas";

run;
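
After either scoring program has run, a quick look at the result can confirm that new variables were added. This is a minimal sketch that assumes the scored observations were written to the table named in em_score_output, as in the code above.

```sas
/* List the variables in the scored table in creation order */
proc contents data=&em_score_output varnum;
run;

/* Print the first five scored observations as a sanity check */
proc print data=&em_score_output (obs=5);
run;
```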

Scoring data with output from SAS Contextual Analysis within the web application

Scoring a new set of data within SAS Contextual Analysis can be accomplished by going to File->Score External Data Set.

Each of the four fields needs to have a valid value. Here is some information regarding these values:

Scored data set (output) – This is the name of the to-be-generated scored data set.
SAS folder location – Location for saving the scored data within SAS metadata.
Project Model – This is the name of the project that contains the model and score code.
Analysis data set (input) – This is the data set to be scored.

Scoring data with output from SAS Contextual Analysis outside the web application

The more common approach to scoring new observations is to score outside of the SAS Contextual Analysis web application. Commonly used applications to score new text observations are:

Base SAS
SAS Enterprise Guide
SAS Data Integration Studio
Batch code

To find the score code, in the menu of SAS Contextual Analysis, click the down arrow next to View and select Concept Code, Sentiment Code, or Category Code.

After copying the appropriate score code, you need to specify values in a few lines of its header. For example, the top section of the category score code is shown below; the placeholder values in braces need to be replaced:

%sysfunc(ifc(%symexist(tm_defined_vars),, %nrstr(

/* the path to the directory containing the data set you would like to score */
%let lib_path={put_your_directory_path_here};

/* the data set you would like to score */
%let input_ds = _my_lib.{put_your_data_set_name_here};

/* the column in the data set that contains the text data to score */
%let document_column = {put_your_document_column_name_here};

)));
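
For illustration, a filled-in version of these header lines might look like the following. The directory, data set name, and column name are hypothetical, and only the lines shown in this article are reproduced here; the rest of the generated score code is left unchanged.

```sas
%sysfunc(ifc(%symexist(tm_defined_vars),, %nrstr(

/* hypothetical directory that contains the data set to score */
%let lib_path=C:\data\reviews;

/* hypothetical data set name with the _my_lib prefix kept as generated */
%let input_ds = _my_lib.customer_reviews;

/* hypothetical name of the column that holds the text to score */
%let document_column = review_text;

)));
```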

There are several ways to score new data sets using text models. Each user needs to find the best way to apply score code for their business.

SAS Communities Library