A common question for SAS Technical Support is how to take the output from a text mining model and apply it to a new set of data. There are several ways to do this.
When your score data set contains a manageable amount of data, the easiest way to score it is to add the data set to the SAS Enterprise Miner project. The data set must have a Role of Score, which is set in the Input Data Source property panel.
Connect the data set to the Score node, and the new observations are scored automatically when the Score node runs successfully.
Similarly, in a flow that uses the HP Text Miner node, the new observations are scored automatically when the Score node runs successfully.
The more common approach to scoring new observations is to score outside of the SAS Enterprise Miner application, especially when the score data set is large. Commonly used applications to score new text observations are:
Unlike standard SAS Enterprise Miner scoring, text mining scoring requires two separate files, both of which are output from the Score node. The first, prescore.sas, contains the SAS Text Miner pre-score code. The second, pathpublishscorecode.sas, contains the SAS Text Miner scoring code. These files are found in the following location:
<project path>\<project name>\Workspaces\<Diagram ID>\<Score node ID>
Using these two files, scoring a new data set can be accomplished by running this code using any of the above applications:
%include "<file path to prescore.sas>\prescore.sas";
/* This step creates a copy of the data set to be scored. To add the new
variables directly to your score data set, skip this DATA step and name the
score data set in the %let statement below. */
data a;
set mylib.text_to_be_scored;
run;
%let em_score_output=a;
data &em_score_output;
set &em_score_output;
%include "<file path to pathpublishscorecode.sas>\pathpublishscorecode.sas";
run;
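Once the step above has run, a quick way to confirm that the score code added its variables is to inspect the scored data set. The following is only a minimal sketch: it assumes the em_score_output macro variable still points to the data set that was just scored, and it makes no assumption about the names of the new variables, which depend on the model.

```sas
/* List the variables in creation order. Variables appended by the */
/* score code should appear after the original columns.            */
proc contents data=&em_score_output varnum;
run;

/* Print the first five scored observations as a spot check. */
proc print data=&em_score_output(obs=5);
run;
```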
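To score data using output from the HP Text Miner node, you can use essentially the same code as above, with two differences that show in the code below: prescore.sas is not included, and pathpublishscorecode.sas is included on its own rather than inside a DATA step.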
data a;
set mylib.text_to_be_scored;
run;
%let em_score_output=a;
%include "<file path to pathpublishscorecode.sas>\pathpublishscorecode.sas";
run;
Scoring a new set of data within SAS Contextual Analysis can be accomplished by going to File->Score External Data Set.
Each of the four fields needs to have a valid value. Here is some information regarding these values:
The more common approach to scoring new observations is to score outside of the SAS Contextual Analysis web application. Commonly used applications to score new text observations are:
To find the score code, in the menu of SAS Contextual Analysis, click on the down arrow next to View and select either the Concept Code, Sentiment Code, or Category Code.
After you copy the appropriate score code, a few lines in its header need values specified. For example, the top section of the Categories score code is shown below, and the placeholder values in braces need to be replaced:
%sysfunc(ifc(%symexist(tm_defined_vars),, %nrstr(
/* the path to the directory containing the data set you would like to score */
%let lib_path={put_your_directory_path_here};
/* the data set you would like to score */
%let input_ds = _my_lib.{put_your_data_set_name_here};
/* the column in the data set that contains the text data to score */
%let document_column = {put_your_document_column_name_here};
)));
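As an illustration, the header might look like this once the placeholders are filled in. The directory, data set name, and column name are hypothetical and stand in for your own values. The _my_lib libref is kept exactly as it appears in the generated code.

```sas
%sysfunc(ifc(%symexist(tm_defined_vars),, %nrstr(
/* hypothetical directory that holds the data set to score */
%let lib_path=C:\data\reviews;
/* hypothetical data set name, with the _my_lib libref kept from the generated code */
%let input_ds = _my_lib.customer_reviews;
/* hypothetical name of the column that holds the text to score */
%let document_column = review_text;
)));
```

Only the three values change. The rest of the generated score code is left as it was copied.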
There are several ways to score new data sets using text models. Each user needs to find the best way to apply score code for their business.