
The Models Text Door: Monitoring NLP models within SAS Model Manager

Started ‎07-12-2023 by
Modified ‎10-05-2023 by

Natural Language Processing (NLP) models, including those trained in SAS Visual Text Analytics, adhere to the same Analytics Life Cycle as other predictive & classification machine learning models.  Since April 2021, text concept, sentiment, topic, and category models can be registered from SAS Visual Text Analytics into SAS Model Manager with a few button clicks and deployed to CAS and MAS destinations.  This means that SAS text models can be managed and deployed with ease.  But what about performance monitoring?

 

Why monitor model performance?

 

All models decay. Models become less relevant and useful over time.  Text models are no exception.  Model decay can be caused by gradual changes of the underlying data or sudden shocks and changes. Therefore, it is critical to continuously monitor the performance of models over time so that these signals can be caught early and mitigated. 

 

As of SAS Viya 2022.09 (September 2022), you can monitor category models.  Category models leverage rules to classify documents into groups.  For example, when dealing with product complaints, a category model can try to classify which product the complaint is lodged against.

 

What metrics do we monitor? 

 

Category models are a good candidate for performance monitoring since they are trained to predict a category classification.  Users can either build their own category rules or use SAS Visual Text Analytics to automatically generate rules against an existing target column.  Having a target column enables performance monitoring since we can determine how well our text model predicts the category.

 

Text category models have several similarities with predictive & classification models, but there are fundamental differences that affect which metrics are used for monitoring.  For example, text categorization can assign the same observation to more than one category, whereas machine learning models with categorical targets usually assign exactly one label.
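To make the multi-label distinction concrete, here is a toy Python sketch.  The keyword "rules" and category names below are invented for illustration only; real SAS category models use far richer rule syntax:

```python
# Hypothetical example: in text categorization, one document can match the
# rules of more than one category, unlike a single-label classifier.
document = "The checking account app crashed while I paid my credit card bill."

# Toy keyword "rules" standing in for category model rules (illustrative only).
category_rules = {
    "Mobile App": ["app", "crash"],
    "Credit Card": ["credit card"],
    "Mortgage": ["mortgage"],
}

# Multi-label assignment: collect every category whose rule matches.
matched = [cat for cat, terms in category_rules.items()
           if any(term in document.lower() for term in terms)]
print(matched)  # → ['Mobile App', 'Credit Card']
```

A single-label classifier would be forced to pick one of these categories; the category model keeps both matches, which is why its monitoring metrics are computed per category rather than from a single multi-class confusion matrix.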

 

Keeping this in mind, text category models focus upon:

  • Coverage, which is the percent of documents for which a model was able to identify a category.  Ideally, every document should have a match, so a high coverage percentage is a sign of a healthy model.
  • Precision, which is the ratio of true positives to the sum of true positives and false positives.  This is also represented as TP / (TP + FP).  Precision tries to address the question of “How correct is my model?”.  A precision of 100% means that there are zero false matches for the model.  A low precision means that the model rules may be weak or too broad.
  • Recall, which is the ratio of true positives to the sum of true positives and false negatives.  This is also represented as TP / (TP + FN).  Recall tries to address the question of “How many correct classifications has my model been able to capture?”.  A recall of 100% means that there are no false negatives.
  • Overall False Positive Rate, which is the percent of times the model made a false match.  This is also represented as FP / (FP + TN). 
  • F1 Score, which is an assessment of model quality using both Recall and Precision.  The calculation for the F1 score is F1 = 2 * (Precision * Recall) / (Precision + Recall).  Ideally, F1 should be near 100%.
  • Confusion Matrix, which shows the number of True Positives, True Negatives, False Positives, and False Negatives.
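The metrics above can be sketched in a few lines of Python.  The sample documents and category labels are invented for illustration; this is not the SAS Model Manager implementation, just the textbook formulas applied one-vs-rest for a single category:

```python
# Toy actual vs. predicted category labels for six documents; None marks a
# document the model could not categorize. All values are illustrative.
actual    = ["Billing", "Billing", "Shipping", "Shipping", "Billing", "Billing"]
predicted = ["Billing", "Shipping", "Shipping", "Billing", "Billing", None]

# Coverage: share of documents for which the model identified any category.
coverage = sum(p is not None for p in predicted) / len(predicted)

# Confusion-matrix cells for one category ("Billing"), one-vs-rest.
category = "Billing"
tp = sum(a == category and p == category for a, p in zip(actual, predicted))
fp = sum(a != category and p == category for a, p in zip(actual, predicted))
fn = sum(a == category and p != category for a, p in zip(actual, predicted))
tn = sum(a != category and p != category for a, p in zip(actual, predicted))

precision = tp / (tp + fp)            # TP / (TP + FP)
recall    = tp / (tp + fn)            # TP / (TP + FN)
fpr       = fp / (fp + tn)            # FP / (FP + TN)
f1        = 2 * precision * recall / (precision + recall)
```

Running this on the sample data gives a coverage of 5/6 (one uncategorized document), a precision of 2/3, and a recall of 1/2, illustrating how a model can be fairly precise while still missing matches it should have captured.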

 

How do I create a performance monitoring report?

 

We can monitor the performance of category models in just a few steps: 

  1. From SAS Visual Text Analytics, assign the Category role to one column.  Performance monitoring is not supported for category models trained to predict multiple category classifications.
  2. Next, run the pipeline and register the category model.
  3. From SAS Model Manager, set the category model as champion.  When prompted, add the model output variables to the project output variables.
  4. Next, from the project Properties tab, change the Model function from Text analytics to Text categories.  Save the project.
  5. Open the Model Evaluation pane and fill out the Default training table, Actual category variable, Document Id variable, Text variable, and Predicted category variable properties.  Save the project.
  6. From the Performance tab, create a new performance monitoring definition and hit Run.

 

To see these steps in action, check out the following demo:

 

 

What would you like to see next for SAS Model Manager?  What feedback do you have on our text performance monitoring?  Let me know in the comments or submit a request to SASWare Ballots with the Model Manager label 🗳️
