William29
Obsidian | Level 7

I trained a new neural network on a dataset using the following code:

 

William29_0-1725256823412.png

When I look at the scored test dataset, it looks fine, with no missing probability values, as can be seen below:

 

William29_1-1725257024779.png

 

However, I then decided, in a separate SAS file, to score new data using the scoring code that was saved when running the model (see the CODE option near the bottom of the first screenshot).

This code, which scores the new dataset, is below:

 

William29_2-1725257366731.png

However, in the output data, all of the probabilities (of the predicted categories of the target variable) are missing for every observation (see the data below):

 

William29_3-1725257864125.png

It is like that for every variable.

 

This is despite the fact that the new data being scored actually comes from the original dataset used to train and test the model. I have checked whether missing values in the columns could be the cause, but for that to explain the problem, every row would need a missing value in at least one predictor variable, yet nearly every row has none.
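
For readers who want to repeat this missing-value check, here is a minimal sketch. The table name `work.new_data` and the predictor names `x1`, `x2`, `x3` and `region` are hypothetical placeholders, not names from the original code, and would need to be replaced with the model's actual inputs.

```sas
/* Hypothetical names: work.new_data is the table to be scored;       */
/* x1, x2, x3 and region stand in for the model's predictors.         */
data _null_;
    set work.new_data end=last;
    /* CMISS counts missing arguments, numeric or character */
    if cmiss(x1, x2, x3, region) > 0 then n_incomplete + 1;
    /* On the last row, report the totals in the log */
    if last then
        put "Rows with at least one missing predictor: " n_incomplete "of " _n_;
run;
```

If the reported count is small relative to the number of rows, missing predictor values alone cannot account for every probability being missing.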

 

I have attached the log for the code that includes the part which does the scoring of the data.

 

I am wondering what is causing these missing values and how to fix it.

 

I also note that I trained a gradient boosting model on the same data using the code below:

 

William29_4-1725258471725.png

As well as this to save the model:

William29_6-1725258754949.png

 

 

 

I then used ASTORE to score new data with that gradient boosting model. This was the same new data whose scoring by the neural network model produced the missing values. I scored it with the gradient boosting model in the same file used for the neural network scoring, with the code below:

 

William29_5-1725258688644.png

 

There were no missing values in the probabilities in the scored data in this case. An example of this scored data with no missing values is below:

 

William29_7-1725258989195.png

 

 

So, I am wondering what could be causing this issue with missing probabilities in the neural network model, and how I can fix it, given that the dataset fed into the score code is exactly the same as the one fed into the ASTORE procedure for the gradient boosting model.

 

6 REPLIES
sbxkoenk
SAS Super FREQ

Don't you have a variable named _WARN_ in the scored data set that indicates why the model could not be applied?

If I have time later today, I will analyze the log-file.

 

Moving your question to the "SAS Data Science" board.

 

Koen

William29
Obsidian | Level 7

I have looked and I definitely do not have a variable called _WARN_ in the scored dataset. Should this variable be present?

sbxkoenk
SAS Super FREQ

@William29 wrote:

I have looked and I definitely do not have a variable called _WARN_ in the scored dataset. Should this variable be present?


Not necessarily.
I know that Enterprise Miner (a SAS 9.x tool) included this variable when scoring new data with a deployed model.

 

Koen

William29
Obsidian | Level 7

I did notice that the scored data from my gradient boosting model, which uses ASTORE to make predictions, does have the _WARN_ variable. However, only a few rows have an entry (an M) in _WARN_, and those rows still have predicted probabilities rather than missing values. This is unlike the scored data from the neural network, which uses the CODE option: it has no _WARN_ variable, and all of its probabilities are missing.
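
A quick way to see how many rows carry a warning flag is to tabulate `_WARN_` in the scored output. This is only a sketch: `work.gb_scored` is a hypothetical name for the table written by the ASTORE scoring step.

```sas
/* Hypothetical name: work.gb_scored is the output table from the     */
/* gradient boosting (ASTORE) scoring step.                           */
proc freq data=work.gb_scored;
    /* MISSING treats blank _WARN_ values as their own category */
    tables _WARN_ / missing;
run;
```

The resulting table shows how many rows are flagged and how many have a blank `_WARN_`.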

William29
Obsidian | Level 7
I managed to solve this issue by looking at the score code file itself and finding the conditions that lead to what it calls a "bad val", whereupon it sets everything to missing. It had to do with categories of the categorical predictors that appear in the new data but not in the training data.
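
One way to look for such categories before scoring is to list the levels of each categorical predictor that occur in the new data but not in the training data. The sketch below assumes hypothetical names: `work.train` for the training table, `work.new_data` for the table to be scored, and `region` for one categorical predictor. It compares raw stored values, so if the generated score code compares formatted or otherwise normalized values, the result is only an approximation.

```sas
/* Hypothetical names: work.train is the training table,              */
/* work.new_data is the table to be scored, and region is one         */
/* categorical (class) predictor. Repeat for each class variable.     */
proc sql;
    title "Levels of region in the new data that are absent from the training data";
    select distinct region from work.new_data
    except
    select distinct region from work.train;
quit;
title;
```

An empty result means every level of that predictor in the new data also occurs in the training data. Any rows listed are candidates for triggering the "bad val" branch described above.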
sbxkoenk
SAS Super FREQ

Thanks for the feedback.
You can label your own answer above as the solution.

 

Koen
