Re: What is causing a scored dataset from a neural network to have missing values for every probability?

William29 · Posted 09-02-2024 02:37 AM

I trained a new neural network on a dataset using the following code:

When looking at the scored dataset on the test dataset, it looks fine with no missing probability values as can be seen below:

However, I then decided, in a separate SAS file, to score new data using the scoring code that was saved when the model was run (see the code option near the bottom of the first screenshot).

The code that scores the new dataset is below:

However, in the output data, the probabilities of all the predicted categories of the target variable are missing for every observation (see the data below):

It is like that for every variable.

This is despite the fact that the newly scored data actually comes from the original dataset used to train and test the model. I have checked whether missing values in the columns are the cause, but for that to explain the problem, every row would need a missing value in the predictor variables, and nearly every row has none.

I have attached the log for the code that includes the part which does the scoring of the data.

Essentially, I am wondering what is causing these missing values and how to fix them.

I also note that I trained a gradient boosting model on the same data using the code below:

As well as this to save the model:

I then used ASTORE to score new data with that gradient boosting model. This was the same new data whose scoring by the neural network model produced the missing values. I scored it with the gradient boosting model in the same file used for the neural network scoring, with the code below:

There were no missing values in the probabilities in the scored data in this case. An example of this scored data with no missing values is below:

So, I am wondering what could be causing the missing probabilities with the neural network model, and how I can fix them, given that the dataset fed into the scoring code is exactly the same as the data fed into the ASTORE procedure for the gradient boosting model.

sbxkoenk · Posted 09-02-2024 05:14 AM

Don't you have a variable named _WARN_ in the scored data set that indicates why the model could not be applied?

If I have time later today, I will analyze the log-file.

Moving your question to "SAS Data Science" - board.

Koen

William29 · Posted 09-03-2024 12:59 AM

I have looked and I definitely do not have a variable called _WARN_ in the scored dataset. Should this variable be present?

sbxkoenk · Posted 09-03-2024 06:15 AM

@William29 wrote:

I have looked and I definitely do not have a variable called _WARN_ in the scored dataset. Should this variable be present?

Not necessarily.
I know that Enterprise Miner (a SAS 9.x tool) included this variable when scoring (new) data with a deployed model.

Koen

William29 · Posted 09-03-2024 07:03 PM

I did notice that the scored data from my gradient boosting model, which uses ASTORE to make predictions, does have the _WARN_ variable. However, only a few rows have an entry (an M) in _WARN_, and those rows still have predicted probabilities rather than missing values. This is unlike the scored data from the neural network, which uses the CODE option: it has no _WARN_ variable, and all of its probabilities are missing.

William29 · Posted 09-06-2024 12:17 AM

I managed to solve this issue by looking at the score code file itself and finding the conditions that lead to what it calls a "bad val", whereupon it sets everything to missing. The cause was that the categorical predictors in the new data contained categories that were not present in the training data.
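
For readers who want to run the same kind of check before scoring, here is a minimal sketch in Python rather than SAS. It is not the code used in this thread. It assumes both datasets are available as lists of dictionaries, and the function name `unseen_levels`, the column names, and the toy rows are all hypothetical. It compares raw values exactly, so a difference in case or trailing blanks counts as a new category; whether the score code treats such values the same way is something to confirm in the score code file itself.

```python
from collections import defaultdict


def unseen_levels(train_rows, new_rows, categorical_cols):
    """Return {column: sorted levels present in new_rows but absent from train_rows}."""
    # Collect every level each categorical column takes in the training data.
    seen = defaultdict(set)
    for row in train_rows:
        for col in categorical_cols:
            seen[col].add(row[col])

    # Record any level in the new data that training never saw.
    unseen = defaultdict(set)
    for row in new_rows:
        for col in categorical_cols:
            if row[col] not in seen[col]:
                unseen[col].add(row[col])

    return {col: sorted(levels) for col, levels in unseen.items()}


# Hypothetical toy data: "north" differs from "North" only by case,
# and plan "C" does not occur in the training rows at all.
train = [{"region": "North", "plan": "A"}, {"region": "South", "plan": "B"}]
new = [{"region": "North", "plan": "A"}, {"region": "north", "plan": "C"}]

report = unseen_levels(train, new, ["region", "plan"])
print(report)
```

An empty result means every category in the new data was also seen in training. Any column that does appear in the result is a candidate for the "bad val" condition described above.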

sbxkoenk

Thanks for the feedback.
You can label your own answer above as the solution.

Koen
