I trained a new neural network on a dataset using the following code:
The scored test dataset looks fine, with no missing probability values, as can be seen below:
However, I then decided to score a new dataset in a separate SAS file, using the score code that was saved when the model was run (see the CODE option near the bottom of the first screenshot).
The code that scores the new dataset is below:
However, in the output data, all of the probabilities (for the predicted categories of the target variable) are missing for every observation (see the data below):
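For readers who cannot see the attachment: score code written by a CODE statement is normally applied inside a DATA step with %INCLUDE. The following is only a minimal sketch of that general pattern, not the program from this post; the file path and the table names work.new_data and work.nn_scored are hypothetical placeholders.

```sas
/* Sketch only: the path and table names are hypothetical placeholders. */
filename nnscore "/path/to/nnet_score.sas";   /* file written by the CODE statement */

data work.nn_scored;
  set work.new_data;     /* table to be scored */
  %include nnscore;      /* inserts the saved score code into this DATA step */
run;
```

The included code refers to the model inputs by name, so an input that is absent from the table being scored is created as a new, missing variable in the DATA step, and the log normally carries an "uninitialized" note for it. That may be worth ruling out when every prediction comes back missing.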
It is like that for every variable.
This is despite the fact that the new data being scored actually comes from the original dataset used to train and test the model. I have checked whether missing values in the predictor columns could be the cause. For that to explain the problem, every row would need at least one missing predictor value, yet nearly every row has none.
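A check of this kind could look roughly like the sketch below. The table name work.new_data and the input names x1-x3 and c1 are hypothetical stand-ins for the real predictors, so they would need to be replaced with the actual model inputs.

```sas
/* Sketch only: work.new_data, x1-x3 and c1 are hypothetical placeholders. */
data work.missing_check;
  set work.new_data;
  /* CMISS counts missing values across numeric and character arguments */
  n_missing_inputs = cmiss(of x1-x3, c1);
  any_missing = (n_missing_inputs > 0);   /* 1 if any input is missing, else 0 */
run;

/* Share of rows with at least one missing input */
proc freq data=work.missing_check;
  tables any_missing / missing;
run;
```

If only a small share of rows has any_missing = 1, missing inputs alone cannot account for every probability being missing.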
I have attached the log for the code that includes the part which does the scoring of the data.
Essentially, I am wondering what is causing these missing values and how to fix them.
I also note that I trained a gradient boosting model on the same data using the code below:
I also used the following to save the model:
I then used ASTORE to score new data with that gradient boosting model. This new data was the same data that produced the missing values when scored by the neural network model. I scored it with the gradient boosting model in the same file that I used for the neural network scoring, with the code below:
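For reference, scoring with a saved analytic store generally takes the shape sketched below. This is not the code from the post; mycas is a hypothetical CAS libref and the table names are placeholders.

```sas
/* Sketch only: mycas is a hypothetical CAS libref; table names are placeholders. */
proc astore;
  score data=mycas.new_data      /* table to be scored */
        rstore=mycas.gb_store    /* analytic store saved with the model */
        out=mycas.gb_scored;     /* scored output table */
run;
```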
There were no missing values in the probabilities in the scored data in this case. An example of this scored data with no missing values is below:
So I am wondering what could be causing the missing probabilities with the neural network model, and how I can fix it, given that the dataset fed into the score code is exactly the same as the data fed into the ASTORE procedure for the gradient boosting model.
Don't you have a variable named _WARN_ in the scored data set that indicates why the model could not be applied?
If I have time later today, I will analyze the log-file.
Moving your question to the "SAS Data Science" board.
Koen
I have looked and I definitely do not have a variable called _WARN_ in the scored dataset. Should this variable be present?
@William29 wrote:
I have looked and I definitely do not have a variable called _WARN_ in the scored dataset. Should this variable be present?
Not necessarily.
I know that Enterprise Miner (a SAS 9.x tool) included this variable when scoring (new) data with a deployed model.
Koen
I did notice that the scored data from my gradient boosting model, which uses ASTORE to make predictions, does have the _WARN_ variable. However, only a few rows have an entry (an M) in _WARN_, and those rows still have predicted probabilities rather than missing values. This is unlike the scored data from the neural network, which uses the CODE statement: it has no _WARN_ variable, and all of its probabilities are missing.
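The comparison can be summarized with something like the sketch below. The table names work.gb_scored and work.nn_scored are hypothetical placeholders, and the P_: prefix assumes the predicted probability columns start with P_, which may not match the actual output.

```sas
/* Sketch only: table names are placeholders; the P_ prefix is an assumption. */

/* Distribution of _WARN_ in the gradient boosting output, including blanks */
proc freq data=work.gb_scored;
  tables _WARN_ / missing;
run;

/* Count of non-missing and missing values for each probability column */
proc means data=work.nn_scored n nmiss;
  var P_: ;
run;
```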
Thanks for feedback.
You can label your own answer above as the solution.
Koen