giant_wolf00
Fluorite | Level 6

Hi,

 

I'm relatively new to SAS Miner so please forgive me if this is a really stupid question! 

 

I created a model that uses 18 variables, 9 of which are imputed because some columns have a high proportion of nulls.  Based on the scoring output for the test partition, the model looked fairly predictive: 50% of the records with the outcome I was trying to predict fell in the top 10% of scores, and c. 80% in the top 20%.

 

However, when I came to score some brand new data, the top decile still performed OK (c. 50% of the top decile had the outcome I was trying to predict vs. 28% overall), but large swathes of records had identical model scores. As a result, some of the "middle" deciles are not performing as expected, because those tied records are smeared across them.  This data has similar levels of nulls, and it is these nulls, together with the fact that I used imputed columns when building the model, that prompt my question.

 

When scoring new data, does Miner apply the previously used imputation, or do I need to feed the new data through an Impute node before scoring too?   If the former, could something else be wrong that is causing my issue?

 

Below is my model diagram. The model is the flow on the right, from the data all the way to scoring the test partition.  The flow on the left (starting at the node highlighted in yellow) is the new data I'm trying to score.

 

Any help, very gratefully received.  Thank you.

 

model capture.JPG

1 REPLY
MikeStockstill
SAS Employee

The Score node contains the imputation scoring code that was passed to it by the Impute node.  When new data is passed to the Score node for scoring, the imputation code is applied automatically to the new data.  

 

You can see exactly what the Score node code is going to do by viewing the score code itself. 

 

 - After the Score node finishes running, right-click the Score node, and select Results.

 - In the Results window, select View -> Scoring -> SAS Code.  This is the code that is used to score new data.
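
As a rough conceptual illustration of what "the imputation code is applied automatically" means, here is a minimal Python sketch. It is not the SAS score code that the Score node produces. The mean-imputation rule, the column names (`income`, `tenure`), the weights and the helper functions are all hypothetical, chosen only to show replacement values being learned once from training data and then reused on new data. It also shows one possible reason for identical scores, which has not been confirmed for this model: records that are missing the same inputs receive the same replacement values.

```python
from statistics import mean

def fit_imputer(rows, columns):
    """Learn one replacement value per column (here: the training mean)."""
    fills = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        fills[col] = mean(observed)
    return fills

def apply_imputer(rows, fills):
    """Replace missing values with the values stored at training time."""
    return [
        {k: (fills[k] if v is None and k in fills else v) for k, v in r.items()}
        for r in rows
    ]

def score(row):
    # Hypothetical linear score; the weights are made up for illustration.
    return 0.25 * row["income"] + 0.5 * row["tenure"]

# Training data: replacement values are computed here, once.
train = [
    {"income": 10.0, "tenure": 2.0},
    {"income": 30.0, "tenure": None},
    {"income": None, "tenure": 4.0},
]
fills = fit_imputer(train, ["income", "tenure"])

# New data: the stored values are reused; nothing is re-estimated.
new = [
    {"income": None, "tenure": None},
    {"income": None, "tenure": None},
    {"income": 50.0, "tenure": 1.0},
]
scored = [score(r) for r in apply_imputer(new, fills)]
print(fills)
print(scored)
```

In this sketch the first two new records are missing both inputs, so they get the same replacement values and therefore the same score. Whether something similar explains the tied scores in the real flow can only be checked against the actual score code and the missing-value patterns in the new data.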
