Charlot
Fluorite | Level 6

Hi all,

 

I've created a predictive model using HPFOREST and would like to know what the best practices are when scoring new data.

 

I can successfully use HP4SCORE to score new data; that's usually not an issue, except perhaps when my new data contains categorical variables with values that did not appear anywhere in the training data.

 

For example, let's say I have a variable called 'Cellphone Model'.

 

When the model was trained, 'Cellphone Model' took the values 'iPhone 7', 'Sony Xperia' and 'Samsung Galaxy s8'.

 

This week, when scoring, I might see instances where this same 'Cellphone Model' input variable also takes the value 'Samsung Galaxy s9'.

 

My questions are:

  1. What does HP4SCORE do in such a situation? Is there an 'else' clause in the score code, so that it still generates valid scores when it encounters a value of 'Cellphone Model' that was not in the training data?
  2. If HP4SCORE does not handle such a situation automatically, how should I handle it?

Note that while I do have access to a SAS EM license, this model was created in SAS Base. I know how to handle this situation with the SAS EM nodes, but for some other technical reasons, I can't use the SAS Enterprise Miner nodes/diagram for this model.

 

Thanks!

2 REPLIES
MikeStockstill
SAS Employee

Hello Charlot-

 

PROC HP4SCORE runs score code that was generated by PROC HPFOREST. Here is a documentation excerpt that describes how HPFOREST handles this:

 

Handling Values That Are Absent from Training Data
A splitting rule that uses a categorical variable might not recognize all possible values of the variable. Some
categories might not exist in the training data. Others might be so infrequent in the training sample in the node
that the procedure excludes them. The MINCATSIZE= option specifies the minimum number of occurrences
required for a categorical value to participate in the search for a splitting rule. Splitting rules assign unseen
categorical values to the branch that has the most in-bag training observations.
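
To make the last sentence of that excerpt concrete, here is a minimal Python sketch of the routing rule for a single split. It is an illustration only, not the HPFOREST score code: the names `build_split` and `route`, the in-bag counts, and the choice of which categories go left are all made up for the example, and MINCATSIZE= is not modeled.

```python
from collections import Counter


def build_split(in_bag_values, left_categories):
    """Build a toy binary splitting rule on one categorical variable.

    in_bag_values   -- category values of the in-bag training rows in this node
    left_categories -- categories the (hypothetical) rule sends to the left branch
    Returns a function that maps a category value to "left" or "right".
    """
    left = set(left_categories)
    seen = set(in_bag_values)

    # Count how many in-bag training observations fall in each branch.
    counts = Counter("left" if v in left else "right" for v in in_bag_values)

    # Unseen values follow the branch with the most in-bag observations.
    # Breaking ties toward "left" is an arbitrary choice for this sketch.
    default_branch = "left" if counts["left"] >= counts["right"] else "right"

    def route(value):
        if value not in seen:
            return default_branch
        return "left" if value in left else "right"

    return route


# Hypothetical in-bag sample: 5 left-branch rows, 6 right-branch rows.
in_bag = ["iPhone 7"] * 5 + ["Sony Xperia"] * 2 + ["Samsung Galaxy s8"] * 4
route = build_split(in_bag, left_categories={"iPhone 7"})

print(route("iPhone 7"))           # seen in training, follows the rule
print(route("Samsung Galaxy s9"))  # unseen, follows the larger branch
```

With these made-up counts the right branch holds more in-bag rows, so the unseen 'Samsung Galaxy s9' is sent right. In a real forest each split in each tree applies its own fallback, so a new category still gets a valid score.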

 

Have a good week!

Charlot
Fluorite | Level 6

Hi Mike,

 

Thank you so much. Not only have you answered the question perfectly, but I'm also pretty pleased with what that answer was!
