Charlot
Fluorite | Level 6

Hi all,

 

I've created a predictive model using HPFOREST and would like to know the best practices for scoring new data.

 

I can successfully use HP4SCORE to score new data, so that's usually not an issue... except perhaps when my new data contains categorical variables with values that never appeared in the training data.

 

For example, let's say I have a variable called 'Cellphone Model'.

 

When the model was trained, 'Cellphone Model' took the values 'iPhone 7', 'Sony Xperia' and 'Samsung Galaxy s8'.

 

This week, when scoring, I might see instances where this same 'Cellphone Model' input variable could also take the value 'Samsung Galaxy s9.'

 

My questions are:

  1. What does HP4SCORE do in such a situation? Is there an 'else' clause in the score code, so that it still generates valid scores when it encounters a value of 'Cellphone Model' that was not in the training data?
  2. If HP4SCORE does not handle this automatically, how should I handle it myself?

Note that while I do have access to a SAS EM license, this model was created in SAS Base. I know how to handle this situation with the SAS EM nodes, but for some other technical reasons, I can't use the SAS Enterprise Miner nodes/diagram for this model.

 

Thanks!

MikeStockstill
SAS Employee

Hello Charlot-

 

PROC HP4SCORE runs score code that was generated by PROC HPFOREST. Here is a documentation excerpt that describes how HPFOREST handles this:

 

Handling Values That Are Absent from Training Data

A splitting rule that uses a categorical variable might not recognize all possible values of the variable. Some categories might not exist in the training data. Others might be so infrequent in the training sample in the node that the procedure excludes them. The MINCATSIZE= option specifies the minimum number of occurrences required for a categorical value to participate in the search for a splitting rule. Splitting rules assign unseen categorical values to the branch that has the most in-bag training observations.
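
To make the last sentence of that excerpt concrete, here is a minimal Python sketch of the routing rule it describes. This is only a toy illustration, not the HPFOREST implementation: the class name, field names, the in-bag counts, and the tie-breaking choice are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoricalSplit:
    """Toy model of one splitting rule on a categorical variable.

    All names and numbers here are hypothetical; they only illustrate the
    documented rule that unseen values follow the larger in-bag branch.
    """
    left_values: frozenset
    right_values: frozenset
    left_inbag_count: int   # in-bag training observations sent left
    right_inbag_count: int  # in-bag training observations sent right

    def route(self, value: str) -> str:
        # Values the rule learned during training go to their own branch.
        if value in self.left_values:
            return "left"
        if value in self.right_values:
            return "right"
        # Unseen value: fall back to the branch with the most in-bag
        # training observations (ties go left here; that is an assumption).
        if self.left_inbag_count >= self.right_inbag_count:
            return "left"
        return "right"


split = CategoricalSplit(
    left_values=frozenset({"iPhone 7"}),
    right_values=frozenset({"Sony Xperia", "Samsung Galaxy s8"}),
    left_inbag_count=40,
    right_inbag_count=60,
)

# 'Samsung Galaxy s9' was never seen in training, so it follows the
# larger branch instead of failing.
print(split.route("Samsung Galaxy s9"))
```

In other words, every tree still produces a prediction for a new level such as 'Samsung Galaxy s9'; the new level is simply treated like the majority branch at each split that uses the variable.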

 

Have a good week!

Charlot
Fluorite | Level 6

Hi Mike,

 

Thank you so much. Not only have you answered the question perfectly, but I'm also pretty pleased with what that answer was!
