Hi all,
I've created a predictive model using PROC HPFOREST and would like to know the best practices for scoring new data.
I can successfully use PROC HP4SCORE to score new data, and that's usually not an issue... except possibly when my new data contains categorical variables with values that never appeared in the training data.
For example, let's say I have a variable called 'Cellphone Model'.
When the model was trained, 'Cellphone Model' was taking the values 'iPhone 7', 'Sony Xperia' and 'Samsung Galaxy s8'.
This week, when scoring, I might see instances where this same 'Cellphone Model' input variable also takes the value 'Samsung Galaxy s9'.
My questions are:
Note that while I do have access to a SAS EM license, this model was created in Base SAS. I know how to handle this situation with the SAS EM nodes, but for other technical reasons I can't use the SAS Enterprise Miner nodes/diagram for this model.
Thanks!
Hello Charlot-
PROC HP4SCORE scores new data with the model that PROC HPFOREST saved, so unseen categories are handled by the splitting rules HPFOREST built. Here is a documentation excerpt that describes how HPFOREST handles them:
Handling Values That Are Absent from Training Data
A splitting rule that uses a categorical variable might not recognize all possible values of the variable. Some categories might not exist in the training data. Others might be so infrequent in the training sample in the node that the procedure excludes them. The MINCATSIZE= option specifies the minimum number of occurrences required for a categorical value to participate in the search for a splitting rule. Splitting rules assign unseen categorical values to the branch that has the most in-bag training observations.
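To make the rule above concrete, here is a minimal Python sketch of how a single categorical split could route values. It is not SAS's implementation: the function `build_categorical_split`, the `branch_of` mapping (standing in for whatever the split search chose), and the `min_cat_size` parameter (standing in for MINCATSIZE=) are hypothetical. It also assumes that categories excluded by the minimum-size threshold are routed the same way as never-seen categories, and that "most in-bag observations" counts only observations whose category the rule recognizes.

```python
from collections import Counter


def build_categorical_split(train_values, branch_of, min_cat_size=5):
    """Return a routing function for one categorical split (hypothetical sketch).

    train_values: in-bag training values of the variable in this node.
    branch_of:    category -> branch chosen by an (assumed) split search.
    min_cat_size: stand-in for MINCATSIZE=; rarer categories are excluded.
    """
    counts = Counter(train_values)
    # Only categories seen at least min_cat_size times participate in the rule.
    known = {c: b for c, b in branch_of.items() if counts[c] >= min_cat_size}
    if not known:
        raise ValueError("no category meets min_cat_size")

    # Count in-bag observations per branch among the recognized categories.
    branch_counts = Counter(known[v] for v in train_values if v in known)
    default_branch = branch_counts.most_common(1)[0][0]

    def route(value):
        # Unseen or excluded values go to the branch with the most in-bag obs.
        return known.get(value, default_branch)

    return route


train = ["iPhone 7"] * 6 + ["Sony Xperia"] * 3 + ["Samsung Galaxy s8"] * 5
branch_of = {"iPhone 7": "left", "Sony Xperia": "right", "Samsung Galaxy s8": "right"}

route = build_categorical_split(train, branch_of, min_cat_size=2)
print(route("Samsung Galaxy s9"))  # right (8 in-bag obs vs 6 on the left)
```

With `min_cat_size=4`, 'Sony Xperia' (3 occurrences) drops out of the rule, the right branch keeps only 5 observations, and both 'Sony Xperia' and the unseen 'Samsung Galaxy s9' would be sent left. Because a forest applies this routing in every tree, a new category simply follows the majority branch at each split rather than causing a scoring error.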
Have a good week!
Hi Mike,
Thank you so much. Not only have you answered the question perfectly, but I'm also pretty pleased with what that answer was!!