KuniKuni
Fluorite | Level 6

Hello!

I want to know which variables the HP Forest node actually uses.

The output shows variable importance, but it lists too many variables.

I want to use only 10 variables, so I set that option in HP Forest to 10.

However, the results still show importance for more than 10 variables.

 

Where can I find the variables that HP Forest accepted?

Please help!

DougWielenga
SAS Employee

I want to use only 10 variables, so I set that option in HP Forest to 10, but the results still show importance for more than 10 variables.

 

I'm not sure which property you are describing.  There are two properties that relate to 'number of variables' in the HP Forest node. 

 

1 - Number of variables to consider in split search (property under Splitting Rule Options): this property only controls how many variables are selected for consideration at a given node.  Since a different subset of variables can be considered at every node, importance is still generated for more variables once all the nodes in all the trees are taken into account.

2 - Number of Variables to Consider (property under Score): this property is only active when you have chosen Random Branch Assignment for Variable Importance Method.  When you specify the number of variables (say k) in this property, certain variable importance information is computed only for the top k variables, but you will still see the other variables that were chosen for splitting.
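
To illustrate why the first property does not cap the number of variables in the importance table, here is a minimal Python sketch (not SAS code, and not how HP Forest is implemented). The function name, the 50 inputs, and the 200 nodes are all made up for illustration, and a random pick stands in for the real "best split" search.

```python
import random

def variables_used_in_forest(n_inputs, vars_per_split, n_nodes, seed=0):
    """Return the set of input indices that split at least one node."""
    rng = random.Random(seed)
    used = set()
    for _ in range(n_nodes):
        # Each node only examines a random subset of the inputs...
        candidates = rng.sample(range(n_inputs), vars_per_split)
        # ...and splits on one of them. A real forest would pick the best
        # candidate; a random pick is only a stand-in for illustration.
        used.add(rng.choice(candidates))
    return used

# Hypothetical setup: 50 inputs, 10 considered per node, 200 nodes in total.
used = variables_used_in_forest(n_inputs=50, vars_per_split=10, n_nodes=200)
print(len(used))  # number of distinct variables that received a split
```

Because every node draws its own subset, the number of distinct variables used across the whole forest can be much larger than the per-node limit of 10.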

 

To limit the forest to only the top k variables, proceed as follows:

1 - Run the HP Forest node once using all possible inputs and view the results

2 - Identify the top k variables based on your preferred variable importance measure

3 - Close the results and click on the ... to the right of the Variables property under Train

4 - Change the Use column from Default to No for all input variables

5 - Change the Use column to Yes for the k variables of interest

6 - Rerun the HP Forest to obtain a forest which only relies on the desired variables
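
Steps 2, 4 and 5 amount to ranking the inputs by importance and marking only the top k as used. The sketch below shows that bookkeeping in Python; it does not talk to Enterprise Miner. The variable names and importance values are invented, and `use_settings` is a hypothetical helper.

```python
def use_settings(importance, k):
    """Map each variable to "Yes" if it is among the k most important, else "No"."""
    # Rank variable names from most to least important.
    ranked = sorted(importance, key=importance.get, reverse=True)
    top_k = set(ranked[:k])
    return {name: ("Yes" if name in top_k else "No") for name in importance}

# Made-up importance values, as if copied from the first run's results.
importance = {
    "age": 0.31,
    "income": 0.27,
    "tenure": 0.05,
    "region": 0.02,
    "balance": 0.18,
}

settings = use_settings(importance, k=3)
for name, use in settings.items():
    print(f"{name}: Use = {use}")
```

The resulting Yes/No values are what you would then enter by hand in the Use column of the Variables table.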

 

Please note that you are reducing the power of the forest by limiting the variables you use, so be sure to allow the number to be large enough to include all of the variables that have a substantial impact on the predictions.  You can test this by running different nodes in parallel to see how much model performance suffers as you reduce the number of included variables.

 

I hope this helps!

Doug

 

 

 
