msf2021

Hello!

 

I have built a random forest in SAS Enterprise Miner for a classification task. The variable Target is binary (1 = event, 0 = non-event), and I identified the 20 most important variables. I then kept only these 20 and ran the HPForest node again. All my metrics are consistent between train (80% split) and test (20% split), except that cumulative % captured response differs significantly: about 30% in the 1st decile on train versus about 20% on test. I found that changing parameters such as mtry and the maximum number of trees changes these results, but is there a way to find the optimal parameters? Trying different combinations by hand is tedious, and I have not been able to achieve good results.

 

I already tried this methodology: "Tip: Getting the Most from your Random Forest" (SAS Support Communities). However, it only considers interval inputs, while I have both interval and categorical ones, and I could not achieve better results with this approach.

 

Thanks

sbxkoenk

Hello @msf2021 ,

 

What is the variable importance table / importance plot telling you?

 

Maybe the top 20 variables are only responsible for 50% of the total importance?
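
As a rough illustration of that check, here is a minimal Python sketch that computes the share of total importance carried by the top k variables. The function name and the importance values are hypothetical; in practice the values would come from the variable importance table exported from the HPForest node.

```python
def cumulative_importance_share(importances, top_k):
    """Return the fraction of total importance held by the top_k largest values."""
    total = sum(importances)
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    # Sort descending and keep only the top_k largest importance values.
    top = sorted(importances, reverse=True)[:top_k]
    return sum(top) / total


# Hypothetical importance values (not taken from this thread).
example_importances = [30, 20, 10, 10, 10, 10, 5, 5]
share_top2 = cumulative_importance_share(example_importances, top_k=2)
print(f"Top 2 variables carry {share_top2:.0%} of total importance")
```

If the share for the selected 20 variables turns out to be low, dropping the remaining variables may have removed useful signal.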

 

You can also have a look at these videos:

SAS Tutorial | How to train forest models in SAS?
https://www.youtube.com/watch?v=FWragzNF59U

 

SAS Tutorial | How to Pick Hyperparameters of Machine Learning Models?

https://www.youtube.com/watch?v=AOR7XnCB_JA
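
To make the idea of a systematic search concrete, here is a minimal Python sketch of an exhaustive grid search that scores each parameter combination by the share of events captured in the first decile. This is not SAS code and not the method shown in the videos. The names `top_decile_capture`, `grid_search`, `evaluate`, `mtry`, and `n_trees` are assumptions for illustration; `evaluate` stands for whatever routine trains the forest with the given parameters and returns a validation score.

```python
from itertools import product


def top_decile_capture(y_true, scores):
    """Share of all events (y == 1) found among the 10% highest-scored rows."""
    n_events = sum(y_true)
    if n_events == 0:
        raise ValueError("y_true contains no events")
    # Rank rows by predicted score, highest first.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, len(ranked) // 10)
    return sum(y for _, y in ranked[:cutoff]) / n_events


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and return (best_params, best_score).

    `evaluate` is a caller-supplied (hypothetical) function: it takes a dict of
    parameters, trains a model, and returns a validation score where higher
    is better, for example top_decile_capture on the validation split.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A call might look like `grid_search({"mtry": [2, 4, 6], "n_trees": [100, 200]}, evaluate)`. Scoring on a validation split rather than on the training data is the design choice that matters here, since the question is about a gap between train and test.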

 

You can also select the most important variables upfront with other techniques.

I am not sure whether PROC VARREDUCE was already available in the Enterprise Miner era.

 

Thanks,

Koen
