msf2021

Hello!

 

I have built a random forest in SAS Enterprise Miner for a classification task. The target variable is Target (1 = event, 0 = non-event), and I identified the top 20 most important variables. I then kept only these 20 and ran the HPForest node again. All my metrics are consistent between train (80% split) and test (20% split), but the cumulative % captured response differs significantly between train (~30% in the 1st decile) and test (~20% in the 1st decile).

I found that changing some parameters, such as mtry and the maximum number of trees, changes these results. Is there a way to find the optimal parameters? Trying different combinations by hand is not easy, and I have not been able to achieve good results.
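To make the train/test comparison concrete, here is a minimal sketch of how cumulative % captured response per decile is usually computed: rank observations by predicted event probability (highest first), cut them into 10 equal-sized groups, and report the share of all events found in the top d groups. The function name is hypothetical, ties are broken by input order, and SAS may bin ties slightly differently, so treat it as an illustration rather than a reproduction of the Enterprise Miner output.

```python
def cumulative_captured_response(scores, targets, n_bins=10):
    """Cumulative % of all events captured in the top 1..n_bins bins.

    scores  : predicted probabilities of the event (Target = 1)
    targets : observed 0/1 target values, same order as scores
    """
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    total_events = sum(targets)
    if total_events == 0:
        raise ValueError("targets contain no events")
    # Rank targets from highest to lowest predicted probability
    # (sorted() is stable, so ties keep their input order).
    ranked = [t for _, t in sorted(zip(scores, targets),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    result, captured, start = [], 0, 0
    for b in range(1, n_bins + 1):
        end = b * n // n_bins          # upper boundary of bin b
        captured += sum(ranked[start:end])
        start = end
        result.append(100.0 * captured / total_events)
    return result


# Toy example: 20 observations, 4 events, 2 observations per decile.
scores = [i / 20 for i in range(20, 0, -1)]
targets = [1, 1, 0, 1] + [0] * 15 + [1]
curve = cumulative_captured_response(scores, targets)
print(curve[0])  # → 50.0 (top 10% of scores holds 2 of the 4 events)
```

Running it separately on the scored train and test data gives the two curves being compared. A clearly higher top-decile value on train than on test (30% vs 20% here) is commonly read as a sign that the forest fits the training data more closely than it generalizes.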

 

I have already used this methodology: Tip: Getting the Most from your Random Forest - SAS Support Communities. However, it only considers interval inputs, while I have both interval and categorical ones, and I could not achieve better results with this approach either.

 

Thanks

sbxkoenk

Hello @msf2021,

 

What is the variable importance table / importance plot telling you?

 

Maybe the top 20 variables account for only 50% of the total importance?
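One way to check this is to take the importance values from the variable importance table and compute what share of the total the top k variables hold, or how many variables are needed to reach a chosen share. A minimal sketch, assuming the table has been exported as a simple name-to-importance mapping; the variable names and values below are hypothetical, and negative importance values (if your chosen measure can produce them) are clipped to zero.

```python
def importance_coverage(importances, top_k):
    """Share (in %) of total importance held by the top_k variables."""
    values = sorted((max(v, 0.0) for v in importances.values()), reverse=True)
    total = sum(values)
    if total <= 0:
        raise ValueError("total importance must be positive")
    return 100.0 * sum(values[:top_k]) / total


def variables_needed(importances, threshold_pct):
    """Smallest number of top variables whose cumulative share reaches threshold_pct."""
    values = sorted((max(v, 0.0) for v in importances.values()), reverse=True)
    total = sum(values)
    if total <= 0:
        raise ValueError("total importance must be positive")
    running = 0.0
    for k, v in enumerate(values, start=1):
        running += v
        if 100.0 * running / total >= threshold_pct:
            return k
    return len(values)


# Hypothetical importance table with four inputs.
imp = {"x1": 40, "x2": 30, "x3": 20, "x4": 10}
print(importance_coverage(imp, 2))  # → 70.0
print(variables_needed(imp, 90))    # → 3
```

With the real table, `importance_coverage(imp, 20)` answers the question directly: a low value would suggest that the 20 kept variables leave out a substantial part of the signal.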

 

You can also have a look here:

SAS Tutorial | How to train forest models in SAS?
https://www.youtube.com/watch?v=FWragzNF59U

 

SAS Tutorial | How to Pick Hyperparameters of Machine Learning Models?

https://www.youtube.com/watch?v=AOR7XnCB_JA
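As a rough illustration of an automated alternative to trying combinations by hand (not taken from the videos), here is a minimal random-search sketch over candidate values for mtry and the maximum number of trees. The `objective` function is a placeholder assumption: in practice it would train the forest with the given settings on the training data and return a metric computed on separate validation data, for example the 1st-decile cumulative % captured response. Tuning on a validation set (or with cross-validation) rather than on the final test set keeps the test result an honest estimate.

```python
import random


def random_search(objective, space, n_iter=20, seed=0):
    """Try n_iter random configurations and keep the best one.

    objective : callable(params: dict) -> float, higher is better
                (hypothetical: train on training data, score on validation data)
    space     : dict mapping parameter name -> list of candidate values
    Returns (best_score, best_params, history).
    """
    rng = random.Random(seed)          # fixed seed makes the search reproducible
    best_score, best_params = float("-inf"), None
    history = []
    for _ in range(n_iter):
        # Draw one candidate value for every parameter independently.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        history.append((params, score))
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params, history


# Hypothetical search space and a toy objective standing in for a real model fit.
space = {"mtry": list(range(1, 21)), "max_trees": [50, 100, 200, 500]}
toy_objective = lambda p: -(p["mtry"] - 5) ** 2 - abs(p["max_trees"] - 200) / 100
best, params, hist = random_search(toy_objective, space, n_iter=50, seed=1)
print(best, params)
```

An exhaustive grid search is the same loop run over every combination (for example with `itertools.product`); random search is often the cheaper choice when the grid is large, because each evaluation means fitting a full forest.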

 

You can also select the most important variables upfront with other techniques.

I'm not sure whether PROC VARREDUCE was already available in the Enterprise Miner era.

 

Thanks,

Koen
