Hello!
I have built a random forest in SAS Miner for a classification task. The target variable is Target (1 = event, 0 = non-event), and I identified the 20 most important variables. I then kept only these 20 and ran the HPForest node again. All my metrics are OK between train (80% split) and test (20% split), except that the cumulative % captured response is significantly different: about 30% in the 1st decile on train versus about 20% on test. I found that changing some parameters, such as mtry and the maximum number of trees, changes these results, but is there a way to find the optimal parameters? Trying different combinations by hand is not easy, and I have not been able to achieve good results.
I have already tried this methodology: Tip: Getting the Most from your Random Forest - SAS Support Communities. However, it only considers interval inputs, whereas I have both interval and categorical ones, and I could not achieve better results with this approach either.
Thanks
Hello @msf2021 ,
What is the variable importance table / importance plot telling you?
Maybe the top 20 variables are only responsible for 50% of the total importance?
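To make that check concrete, here is a minimal sketch in Python (not SAS code). It assumes the importance values have been exported from the forest results into a plain dictionary; the function name `top_k_importance_share` and the example values are hypothetical and only for illustration.

```python
def top_k_importance_share(importances, k=20):
    """Fraction of total importance carried by the k most important variables.

    `importances` maps variable name -> importance value (assumed to be
    exported by hand from the importance table; not a SAS API).
    """
    values = sorted(importances.values(), reverse=True)
    total = sum(values)
    if total <= 0:
        raise ValueError("total importance must be positive")
    return sum(values[:k]) / total


# Toy example with made-up numbers: the top 2 of 3 variables
# carry 8 of 10 importance units.
example = {"a": 5.0, "b": 3.0, "c": 2.0}
share = top_k_importance_share(example, k=2)
```

If the share for your 20 variables turns out to be low, dropping the remaining variables may have discarded a good part of the signal.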
You can also have a look here:
SAS Tutorial | How to train forest models in SAS?
https://www.youtube.com/watch?v=FWragzNF59U
SAS Tutorial | How to Pick Hyperparameters of Machine Learning Models?
https://www.youtube.com/watch?v=AOR7XnCB_JA
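As a rough illustration of what a systematic search could look like instead of trying combinations by hand, here is a sketch in plain Python (not SAS code, and not taken from the videos). It assumes you can score the train and test data for each setting and compute the first-decile captured response; `fit_and_score` is a hypothetical callback you would have to supply, and the parameter names `mtry` and `ntrees` are placeholders for the variables-to-try and maximum-trees settings.

```python
from itertools import product


def captured_response(y_true, scores, depth=0.10):
    """Share of all events found in the top `depth` fraction of cases,
    ranked by descending score (depth=0.10 is the 1st decile)."""
    total_events = sum(y_true)
    if total_events == 0:
        raise ValueError("no events in y_true")
    n_top = max(1, round(len(scores) * depth))
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    return sum(y for _, y in ranked[:n_top]) / total_events


def grid_search(fit_and_score, grid):
    """Evaluate every combination in `grid`.

    `fit_and_score(**params)` is a hypothetical user-supplied function that
    trains a forest with those settings and returns
    (train_metric, test_metric), e.g. first-decile captured response.
    """
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        train_metric, test_metric = fit_and_score(**params)
        results.append({
            "params": params,
            "train": train_metric,
            "test": test_metric,
            "gap": train_metric - test_metric,
        })
    # Best test metric first; among ties, prefer the smaller train/test gap.
    results.sort(key=lambda r: (-r["test"], abs(r["gap"])))
    return results
```

Sorting on the test metric while also reporting the gap keeps the train/test difference visible, which is the symptom described in the question. A grid over a handful of values per parameter is usually a reasonable starting point before refining around the best settings.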
You can also select the most important variables up front with other techniques, for example PROC VARREDUCE.
I am not sure, though, whether PROC VARREDUCE was already available in the Enterprise Miner era.
Thanks,
Koen