Hi,
I was wondering if anyone can help me?
Is it possible, and appropriate, to use standardized data with random forests and decision trees?
If I do use standardized data, how does the algorithm treat it?
Regards
I don't think it hurts that you have already standardized. With tree-based algorithms, interval inputs are typically binned anyway (using bucket or equal-spaced binning) before doing the split search, so it should be fine.
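As a rough illustration of the binning point, here is a minimal Python sketch (not SAS code). It assumes plain equal-width binning over the observed range; the `equal_width_bins` helper and the age values are made up for the example, and the binning a given procedure actually uses may differ. It assigns bins to a raw column and to a z-scored copy of the same column.

```python
from statistics import mean, pstdev

def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / span * n_bins), n_bins - 1) for v in values]

ages = [18, 22, 25, 31, 40, 47, 52, 63]  # hypothetical data
mu, sigma = mean(ages), pstdev(ages)
z_ages = [(a - mu) / sigma for a in ages]  # z-score standardization

raw_bins = equal_width_bins(ages, 4)
z_bins = equal_width_bins(z_ages, 4)
print(raw_bins)
print(z_bins)
```

Both lists should come out the same, because a z-score only shifts and rescales the column, and equal-width bins are defined relative to the column's own minimum and maximum.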
Typically, you do not have to standardize data (z-score) for tree models (decision tree, random forest, gradient boosting, etc.), because the algorithm looks for the split point where classification/prediction is best according to some criterion, and that search depends only on the ordering of the values, which standardization does not change. Also, when you standardize, interpretability gets a little harder: instead of saying age > 25 years is a good split, you have to say age > 1 std dev above the mean is a good split, and so on. So for tree-based models I say: when you don't need it, why do the extra work?
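To make that concrete, here is a small Python sketch of a single-split search (again not SAS code). It assumes a Gini impurity criterion with candidate thresholds at midpoints between sorted values; the `gini` and `best_split` helpers and the toy data are hypothetical and only meant to show the idea.

```python
from statistics import mean, pstdev

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Return (threshold, left_rows) minimising the weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive sorted values.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        a, b = xs[order[k - 1]], xs[order[k]]
        if a == b:
            continue  # no threshold can separate equal values
        left, right = order[:k], order[k:]
        score = (len(left) * gini([ys[i] for i in left])
                 + len(right) * gini([ys[i] for i in right])) / len(xs)
        if best is None or score < best[0]:
            best = (score, (a + b) / 2, frozenset(left))
    return best[1], best[2]

ages = [18, 22, 25, 31, 40, 47, 52, 63]  # hypothetical input
bought = [0, 0, 0, 1, 1, 1, 1, 1]        # hypothetical 0/1 target
mu, sigma = mean(ages), pstdev(ages)
z_ages = [(a - mu) / sigma for a in ages]

raw_t, raw_left = best_split(ages, bought)
z_t, z_left = best_split(z_ages, bought)

print("raw threshold (years):", raw_t)
print("standardized threshold (std devs):", z_t)
print("same rows go left:", raw_left == z_left)
```

The rows sent to each side of the split should be identical in both runs; only the threshold is expressed differently. On the raw data it reads as an age in years, while on the standardized data it is a number of standard deviations from the mean, which is the interpretability cost mentioned above.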
Hope this helps,
Radhikha
Here is a really helpful article about standardizing that was just posted: