My question pertains to the following subject matter:
Course = AI and Machine Learning Professional
Module = Machine Learning Specialist
Lesson = Lesson 3: Decision Trees and Ensembles of Trees
The first question: When scoring data using an ensemble of trees, is the entire validation data set scored by each of the individual trees in the ensemble?
The second question is: If my training dataset has 1000 points, will each "bagging" sample (drawn with replacement) used to build a tree in the ensemble contain 1000 data points, or is the sample size a hyperparameter that the statistician can set within Model Studio? A follow-up question: if my original dataset contains 400 data points, may a bagging sample drawn from those 400 contain more than 400 points and still be mathematically defensible?
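To make the second question concrete, here is a rough Python sketch of what I mean by a bagging sample. The function name `bootstrap_sample` and its `sample_size` argument are my own illustration, not a Model Studio setting; I am only assuming that a sample is drawn with replacement and that its size could in principle differ from the size of the original data.

```python
import random

def bootstrap_sample(data, sample_size=None, seed=0):
    """Draw a sample with replacement (hypothetical helper for illustration).

    sample_size defaults to len(data), the usual bootstrap convention.
    """
    rng = random.Random(seed)
    if sample_size is None:
        sample_size = len(data)
    # Each draw is independent, so the same point can appear several times.
    return [rng.choice(data) for _ in range(sample_size)]

original = list(range(400))                            # 400 original data points
same_size = bootstrap_sample(original)                 # 400 draws
larger = bootstrap_sample(original, sample_size=600)   # 600 draws from 400 points

# Sample size versus number of distinct original points it contains.
print(len(same_size), len(set(same_size)))
print(len(larger), len(set(larger)))
```

In this sketch the larger sample has 600 rows but can never contain more than 400 distinct points, which is the situation I am asking about.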
The third question is: Is the only difference between bagging and boosting how the sample is selected for each tree in the ensemble? For bagging, the sample is drawn with replacement; for boosting, the sampling is based on weights. In the end, will each method, when applied to the same original dataset of size 500, produce samples of size 500?
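To show what I have in mind for the third question, here is a small Python sketch of the two sampling schemes as I understand them. It is only my assumption that boosting draws its sample according to observation weights; the function names and the weight values are made up for illustration and do not come from Model Studio.

```python
import random

def uniform_resample(data, seed=0):
    """Bagging-style draw: with replacement, every point equally likely."""
    rng = random.Random(seed)
    return rng.choices(data, k=len(data))

def weighted_resample(data, weights, seed=0):
    """Weight-based draw: with replacement, probability proportional to weight."""
    rng = random.Random(seed)
    return rng.choices(data, weights=weights, k=len(data))

original = list(range(500))  # 500 original data points
# Hypothetical weights: pretend the first 50 points were hard to predict.
weights = [5.0 if i < 50 else 1.0 for i in range(500)]

bagging_sample = uniform_resample(original)
boosting_sample = weighted_resample(original, weights)

# Both samples have the same size; only the selection probabilities differ.
print(len(bagging_sample), len(boosting_sample))
```

If this picture is right, both samples would have 500 rows and differ only in which points tend to be selected.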
Thank you,
Bill Donaldson