Yes, keep the same number of IDs. In this example there are 2 IDs with target 1 and 5 IDs with target 0, so the dataset is imbalanced with respect to the target variable. My original dataset consists of 1142 IDs with target 1 and 8395 IDs with target 0. I want to keep the dataset as large as possible while having the same number of IDs for each value of the target variable. In the example, the output would then be 2 IDs with target 1 (the minority class) and 2 IDs with target 0. I said "randomly" because there are no further rules for choosing which IDs are kept.
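What is being described is random undersampling of the majority class at the ID level. Below is a minimal sketch in plain Python. It assumes each ID has a single target value and that the data can be reduced to a mapping from ID to target. The names `undersample_ids`, `filter_rows`, `id_to_target`, and the `"ID"` row key are hypothetical. All IDs of the smallest class are kept, and the same number of IDs is drawn at random, without replacement, from each larger class. On the full dataset this would keep all 1142 target-1 IDs plus 1142 of the 8395 target-0 IDs, 2284 IDs in total.

```python
import random
from collections import defaultdict


def undersample_ids(id_to_target, seed=None):
    """Return a set of IDs with the same count for every target value.

    Hypothetical helper: `id_to_target` maps each ID to its single target
    value. Every ID of the smallest class is kept; larger classes are
    sampled uniformly at random, without replacement, down to that size.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    ids_by_target = defaultdict(list)
    for id_, target in id_to_target.items():
        ids_by_target[target].append(id_)

    # Size of the smallest class (the minority class, e.g. target 1).
    n_keep = min(len(ids) for ids in ids_by_target.values())

    kept = set()
    for ids in ids_by_target.values():
        # For the minority class this keeps every ID; for the others it
        # picks n_keep distinct IDs at random.
        kept.update(rng.sample(ids, n_keep))
    return kept


def filter_rows(rows, kept_ids):
    """Keep only the rows (dicts with a hypothetical "ID" key) of kept IDs.

    Useful if the original data has several rows per ID.
    """
    return [row for row in rows if row["ID"] in kept_ids]


if __name__ == "__main__":
    # Toy data mirroring the example: 2 IDs with target 1, 5 with target 0.
    toy = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0, "F": 0, "G": 0}
    print(sorted(undersample_ids(toy, seed=42)))
```

With the toy data, the result always contains "A" and "B" plus two of "C" through "G"; which two depends on the seed. Sampling at the ID level, rather than the row level, keeps all rows belonging to a selected ID together.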