Hi,
I'm currently using the smotesample action in SAS and encountering two unusual issues:
Unexpected Categorical Values: Some of the newly generated values for categorical variables are large negative numbers (e.g., -1000000) that do not exist in the original dataset. Could you clarify why this might happen? Specifically, how does SMOTE handle categorical features, and is there a recommended approach to ensure realistic synthetic values?
Amplification of Placeholder Values: My dataset uses placeholder values like -1 or -2 to represent missing data. After applying SMOTE, these placeholders seem to become more dominant in the sampled data. Is there a way to prevent SMOTE from oversampling these values or to exclude them from the interpolation process?
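To make the two symptoms concrete, here is a minimal sketch of how both can be checked. It is plain Python rather than SAS code, the function names are hypothetical, and the values are made-up toy data, not taken from the actual dataset:

```python
from collections import Counter


def unseen_levels(original, synthetic):
    """Return synthetic values that never occur in the original column."""
    seen = set(original)
    return sorted({v for v in synthetic if v not in seen})


def placeholder_share(values, placeholders=(-1, -2)):
    """Return the fraction of values that are missing-data placeholders."""
    if not values:
        return 0.0
    counts = Counter(values)
    return sum(counts[p] for p in placeholders) / len(values)


# Toy data, invented for illustration only (assumption: one categorical
# column coded as integers, with -1 and -2 standing in for missing data).
original = [1, 2, 3, -1, 2, 1, 3, 2]
synthetic = [1, -1000000, -1, -1, 2, -2, -1, 3]

print(unseen_levels(original, synthetic))   # levels absent from the original
print(placeholder_share(original))          # placeholder share before sampling
print(placeholder_share(synthetic))         # placeholder share after sampling
```

A non-empty first result corresponds to the first issue, and a higher placeholder share in the synthetic rows than in the original corresponds to the second.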
Any guidance or best practices would be greatly appreciated.