Hi,
I'm currently using the smotesample action in SAS and encountering two unusual issues:
- Unexpected Categorical Values:
Some of the newly generated categorical variables are returning large negative values (e.g., -1000000) that do not exist in the original dataset. Could you clarify why this might happen? Specifically, how does SMOTE handle categorical features, and is there a recommended approach to ensure realistic synthetic values?
- Amplification of Placeholder Values:
My dataset uses placeholder values like -1 or -2 to represent missing data. After applying SMOTE, these placeholders seem to become more dominant in the sampled data. Is there a way to prevent SMOTE from oversampling these values or to exclude them from the interpolation process?
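To make the two symptoms concrete, here is a minimal Python sketch of the checks involved. It is not SAS code and does not call the smotesample action; the helper names, the toy values, and the placeholder set {-1, -2} are illustrative assumptions only.

```python
from collections import Counter

# Assumed missing-value codes; adjust to whatever the dataset actually uses
PLACEHOLDERS = {-1, -2}


def unseen_levels(original, synthetic):
    """Return synthetic values that never occur in the original column."""
    return sorted(set(synthetic) - set(original))


def placeholder_share(values, placeholders=PLACEHOLDERS):
    """Fraction of values that are placeholder codes (0.0 for an empty column)."""
    if not values:
        return 0.0
    counts = Counter(values)
    return sum(counts[p] for p in placeholders) / len(values)


# Made-up toy columns, only to illustrate the two checks
original = [1, 2, 3, -1, 2, 1, 3, 2]
synthetic = [1, -1, -1000000, 2, -1, -2, 3, -1]

print(unseen_levels(original, synthetic))   # values absent from the original
print(placeholder_share(original))          # placeholder share before sampling
print(placeholder_share(synthetic))         # placeholder share after sampling
```

The first check flags values such as -1000000 that are absent from the original column; the second compares how often the placeholder codes appear before and after sampling.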
Any guidance or best practices would be greatly appreciated.