Hi,
I'm currently using the smoteSample action in SAS and have run into two unusual issues:
Unexpected Categorical Values:
Some of the newly generated values for categorical variables are large negative numbers (e.g., -1000000) that do not exist in the original dataset. Could you clarify why this might happen? Specifically, how does SMOTE handle categorical features, and is there a recommended approach to ensure realistic synthetic values?
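The smoteSample action's internals are not shown in this thread, so the following is only a generic Python sketch of how SMOTE-style methods usually treat the two kinds of input. It assumes neighbors have already been found and uses hypothetical helper names (`interpolate`, `synth_row`). Numeric inputs are interpolated between a minority row and a neighbor. Nominal inputs, in the SMOTE-NC variant, take the most frequent level among the row and its neighbors. One thing worth checking in your own setup is whether categorical variables stored as numeric codes are declared as nominal. If they are not, they are interpolated like any continuous input and can take values that never occur in the data. This sketch does not explain the specific -1000000 values.

```python
import random
from collections import Counter

def interpolate(a, b, gap):
    """Classic SMOTE step: move from a toward b by a fraction gap in [0, 1)."""
    return a + gap * (b - a)

def synth_row(base, neighbors, numeric_cols, nominal_cols, rng):
    """Build one synthetic row in the style of SMOTE-NC (hypothetical helper).

    base: a minority-class row (dict); neighbors: its k nearest minority
    neighbors, assumed already found. Numeric columns are interpolated toward
    one random neighbor; nominal columns take the most frequent level among
    base + neighbors (ties broken at random), so they can only hold levels
    that already occur in the data.
    """
    partner = rng.choice(neighbors)
    gap = rng.random()
    row = {col: interpolate(base[col], partner[col], gap) for col in numeric_cols}
    for col in nominal_cols:
        counts = Counter([base[col]] + [n[col] for n in neighbors])
        top = max(counts.values())
        row[col] = rng.choice([lvl for lvl, c in counts.items() if c == top])
    return row

# Demo: "region" is a category stored as numeric codes 1, 2, 3.
base = {"income": 50.0, "region": 1}
neighbors = [{"income": 60.0, "region": 3},
             {"income": 70.0, "region": 3},
             {"income": 55.0, "region": 2}]

# Treated as numeric, the code is interpolated and can land between levels.
print(interpolate(1, 3, 0.25))  # 1.5, not a valid region code
# Treated as nominal, it stays one of the observed levels (here 3, the majority).
print(synth_row(base, neighbors, ["income"], ["region"], random.Random(0)))
```

The design choice here is simply that nominal values are copied, never averaged. Whether and how smoteSample lets you mark inputs as nominal should be confirmed in its documentation.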
Amplification of Placeholder Values:
My dataset uses placeholder values like -1 or -2 to represent missing data. After applying SMOTE, these placeholders seem to become more dominant in the sampled data. Is there a way to prevent SMOTE from oversampling these values or to exclude them from the interpolation process?
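One plausible mechanism, sketched here in plain Python rather than SAS: placeholder codes act as ordinary numbers during neighbor search and interpolation. Rows that share a placeholder sit close together in that dimension, so they tend to be each other's neighbors. Interpolating between two rows that both hold -1 always returns exactly -1, so those rows reproduce themselves. A common workaround, not specific to SAS, is to split each placeholder into a missing value plus a separate nominal flag before oversampling, and then impute the numeric column. The names `SENTINELS`, `split_sentinels`, and `impute_median` are hypothetical, and median imputation is just one simple choice.

```python
import math
import statistics

SENTINELS = {-1, -2}  # placeholder codes mentioned in the post; adjust as needed

def split_sentinels(values, sentinels=SENTINELS):
    """Turn placeholder codes into NaN plus a separate nominal flag column."""
    cleaned, flags = [], []
    for v in values:
        if v in sentinels:
            cleaned.append(math.nan)
            flags.append(str(v))  # keep which placeholder it was, as a level
        else:
            cleaned.append(float(v))
            flags.append("observed")
    return cleaned, flags

def impute_median(values):
    """Fill NaNs with the median of the observed values (one simple choice)."""
    med = statistics.median([v for v in values if not math.isnan(v)])
    return [med if math.isnan(v) else v for v in values]

raw = [10, -1, 30, -2, 20]
cleaned, flags = split_sentinels(raw)
print(flags)                   # ['observed', '-1', 'observed', '-2', 'observed']
print(impute_median(cleaned))  # [10.0, 20.0, 30.0, 20.0, 20.0]

# Why placeholders multiply: interpolating between two rows that both hold -1
# returns -1 for any gap, so placeholder rows are copied exactly.
gap = 0.7
print(-1 + gap * (-1 - (-1)))  # -1.0
```

In SAS the same preprocessing would typically happen in a DATA step before calling the action, with the flag variable passed as a nominal input if the action supports that. After sampling, the flag indicates which synthetic rows should have the placeholder restored, if your downstream process needs it.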
Any guidance or best practices would be greatly appreciated.
Please include your SAS log showing the code you are running and any messages you are getting. If you can provide sample data, that would also help.
