phopkinson
Obsidian | Level 7

Hi,

I'm currently using the smotesample action in SAS and encountering two unusual issues:

 

  1. Unexpected Categorical Values:
    Some of the newly generated categorical variables contain large negative values (e.g., -1000000) that do not exist in the original dataset. Could you clarify why this might happen? Specifically, how does SMOTE handle categorical features, and is there a recommended approach to ensure realistic synthetic values?

  2. Amplification of Placeholder Values:
    My dataset uses placeholder values like -1 or -2 to represent missing data. After applying SMOTE, these placeholders seem to become more dominant in the sampled data. Is there a way to prevent SMOTE from oversampling these values or to exclude them from the interpolation process?
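
To make the second issue concrete, here is a minimal sketch in plain Python. It is not the smotesample action and makes no claim about how that action works internally; it only illustrates generic SMOTE-style linear interpolation between two minority rows. The placeholder codes, the helper names (`mask_placeholders`, `interpolate`), and the sample rows are all hypothetical. It shows that placeholders left as numbers get blended into the synthetic values, while recoding them to missing first keeps them out of the interpolation.

```python
# Assumption: -1 and -2 are the codes this dataset uses for "missing".
PLACEHOLDERS = {-1, -2}

def mask_placeholders(row):
    """Replace placeholder codes with None so they are not treated as numbers."""
    return [None if v in PLACEHOLDERS else v for v in row]

def interpolate(a, b, gap):
    """Generic SMOTE-style interpolation: a + gap * (b - a), per column.

    If either endpoint is missing (None), the synthetic value stays missing.
    """
    return [
        None if x is None or y is None else x + gap * (y - x)
        for x, y in zip(a, b)
    ]

# Two hypothetical minority-class rows with placeholder codes in them.
row_a = [35.0, -1, 120.0]
row_b = [45.0, 60.0, -2]

# Naive: placeholders are interpolated as if they were real measurements.
naive = interpolate(row_a, row_b, 0.5)

# Masked: placeholders are recoded to missing before interpolation.
masked = interpolate(mask_placeholders(row_a), mask_placeholders(row_b), 0.5)

print(naive)
print(masked)
```

In the naive case the second and third columns come out as numbers that mix a real measurement with a placeholder code, which is what I would like to avoid.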

Any guidance or best practices would be greatly appreciated.

1 REPLY
Kathryn_SAS
SAS Employee

Please include your SAS log showing the code you are running and any messages you are getting. If you can provide sample data, that would also be helpful.

