@DougWielenga wrote:
The code you found uses the MODECLUS procedure, which (as you pointed out) is intended for numerical data. It also does not scale to the size of typical data mining data sets. The Cluster node in SAS Enterprise Miner does allow categorical variables when creating a cluster solution, and it can handle large-scale data. Therefore, you might consider creating clusters with the Cluster node and then sampling from the segments it produces, as desired, to achieve a similar effect.

Hey Doug,

Could you explain in more detail how we can use the output of the Cluster node in the SMOTE SAS code? I don't think I understand the idea.

I found this article about a method that allows categorical variables, but it provides only pseudocode: http://support.sas.com/resources/papers/proceedings15/3483-2015.pdf

Any ideas on how it could be implemented in SAS code?
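To make the "cluster, then sample within segments" idea concrete, here is a minimal sketch in Python rather than SAS, just to show the logic. It is one possible reading of the suggestion, not the method from the linked paper and not what the Cluster node does internally. It assumes the minority-class rows have already been scored by the Cluster node and carry a segment id (assumed here to be called `_SEGMENT_`); the function name `synthesize_within_segments` and the sample columns are hypothetical. The segment stands in for the nearest-neighbor search of SMOTE: each synthetic row is built from two rows of the same segment, with numeric values interpolated and categorical values copied from one of the two parents.

```python
import random
from collections import defaultdict


def synthesize_within_segments(rows, segment_key, numeric_keys,
                               categorical_keys, n_new, seed=0):
    """Create n_new synthetic rows, each built from two rows of one segment."""
    rng = random.Random(seed)

    # Group the rows by their cluster segment.
    by_segment = defaultdict(list)
    for row in rows:
        by_segment[row[segment_key]].append(row)

    # Only segments with at least two rows can supply a pair of parents.
    usable = [members for members in by_segment.values() if len(members) >= 2]
    if not usable:
        raise ValueError("need at least one segment with two or more rows")

    synthetic = []
    for _ in range(n_new):
        members = rng.choice(usable)   # segments are picked uniformly at random
        a, b = rng.sample(members, 2)  # two distinct parent rows
        gap = rng.random()             # position between the parents, in [0, 1)

        new_row = {segment_key: a[segment_key]}
        for key in numeric_keys:
            # Numeric variables: a point on the line between the two parents.
            new_row[key] = a[key] + gap * (b[key] - a[key])
        for key in categorical_keys:
            # Categorical variables: copy the level of one parent at random.
            new_row[key] = rng.choice([a[key], b[key]])
        synthetic.append(new_row)
    return synthetic


# Hypothetical minority-class rows, already scored with a segment id.
minority = [
    {"_SEGMENT_": 1, "age": 30.0, "income": 40.0, "region": "N"},
    {"_SEGMENT_": 1, "age": 34.0, "income": 44.0, "region": "S"},
    {"_SEGMENT_": 2, "age": 60.0, "income": 90.0, "region": "E"},
    {"_SEGMENT_": 2, "age": 66.0, "income": 80.0, "region": "E"},
]

synthetic = synthesize_within_segments(
    minority, "_SEGMENT_", ["age", "income"], ["region"], n_new=6, seed=42
)
for row in synthetic:
    print(row)
```

In SAS the same steps would be a pairing of rows within each segment followed by a DATA step that does the interpolation. If plain oversampling within segments is enough, stratified sampling with replacement on the segment variable is a simpler alternative.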