In SAS Enterprise Miner Workstation 13.2, I'm using some Text Mining nodes to build Text Topics. However, I noticed lots of phrases and tokens that I would like filtered out of the data before analysis. Examples include HTML tags such as "<p>", and boilerplate text such as "This description was written by the Martin Group." I tried adding these to the stop word list, but that didn't seem to help: the terms still appeared in the created topics. Is there a way to filter out multi-word phrases? And is there a way to filter out text matching a regular expression, such as "This description was written by .*"?
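
To illustrate the kind of filtering I have in mind, here is a minimal sketch in Python. It is not SAS code and not something Enterprise Miner provides; the function name `clean_text` and both patterns are my own assumptions, and the boilerplate pattern assumes the sentence ends at the first period.

```python
import re

# Hypothetical patterns, for illustration only.
HTML_TAG = re.compile(r"<[^>]+>")  # any tag such as <p> or </p>
# Boilerplate sentence, assumed to end at the first period.
BOILERPLATE = re.compile(r"This description was written by [^.]*\.")


def clean_text(text: str) -> str:
    """Remove HTML tags and the boilerplate sentence, then tidy whitespace."""
    text = HTML_TAG.sub(" ", text)
    text = BOILERPLATE.sub(" ", text)
    # Collapse the runs of spaces left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    sample = "<p>Great house.</p> This description was written by the Martin Group."
    print(clean_text(sample))
```

Ideally I would like to get the same effect inside Enterprise Miner, before the Text Mining nodes see the documents.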