How to filter phrases and regular expressions?

New Contributor
Posts: 2

In SAS Enterprise Miner Workstation 13.2, I'm using some Text Mining nodes to build Text Topics.

However, I noticed lots of phrases and tokens that I would like filtered out of the data before analysis. Examples include HTML tags such as "<p>", and boilerplate text such as "This description was written by the Martin Group." I tried adding these to the list of stop words, but that didn't help: the terms still appeared in the created topics.

Is there a way to filter out multi-word phrases? And is there a way to filter out text that matches a regular expression, such as "This description was written by .*"?


Accepted Solutions
Solution
‎01-18-2016 04:41 PM
Super Contributor
Posts: 337

Re: How to filter phrases and regular expressions?

Half the people will recommend doing these transformations before importing the data into EM; the other half will recommend doing them in EM.

If I were to do it in EM, I would use a Transform Variables node (use the SAS Code ellipsis!), an HP Transform node, or a SAS Code node.
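
For illustration only, here is a minimal sketch of what the SAS Code node route might look like, placed ahead of the text mining nodes. It assumes the text variable is called DESCRIPTION (a hypothetical name; substitute your own) and that only the training data needs cleaning. The two patterns are just examples built from the phrases in the question, not a tested recipe for this data.

```sas
/* Sketch for a SAS Code node placed before the text mining nodes.   */
/* Assumption: the text variable is named DESCRIPTION (hypothetical). */
data &EM_EXPORT_TRAIN;
   set &EM_IMPORT_DATA;

   /* Replace HTML tags such as <p> or </div> with a blank */
   description = prxchange('s/<[^>]+>/ /', -1, description);

   /* Drop the boilerplate sentence, whatever name follows "by" */
   description = prxchange('s/This description was written by [^.]*\.?//i',
                           -1, description);

   /* Collapse the runs of blanks left behind by the removals */
   description = compbl(description);
run;
```

PRXCHANGE with -1 as the second argument replaces every match, so this covers both the tags and the multi-word phrase in one pass over the data. If the flow uses validation or test partitions, they would presumably need the same treatment.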

Good luck!


All Replies
Respected Advisor
Posts: 4,919

Re: How to filter phrases and regular expressions?

Regular expression matching is very flexible. There is almost certainly a way to do what you describe. But we need something more concrete to suggest good examples. Please give us a list of phrases that you would want to check and what you would expect as a result.

PG
New Contributor
Posts: 2

Re: How to filter phrases and regular expressions?

Hi PG Stats,

I think I can handle constructing the regular expressions; that's not a problem. What I was trying to ask is: where do I put them? (Which node, which field?) I couldn't find it.

Thanks!
