stepthom
Calcite | Level 5

In SAS Enterprise Miner Workstation 13.2, I'm using some Text Mining nodes to build text topics.

However, I noticed many phrases and tokens that I would like filtered out of the data before analysis. Examples include HTML tags such as "<p>" and boilerplate text such as "This description was written by the Martin Group." I tried adding these to the stop list, but that didn't seem to help: the terms still appeared in the created topics.

Is there a way to filter out multi-word phrases? And is there a way to filter out text that matches a regular expression, such as "This description was written by .*"?

Accepted Solution
M_Maldonado
Barite | Level 11

Half the people will recommend doing these transformations before importing the data into EM; the other half will recommend doing them in EM.

If I were to do it in EM, I would use a Transform Variables node (use the SAS code ellipsis!), an HP Transform node, or a SAS Code node.

Good luck!
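For the "before importing into EM" route, the cleanup is ordinary regular-expression substitution on the text column. The sketch below is a minimal illustration, not an EM feature: it assumes the documents sit in a CSV file with a text column (the column name `description` and the helpers `clean_text` and `clean_csv` are hypothetical), and its patterns are built only from the examples in the question. Inside EM, SAS's `PRXCHANGE` function in a SAS Code node could play the same role.

```python
import csv
import io
import re

# Hypothetical patterns derived from the examples in the question.
HTML_TAG = re.compile(r"<[^>]+>")  # any tag such as <p> or </p>
# Boilerplate sentence: matches up to the first period, so a period
# inside the boilerplate (e.g. "Inc.") would end the match early.
BOILERPLATE = re.compile(r"This description was written by [^.]*\.?", re.IGNORECASE)
WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Replace HTML tags and boilerplate with spaces, then collapse whitespace."""
    text = HTML_TAG.sub(" ", text)
    text = BOILERPLATE.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


def clean_csv(src, dst, column: str) -> None:
    """Copy CSV rows from src to dst, cleaning the named text column."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row[column] = clean_text(row[column])
        writer.writerow(row)


if __name__ == "__main__":
    # In-memory stand-ins for the raw and cleaned files.
    raw = io.StringIO(
        "id,description\n"
        '1,"<p>Great product.</p> This description was written by the Martin Group."\n'
    )
    out = io.StringIO()
    clean_csv(raw, out, "description")
    print(out.getvalue())
```

The cleaned file would then be imported into EM as usual, so the Text Parsing node never sees the tags or the boilerplate sentence. Literal multi-word phrases can be handled the same way by compiling `re.escape(phrase)` for each phrase.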


3 Replies
PGStats
Opal | Level 21

Regular expression matching is very flexible. There is almost certainly a way to do what you describe. But we need something more concrete to suggest good examples. Please give us a list of phrases that you would want to check and what you would expect as a result.

PG
stepthom
Calcite | Level 5

Hi PGStats,

I think I can handle building the regular expressions; that's not a problem. What I was trying to ask is where to put them: which node, and which field? I couldn't find it.

Thanks!
