stepthom
Calcite | Level 5

In SAS Enterprise Miner Workstation 13.2, I'm using some Text Mining nodes to build Text Topics.

 

However, I noticed lots of phrases and tokens that I would like filtered out of the data before analysis. Examples include HTML tags such as "<p>", and boilerplate text such as "This description was written by the Martin Group." I tried adding these to the list of stop words, but that didn't seem to help: the terms still appeared in the created topics.

 

Is there a way to filter out multi-word phrases? And is there a way to filter out regular expressions, such as "This description was written by .*"?

1 ACCEPTED SOLUTION

M_Maldonado
Barite | Level 11

Half the people will recommend doing these transformations before importing the data into EM; the other half will recommend doing them in EM.

If I were to do it in EM, I would use a Transform node (use the SAS Code ellipsis!), an HP Transform node, or a SAS Code node.
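For the first route (cleaning the text before it is imported into EM), a minimal sketch in Python might look like the following. The two patterns come from the examples in the question; the function name `clean_text` and the sample string are hypothetical. The boilerplate pattern is also an assumption: it stops at the first period rather than using `.*`, so that text after the boilerplate sentence on the same line survives. Names containing a period (for example "Inc.") would need a different pattern.

```python
import re

# Patterns based on the examples in the question (adjust to your own data).
HTML_TAG = re.compile(r"<[^>]+>")
# Assumption: the boilerplate sentence ends at the first period.
BOILERPLATE = re.compile(r"This description was written by [^.]*\.?", re.IGNORECASE)


def clean_text(text: str) -> str:
    """Remove HTML tags and the boilerplate sentence, then collapse whitespace."""
    text = HTML_TAG.sub(" ", text)
    text = BOILERPLATE.sub(" ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    # Hypothetical sample record.
    sample = "<p>Bright two-bedroom flat.</p> This description was written by the Martin Group."
    print(clean_text(sample))
```

The same substitutions could presumably be written in a DATA step inside a SAS Code node using SAS's Perl-regular-expression functions, which would keep the whole flow inside EM.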

 

Good luck!


3 REPLIES
PGStats
Opal | Level 21

Regular expression matching is very flexible. There is almost certainly a way to do what you describe. But we need something more concrete to suggest good examples. Please give us a list of phrases that you would want to check and what you would expect as a result.

PG
stepthom
Calcite | Level 5

Hi PG Stats,

 

I think I can handle constructing the regular expressions; that's not a problem. My question is where to put them (which node, which field?). I couldn't find it.

Thanks!

