Text mining and content categorization

Remove synonyms in TextCluster node on Enterprise Miner.

Frequent Learner
Posts: 1



Hi, I'm using Enterprise Miner to classify a fairly large number of articles by their keywords.

My goal is to cluster them based on the similarity of their keywords.

In the editor of the TextFilter node I merged some keywords that have almost the same meaning (e.g. I merged text_analysis and texture_analysis) and saved the changes.

I expected that from then on SAS would treat the two words as one, but when I run the TextCluster node, both +text_analysis and texture_analysis appear in the clusters' descriptive terms.

How can I exclude texture_analysis from the list, since it is already contained in +text_analysis?

 

Thank you in advance

SAS Employee
Posts: 28

Re: Remove synonyms in TextCluster node on Enterprise Miner.

Only kept terms are chosen as descriptive terms, so it is more likely that the mapping isn't happening the way you intended. Try setting the synonyms in the parse node and rerunning. Then double-check in the filter viewer that the terms are mapped as expected, and rerun the clustering after that.

 

If part-of-speech tagging is on, another possible explanation is that "texture_analysis" is being tagged in several ways, and the descriptive terms show a version that wasn't mapped to "text_analysis". The descriptive terms entry in the data set does not show the part-of-speech tag, so the same term (without its tag) can appear more than once in the descriptive terms report because it occurred multiple times with different part-of-speech tags.
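
To make that second explanation concrete, here is a toy Python sketch (not SAS code, and not how Text Miner is actually implemented). It assumes terms are keyed by the pair (term, part-of-speech tag); the tag names, the synonym table, and the helper map_to_parent are all made up for illustration.

```python
# Toy illustration only: each term is identified by (text, part-of-speech tag).
# The tags and the synonym table below are hypothetical, not Text Miner output.
synonyms = {
    ("texture_analysis", "Noun"): ("text_analysis", "Noun"),
}

parsed_terms = [
    ("text_analysis", "Noun"),
    ("texture_analysis", "Noun"),
    ("texture_analysis", "Noun Group"),  # same text, different tag: no synonym entry
]


def map_to_parent(term):
    """Return the parent term if a synonym is defined, else the term itself."""
    return synonyms.get(term, term)


# Apply the synonym mapping to every (text, tag) pair.
mapped = {map_to_parent(term) for term in parsed_terms}

# A report that drops the tag shows only the text of each surviving term.
displayed = sorted({text for text, _tag in mapped})
print(displayed)
```

In this sketch the variant tagged "Noun Group" has no synonym entry, so it survives the mapping and still shows up next to text_analysis once the tag is hidden. If that is what is happening in your project, each tagged variant would need its own mapping.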

 

 

 

Russ
