hugo_viga

Hello everyone! I'm currently using SAS Enterprise Miner 12.1 and I'm not sure how to proceed.

I have a data set consisting of tweets, and I intend to create clusters from the information I collected. So far, I've cleaned the data and built a diagram like this:

diagrama.PNG

I also understand that the TF-IDF matrix can be found in the "Exported Data" option of the Text Filter node (I found out about it in two other discussion posts).

Looks like this:

matriz_tfidf.PNG

Is this it??

So the question is:

Assuming this is the matrix I need to feed to the clustering node as the feature vectors, will the Text Cluster node use this TF-IDF matrix by default when I simply run it, or do I have to change its input somehow? Do I also need to change the configuration of the node itself?


In the text filter node I set the Frequency weighting to LOG and the Term weight to IDF.
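
For reference, here is a minimal Python sketch of what a log frequency weighting combined with an IDF term weight computes. It assumes the forms log2(f + 1) for the frequency weight and log2(N / df) + 1 for the term weight, which is how these options are commonly described for SAS Text Miner; the node's exact implementation may differ. The function name `weighted_term_matrix`, the whitespace tokenizer, and the sample documents are illustrative only.

```python
import math
from collections import Counter


def weighted_term_matrix(docs):
    """Return (terms, rows) with rows[d][j] = log2(f + 1) * (log2(N / df) + 1).

    f  = count of term j in document d
    N  = number of documents
    df = number of documents containing term j
    These formulas are assumptions, not taken from the Text Filter node itself.
    """
    # Naive whitespace tokenization; real text parsing does far more than this.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    terms = sorted(df)
    idf = {t: math.log2(n_docs / df[t]) + 1.0 for t in terms}

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([math.log2(counts[t] + 1) * idf[t] for t in terms])
    return terms, rows


docs = ["sas text miner", "text cluster node", "text text filter"]
terms, matrix = weighted_term_matrix(docs)
for doc, row in zip(docs, matrix):
    print(doc, [round(x, 3) for x in row])
```

Each row is one document and each column one term, which is the shape of matrix a clustering step would consume.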

Thanks in advance!

JasonXin
SAS Employee

hugo_viga ,

 

First, thanks for using SAS. My name is Jason Xin, and I am an advanced analytics solution architect at SAS Institute.

 

In EM, the Text Parsing node does all the heavy-duty initial work, ending in a term frequency matrix. The Text Filter node is essentially where most of the human-machine interaction happens: subsetting, trimming terms, keeping or dropping terms, viewing terms, and so on. Although the content has been massaged along the way, and the exported data sets certainly look different, in essence it remains a frequency matrix (query matrix).

 

Clustering directly on the count matrix pays off only in rare cases. In most cases, which I suspect includes yours, you would use the SVD dimensions as the input to text clustering. I cannot find a machine that runs 12.1, but as I recall, SVD in 12.1 was configured inside the Text Cluster node, the same as in 14.1, which I am running now. So the answer to your question is simply to connect the Text Filter node to the Text Cluster node and configure SVD there.
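
To illustrate the idea outside Enterprise Miner, here is a rough Python sketch of "reduce the weighted matrix with SVD, then cluster the reduced coordinates". It is not what the Text Cluster node runs internally: the top singular directions are approximated by power iteration, k-means is a stand-in clustering step chosen for brevity, and the matrix values and all function names are made up for the example.

```python
import math


def top_right_singular_vectors(rows, k, iters=200):
    """Approximate the top-k right singular vectors of a documents x terms matrix.

    Uses power iteration with deflation on the Gram matrix A^T A.
    Assumes the matrix has rank at least k.
    """
    n_terms = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(n_terms)]
            for i in range(n_terms)]
    found = []
    for c in range(k):
        v = [1.0 + 0.01 * (i + c) for i in range(n_terms)]  # deterministic start
        for _ in range(iters):
            w = [sum(g * x for g, x in zip(gram_row, v)) for gram_row in gram]
            # Deflation: remove the components along directions already found.
            for d in found:
                dot = sum(a * b for a, b in zip(w, d))
                w = [a - dot * b for a, b in zip(w, d)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:
                break
            v = [a / norm for a in w]
        found.append(v)
    return found


def project(rows, directions):
    """Coordinates of each document on the chosen SVD dimensions."""
    return [[sum(a * b for a, b in zip(row, d)) for d in directions]
            for row in rows]


def kmeans(points, k, iters=20):
    """Plain k-means, initialized with the first k points (illustrative only)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(p, centroids[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical weighted documents x terms matrix: rows 0 and 2 share terms,
# as do rows 1 and 3.
rows = [[2.0, 2.0, 0.0, 0.0],
        [0.0, 0.0, 3.0, 3.0],
        [2.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 2.0, 3.0]]
directions = top_right_singular_vectors(rows, k=2)
coords = project(rows, directions)
labels = kmeans(coords, k=2)
print(labels)
```

The point of the projection is that clustering runs on a handful of SVD dimensions rather than on one column per term, which is usually far too sparse and wide for tweets.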

 

Hope this helps. Best Regards

Jason Xin

