hugo_viga
Calcite | Level 5

Hello everyone! I'm currently using SAS Enterprise Miner 12.1 and I'm unsure how to proceed.

I have a data set consisting of tweets, and I intend to create clusters from the information I collected. So far, I've cleaned the data and built a diagram like this:

diagrama.PNG

I also understand that the TF-IDF matrix can be found under the "Exported Data" option of the Text Filter node (I found out about this in two other discussion posts).

Looks like this:

matriz_tfidf.PNG

Is this it??

So the question is:

Assuming this is the matrix I need to feed into the clustering node as the feature vectors, will the Text Cluster node use the TF-IDF matrix by default when I simply run it, or do I have to change the input somehow? And do I need to change the node configuration itself?


In the Text Filter node, I set Frequency Weighting to Log and Term Weight to IDF.
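To make those two settings concrete, here is a minimal Python sketch of a log frequency weight combined with an IDF term weight. It assumes the definitions commonly given for these options, log2(f + 1) for the frequency weight and log2(N / df) + 1 for the term weight; that is an assumption on my part, not something confirmed against Enterprise Miner 12.1. The function name `weighted_doc_term_matrix` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def weighted_doc_term_matrix(docs):
    """Return (terms, rows): rows[d][t] = log2(count + 1) * (log2(N / df) + 1).

    Assumed formulas, for illustration only; not SAS's own implementation.
    """
    n_docs = len(docs)
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))
    # Document frequency: number of documents that contain each term.
    df = {t: sum(1 for c in counts if t in c) for t in terms}
    # IDF-style term weight: rarer terms get a larger weight.
    idf = {t: math.log2(n_docs / df[t]) + 1 for t in terms}
    # Log frequency weight damps the effect of repeated terms in one document.
    rows = [[math.log2(c[t] + 1) * idf[t] for t in terms] for c in counts]
    return terms, rows

# Hypothetical toy "tweets".
terms, rows = weighted_doc_term_matrix(
    ["sas text miner", "text cluster node", "sas sas node"]
)
```

Each row is one document and each column one term, which is the shape a clustering step would consume.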

Thanks in advance!

1 REPLY
JasonXin
SAS Employee

hugo_viga,

First, thanks for using SAS. My name is Jason Xin, and I am an advanced analytics solution architect at SAS Institute.

 

In EM, the Text Parsing node does all the heavy-duty initial work, which ends in a frequency matrix. The Text Filter node is essentially where most of the human-machine interaction happens: subsetting, trimming terms, keeping or dropping terms, viewing terms, and so on. Although the content has been massaged along the way, and the exported data sets certainly look different, in essence it remains a frequency matrix/query matrix.

 

Only in rare cases does one benefit from clustering directly on the count matrix. In most cases, which I suspect include yours, you would use SVD as the input to text clustering. I cannot find a machine that runs 12.1, but as I recall, SVD in 12.1 was configured inside the Text Cluster node, the same as in 14.1, which I am running now. So the answer to your question is simply to connect the Text Filter node to the Text Cluster node and configure SVD there.
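For readers who want to see what "SVD as input to clustering" means outside of EM, here is a rough Python sketch. It is not how the Text Cluster node is implemented. It approximates the top-k right singular vectors with plain power iteration plus deflation, then projects each document onto them. The function names, the toy matrix, and the choice of k = 2 are all hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def top_right_singular_vectors(a, k, iters=200, seed=0):
    """Approximate the top-k right singular vectors of a (documents x terms)
    by power iteration on A^T A, deflating against vectors already found."""
    rng = random.Random(seed)
    at = [list(col) for col in zip(*a)]  # terms x documents
    found = []
    for _component in range(k):
        v = [rng.random() - 0.5 for _term in at]
        for _step in range(iters):
            # Remove any component along previously found vectors.
            for u in found:
                d = dot(v, u)
                v = [x - d * y for x, y in zip(v, u)]
            av = [dot(row, v) for row in a]    # A v
            w = [dot(col, av) for col in at]   # A^T (A v)
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break  # nothing left to extract
            v = [x / norm for x in w]
        found.append(v)
    return found

def project(a, vectors):
    """Coordinates of each document in the reduced SVD space."""
    return [[dot(row, v) for v in vectors] for row in a]

# Hypothetical toy matrix: documents 0-1 share terms, as do documents 2-3.
doc_term = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
]
coords = project(doc_term, top_right_singular_vectors(doc_term, k=2))
```

In this toy case, documents that share vocabulary land on nearly the same coordinates, while the two groups stay well apart. A clustering algorithm would then run on `coords` rather than on the raw counts.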

 

Hope this helps. Best Regards

Jason Xin
