SAS Data Science

hugo_viga
Calcite | Level 5

Hello everyone! I'm currently using SAS Enterprise Miner 12.1 and having some trouble figuring out how to proceed.

I have a data set consisting of tweets, and I intend to create clusters from the information I collected. So far, I've cleaned the data and built a diagram like this:

[Screenshot: process flow diagram (diagrama.PNG)]

I also understand that the TF-IDF matrix can be found in the "exported data" option of the Text Filter node.

(I found out about it in two other discussion posts.)

It looks like this:

[Screenshot: exported TF-IDF matrix (matriz_tfidf.PNG)]

Is this it?

So the question is:

Assuming this is the matrix I need to feed to the clustering node as the feature vectors for the clustering algorithm: if I simply run the Text Cluster node, will it use the TF-IDF matrix by default, or do I have to change the input somehow? And do I have to change the node configuration itself?


In the Text Filter node, I set the Frequency weighting to LOG and the Term weight to IDF.
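For reference, the two settings are combined by multiplying them cell by cell. Below is a minimal Python sketch, not EM's own code, that assumes the formulas I understand the SAS Text Miner documentation gives for these options: log frequency weighting log2(count + 1), and IDF term weight log2(N / d) + 1, where N is the number of documents and d is the number of documents containing the term. The function names are hypothetical, and the exact constants should be checked against the documentation for your release.

```python
import math


def log_frequency(count: int) -> float:
    # "Log" frequency weighting (assumed form): log2(count + 1).
    # Dampens the effect of a term repeated many times in one document.
    return math.log2(count + 1)


def idf_weight(n_docs: int, doc_freq: int) -> float:
    # IDF term weight (assumed form): log2(N / d) + 1.
    # Terms that occur in fewer documents receive larger weights.
    return math.log2(n_docs / doc_freq) + 1


def weight_matrix(counts: list[list[int]]) -> list[list[float]]:
    """Apply log-frequency x IDF weighting to a terms-by-documents count matrix.

    counts[i][j] is the raw count of term i in document j.
    """
    n_docs = len(counts[0])
    weighted = []
    for row in counts:
        doc_freq = sum(1 for c in row if c > 0)
        if doc_freq == 0:
            # A term that occurs in no document carries no information.
            weighted.append([0.0] * n_docs)
            continue
        g = idf_weight(n_docs, doc_freq)
        weighted.append([g * log_frequency(c) for c in row])
    return weighted
```

For example, in `weight_matrix([[1, 0, 3], [1, 1, 1]])` the second term appears in every document, so its IDF weight is log2(3/3) + 1 = 1 and each of its cells is exactly 1, while the first term's count of 3 becomes (log2(1.5) + 1) × log2(4).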

Thanks in advance!

JasonXin
SAS Employee

hugo_viga,

 

First, thanks for using SAS. My name is Jason Xin, and I am an advanced analytics solution architect at SAS Institute.

 

In EM, the Text Parsing node does all the heavy initial work, ending in a term frequency matrix. The Text Filter node is essentially where most of the human-machine interaction happens: subsetting, trimming terms, keeping or dropping terms, viewing terms, and so on. Although the content has been massaged in various ways, and the exported data sets certainly look different, in essence it remains a frequency matrix/query matrix.
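To make that point concrete, here is a minimal Python sketch of the two stages. It is an illustration, not SAS code and not how EM implements these nodes: a simplified "parsing" step turns tokenized documents into a term-by-document count matrix, and a simplified "filtering" step keeps or drops terms. The function names `frequency_matrix` and `filter_terms`, the `min_docs` threshold and the `drop` list are hypothetical. Whatever terms are removed, the output is still a frequency matrix.

```python
from collections import Counter


def frequency_matrix(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Simplified parsing step: terms-by-documents count matrix from tokenized documents."""
    per_doc = [Counter(tokens) for tokens in docs]
    terms = sorted({t for c in per_doc for t in c})
    # Counter returns 0 for a term missing from a document.
    matrix = [[c[t] for c in per_doc] for t in terms]
    return terms, matrix


def filter_terms(terms: list[str], matrix: list[list[int]],
                 min_docs: int = 2, drop: frozenset = frozenset()):
    """Simplified filtering step: keep/drop whole rows; the counts themselves are untouched."""
    kept_terms, kept_rows = [], []
    for term, row in zip(terms, matrix):
        doc_freq = sum(1 for c in row if c > 0)
        if doc_freq >= min_docs and term not in drop:
            kept_terms.append(term)
            kept_rows.append(row)
    return kept_terms, kept_rows
```

For three tiny "tweets" `["sas", "rocks", "sas"]`, `["sas", "miner"]` and `["miner", "rocks", "tweet"]`, calling `filter_terms` with `min_docs=2` and `drop={"rocks"}` removes "tweet" (it occurs in only one document) and "rocks" (dropped by hand), leaving the count rows for "miner" and "sas" unchanged.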

 

In rare cases, one benefits from clustering directly on the count matrix. In most cases, which I suspect includes yours, you would use SVD (singular value decomposition) to create the inputs for text clustering. I cannot find a machine that runs 12.1, but as I recall, SVD was already inside the Text Cluster node in 12.1, the same as in 14.1, which I am running now. So the answer to your question is simply to connect the Text Filter (TF) node to the Text Cluster (TC) node and configure SVD there.
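The SVD route can be sketched in Python/NumPy as follows: take the weighted terms-by-documents matrix, keep the top k singular dimensions as document coordinates, and cluster those coordinates. This is an illustration under assumptions, not the Text Cluster node's implementation. As far as I know, the node offers expectation-maximization and hierarchical clustering, so the plain k-means here is a swapped-in stand-in, and scaling each document vector to unit length is my own choice. The function names are hypothetical; k plays the role of the number of SVD dimensions kept.

```python
import numpy as np


def svd_document_vectors(weighted: np.ndarray, k: int) -> np.ndarray:
    """Project documents onto the top-k singular dimensions.

    weighted: terms x documents matrix (e.g. log-frequency x IDF weights).
    Returns a documents x k array, each row scaled to unit length.
    """
    # Thin SVD: weighted = U @ diag(s) @ Vt
    _, s, vt = np.linalg.svd(weighted, full_matrices=False)
    docs = vt[:k].T * s[:k]  # document coordinates in the k-dim space
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    return docs / np.where(norms == 0, 1, norms)


def kmeans(x: np.ndarray, n_clusters: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Plain k-means, used here only as a stand-in clustering algorithm."""
    rng = np.random.default_rng(seed)
    # Start from randomly chosen documents (fancy indexing returns a copy).
    centers = x[rng.choice(len(x), size=n_clusters, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each document to its nearest center.
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return labels
```

For instance, for a weighted matrix whose first two terms occur only in documents 0 to 2 and whose last two terms occur only in documents 3 to 5, `kmeans(svd_document_vectors(weighted, k=2), n_clusters=2)` puts the first three documents in one cluster and the last three in the other.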

 

Hope this helps. Best Regards

Jason Xin
