I have been looking under the hood at the SAS Text Mining nodes expecting to find a TF-IDF (term frequency-inverse document frequency) matrix, but with no luck. Does anyone know if SAS creates one and stores it somewhere?
Any pointers would be great.
Yep, it is stored in the emws folder for that workspace under the <nodename>_transaction data set. So if this is the first Text Filter node in the first diagram for the project, it would be emws1.textfilter_transaction.
But the easiest way is just to go to the exported data property for the Text Filter node, and explore the transaction table there.
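A transaction table is a long, sparse layout: one row per term/document pair with its weight, rather than a full documents-by-terms grid. If you want the "actual matrix", you can pivot the exported table yourself. Here is a minimal sketch in Python of that pivot, assuming the table has been exported as (term, document, weight) triples. The sample rows and the function name are hypothetical and are not the column names SAS uses.

```python
# Hypothetical rows from a transaction-style table: (term_id, doc_id, weight).
# These values are made up for illustration.
transactions = [
    (1, 1, 0.5),
    (2, 1, 1.2),
    (1, 2, 0.7),
    (3, 3, 2.0),
]


def to_document_term_matrix(rows):
    """Pivot (term, doc, weight) triples into a dense document-by-term matrix."""
    terms = sorted({t for t, _, _ in rows})
    docs = sorted({d for _, d, _ in rows})
    col = {t: j for j, t in enumerate(terms)}  # term id -> column index
    row = {d: i for i, d in enumerate(docs)}   # document id -> row index
    # Pairs that do not appear in the transaction table stay at 0.0.
    matrix = [[0.0] * len(terms) for _ in docs]
    for t, d, w in rows:
        matrix[row[d]][col[t]] = w
    return docs, terms, matrix


docs, terms, matrix = to_document_term_matrix(transactions)
for doc_id, weights in zip(docs, matrix):
    print(doc_id, weights)
```

For a real corpus the dense matrix is mostly zeros, which is presumably why the weights are kept in the long transaction layout in the first place.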
Absolutely, it does. SAS TM lets you set the TF part of the TF-IDF weighting to log, binary, or none, and the IDF part to entropy, inverse document frequency, mutual information, or none. The transaction table that comes out of the Text Filter node includes the TF-IDF weightings for each parent term.
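To make the idea of a weighted transaction table concrete, here is a small sketch in Python that combines a log TF weight with an inverse document frequency weight and emits one row per term/document pair. The formulas are the common textbook forms (log2(1 + count) and log2(N / df)) and are an assumption for illustration; they are not necessarily the exact formulas SAS Text Miner applies. The toy documents and the function name are hypothetical.

```python
import math
from collections import Counter

# Toy corpus, already tokenized (made up for illustration).
docs = {
    "d1": ["text", "mining", "text"],
    "d2": ["text", "cluster"],
    "d3": ["cluster", "node", "node", "node"],
}


def tfidf_transactions(docs):
    """Return (term, doc_id, weight) rows using log TF times IDF."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    rows = []
    for doc_id, tokens in docs.items():
        for term, count in sorted(Counter(tokens).items()):
            tf = math.log2(1 + count)           # "log" frequency weighting
            idf = math.log2(n_docs / df[term])  # textbook IDF (assumed form)
            rows.append((term, doc_id, tf * idf))
    return rows


rows = tfidf_transactions(docs)
for term, doc_id, weight in rows:
    print(f"{term:8s} {doc_id}  {weight:.3f}")
```

Swapping the `tf` line for a constant 1 gives the "binary" option, and using the raw count gives "none"; the IDF line can be replaced in the same way with another term weight.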
James A. Cox, Ph.D.
Text Mining Software Development Manager
SAS Institute, Inc.
The Power to Know
I wondered why the baseball kept getting bigger. Then it hit me.
Thanks, James, for the response. Any idea where I can find the actual matrix as a SAS data set? Is it stored under any particular node?
Hello! Sorry for reviving this post, but I'm new to the community and I'm having some problems with this particular issue concerning the TF-IDF matrix.
The question is: how do I actually use the TF-IDF matrix as the input for the Text Cluster node? I mean, isn't that what is supposed to happen when you run the Text Filter node and generate the transaction table? Does the Text Cluster node do this by default?
My original post with this issue and pictures can be found here:
Thanks in advance!