I have been looking under the hood at the SAS Text Mining nodes expecting to find a TF-IDF (Term Frequency - Inverse Document Frequency) matrix, but with no luck. Does anyone know if SAS creates one and stores it somewhere?
Any pointers would be great.
Absolutely, it does. SAS Text Miner lets you set the TF part of the TF-IDF weighting to log, binary, or none, and the IDF part to entropy, inverse document frequency, mutual information, or none. The transaction table that comes out of the Text Filter node includes the TF-IDF weightings for each parent term.
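For readers who want to see what a log/IDF weighting looks like in practice, here is a minimal sketch in plain Python. It assumes one common formulation, log2(count + 1) for the term frequency and log2(N / document frequency) + 1 for the IDF; this is an illustration only and has not been checked against SAS's exact implementation. The function name `tfidf_transactions` and the (term, document, weight) row layout are made up for the example.

```python
import math
from collections import Counter

def tfidf_transactions(docs):
    """Return (term, doc_id, weight) rows using a log TF and an IDF term weight.

    Assumed formulas (illustrative, not taken from SAS):
      tf  = log2(count + 1)
      idf = log2(n_docs / doc_freq) + 1
    """
    n_docs = len(docs)
    # Raw term counts per document, using naive whitespace tokenization.
    counts = [Counter(doc.lower().split()) for doc in docs]

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    rows = []
    for doc_id, c in enumerate(counts, start=1):
        for term, freq in sorted(c.items()):
            tf = math.log2(freq + 1)
            idf = math.log2(n_docs / doc_freq[term]) + 1
            rows.append((term, doc_id, tf * idf))
    return rows

docs = ["the cat sat", "the dog sat down", "the cat ran"]
rows = tfidf_transactions(docs)
for term, doc_id, weight in rows:
    print(f"{term:5s} doc={doc_id} weight={weight:.3f}")
```

A term that appears in every document (here "the") gets the minimum IDF of 1, while a term confined to one document (here "dog") gets the largest weight.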
James A. Cox, Ph.D.
Text Mining Software Development Manager
SAS Institute, Inc.
The Power to Know
I wondered why the baseball kept getting bigger. Then it hit me.
Thanks, James, for the response. Any idea where I can find the actual matrix as a SAS data set? Is it stored under any particular node?
Yep, it is stored in the emws folder for that workspace as the <nodename>_transaction data set. So if this is the first Text Filter node in the first diagram for the project, it would be emws1.textfilter_transaction.
But the easiest way is just to go to the exported data property for the Text Filter node, and explore the transaction table there.
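If you need an actual documents-by-terms matrix rather than a transaction table, the reshaping step can be sketched as below. This assumes the transaction table is in long format, with one row per term/document pair and a weight; the tuple layout (term_id, doc_id, weight) and the function name `to_matrix` are hypothetical and are not SAS's column names.

```python
def to_matrix(transactions):
    """Pivot (term_id, doc_id, weight) rows into a dense documents-by-terms matrix."""
    terms = sorted({t for t, _, _ in transactions})
    docs = sorted({d for _, d, _ in transactions})
    col = {t: j for j, t in enumerate(terms)}  # term id -> column index
    row = {d: i for i, d in enumerate(docs)}   # document id -> row index

    # Term/document pairs missing from the transactions stay at 0.0.
    matrix = [[0.0] * len(terms) for _ in docs]
    for t, d, w in transactions:
        matrix[row[d]][col[t]] = w
    return docs, terms, matrix

# Hypothetical rows: (term_id, doc_id, weight)
transactions = [(1, 1, 0.5), (2, 1, 1.2), (1, 2, 0.5), (3, 2, 2.0)]
docs, terms, matrix = to_matrix(transactions)
for d, values in zip(docs, matrix):
    print(d, values)
```

For a real corpus the matrix is mostly zeros, which is presumably why the long transaction layout is used for storage in the first place.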
Hello! Sorry for reviving this post, but I'm new to the community and I'm having some problems with this particular issue concerning the TF-IDF matrix.
The question is: how do I actually use the TF-IDF matrix as the input for the Text Cluster node? Isn't that what is supposed to happen when you run the Text Filter node and generate the transaction table? Does the Text Cluster node do this by default?
My original post with this issue and pictures can be found here:
Thanks in advance!