RaviShanbhag
Calcite | Level 5

I have been looking under the hood at the SAS Text Mining nodes expecting to find a TF-IDF (Term Frequency - Inverse Document Frequency) matrix, but with no luck. Does anyone know if SAS creates one and stores it somewhere?

Any pointers would be great.

1 ACCEPTED SOLUTION

JamesCoxPhD
SAS Employee

Yep, it is stored in the emws folder for that workspace as the <nodename>_transaction data set. So if this is the first Text Filter node in the first diagram for the project, it would be emws1.textfilter_transaction.

But the easiest way is just to go to the exported data property for the Text Filter node, and explore the transaction table there.
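
If you want the data as an actual term-by-document matrix rather than a long transaction listing, a pivot does the job. The following is only a rough sketch in Python, assuming the transaction table has already been exported from SAS and that each row is a (term id, document id, weight) triple. That layout, the function name `pivot_transactions`, and the sample values are assumptions for illustration, so check the real column names in your own table first.

```python
def pivot_transactions(rows):
    """Turn (term_id, doc_id, weight) triples into a dense term-by-document matrix.

    The triple layout is an assumption about the exported table,
    not something confirmed in this thread.
    """
    rows = list(rows)
    term_ids = sorted({t for t, _, _ in rows})
    doc_ids = sorted({d for _, d, _ in rows})
    term_pos = {t: i for i, t in enumerate(term_ids)}
    doc_pos = {d: j for j, d in enumerate(doc_ids)}

    # Start from all zeros: a term that does not occur in a document has no row.
    matrix = [[0.0] * len(doc_ids) for _ in term_ids]
    for t, d, w in rows:
        matrix[term_pos[t]][doc_pos[d]] = w
    return term_ids, doc_ids, matrix


# Made-up sample rows, for illustration only.
rows = [(1, 1, 0.5), (1, 2, 1.2), (3, 2, 2.0)]
term_ids, doc_ids, matrix = pivot_transactions(rows)
```

Here `matrix[i][j]` holds the weight of `term_ids[i]` in `doc_ids[j]`. For a real collection a sparse structure would be a better fit than nested lists, since most cells are zero.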


4 REPLIES
JamesCoxPhD
SAS Employee

Absolutely, it does. SAS TM allows you to set the TF part of the TF-IDF weighting to log, binary, or none, and the IDF part to entropy, inverse document frequency, mutual information, or none. The transaction table that comes out of the Text Filter node includes the TF-IDF weightings for each parent term.
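
To make the log TF plus inverse document frequency combination concrete, here is a small Python sketch. It uses a common textbook form, log2(1 + count) for the frequency weight and log2(N / df) + 1 for the term weight. Treat those exact formulas, the function name `tfidf_weights`, and the toy documents as assumptions for illustration. The formulas SAS Text Miner applies internally may differ in detail.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Assumed formulas (textbook style, not taken from SAS documentation):
      frequency weight = log2(1 + count of term in document)
      term weight      = log2(n_docs / docs containing term) + 1
    """
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    idf = {term: math.log2(n_docs / d) + 1.0 for term, d in df.items()}

    weighted = []
    for tokens in docs:
        counts = Counter(tokens)
        weighted.append({t: math.log2(1 + c) * idf[t] for t, c in counts.items()})
    return weighted


# Toy documents, for illustration only.
docs = [["ball", "bat", "ball"], ["ball", "glove"], ["bat", "bat", "bat"]]
weights = tfidf_weights(docs)
```

In this toy example "glove" occurs in only one of the three documents, so it ends up with a higher weight than "ball" in the same document, which is the behaviour the IDF part is meant to give.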

James A. Cox, Ph.D.

Text Mining Software Development Manager

SAS Institute, Inc.

The Power to Know

I wondered why the baseball kept getting bigger. Then it hit me.

RaviShanbhag
Calcite | Level 5

Thanks, James, for the response. Any idea where I can find the actual matrix as a SAS data set? Is it stored under any particular node?

hugo_viga
Calcite | Level 5

Hello! Sorry for reviving this post, but I'm new to the community and I'm having some problems with this particular issue concerning the TF-IDF matrix.

The question is: how do I actually use the TF-IDF matrix as the input for the Text Cluster node? I mean, isn't that what is supposed to happen when you run the Text Filter node and generate the transaction table? Does the Text Cluster node do this by default?

My original post with this issue and pictures can be found here:

Thanks in advance!
