RaviShanbhag
Calcite | Level 5

I have been looking under the hood at the SAS Text Mining nodes, expecting to find a TF-IDF (Term Frequency - Inverse Document Frequency) matrix, but with no luck. Does anyone know whether SAS creates one and stores it somewhere?

Any pointers would be great.


4 REPLIES
JamesCoxPhD
SAS Employee

Absolutely, it does. SAS TM allows you to set the TF part of the TF-IDF weighting to log, binary, or none, and the IDF part to entropy, inverse document frequency, mutual information, or none. The transaction table that comes out of the Text Filter node includes the TF-IDF weightings for each parent term.
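To make the weighting options concrete, here is a minimal Python sketch of how a frequency weight and a term weight combine into one cell value. It assumes common textbook forms of each weight (log TF as log2(count + 1), IDF as log2(n / d) + 1, entropy as 1 plus the normalized sum of p * log2(p)); the exact formulas SAS uses may differ, and the function names are hypothetical. Mutual information is left out because it needs a target variable.

```python
import math

def tf_weight(count, mode="log"):
    """Frequency (TF) weight for one term-document count."""
    if mode == "log":
        return math.log2(count + 1)      # assumed textbook form
    if mode == "binary":
        return 1.0 if count > 0 else 0.0
    if mode == "none":
        return float(count)
    raise ValueError(f"unknown TF mode: {mode}")

def term_weight(doc_counts, mode="idf"):
    """Term (IDF-style) weight from one term's counts across all n documents.

    Assumes the term occurs at least once and n >= 2.
    """
    n = len(doc_counts)
    total = sum(doc_counts)
    if total == 0:
        raise ValueError("term does not occur in any document")
    if mode == "none":
        return 1.0
    if mode == "idf":
        docs_with_term = sum(1 for c in doc_counts if c > 0)
        return math.log2(n / docs_with_term) + 1
    if mode == "entropy":
        h = sum((c / total) * math.log2(c / total) for c in doc_counts if c > 0)
        return 1 + h / math.log2(n)
    raise ValueError(f"unknown term-weight mode: {mode}")

def weighted_matrix(counts, tf_mode="log", term_mode="idf"):
    """counts[i][j] = raw count of term i in document j; returns weighted cells."""
    result = []
    for row in counts:
        w = term_weight(row, term_mode)
        result.append([tf_weight(c, tf_mode) * w for c in row])
    return result

# Term 0 is rare (one document), term 1 appears evenly in every document.
counts = [[2, 0, 0, 0],
          [1, 1, 1, 1]]
for row in weighted_matrix(counts, "log", "idf"):
    print([round(v, 3) for v in row])
```

Under these assumed formulas the rare term gets the larger weight, and with entropy weighting a term spread evenly over every document is weighted down to zero.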

James A. Cox, Ph.D.

Text Mining Software Development Manager

SAS Institute, Inc.

The Power to Know

I wondered why the baseball kept getting bigger. Then it hit me.

RaviShanbhag
Calcite | Level 5

Thanks for the response, James. Any idea where I can find the actual matrix as a SAS data set? Is it stored under any particular node?

JamesCoxPhD
SAS Employee

Yep, it is stored in the emws folder for that workspace, in the <nodename>_transaction data set. So if this is the first Text Filter node in the first diagram of the project, it would be emws1.textfilter_transaction.

But the easiest way is just to go to the exported data property for the Text Filter node, and explore the transaction table there.
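A transaction-style table usually holds one row per term-document pair rather than a rectangular matrix. If you export it and want the full matrix, a pivot along these lines would work. This is only a sketch: it assumes each row carries a term identifier, a document identifier, and a weight, and the tuple layout and function name are hypothetical, not the actual column names in the SAS data set.

```python
def transactions_to_matrix(rows):
    """Pivot (term_id, doc_id, weight) rows into a dense term-by-document matrix.

    Returns (terms, docs, matrix), where matrix[i][j] is the weight of
    terms[i] in docs[j] and missing pairs are filled with 0.0.
    """
    rows = list(rows)                          # allow any iterable
    terms = sorted({t for t, _, _ in rows})
    docs = sorted({d for _, d, _ in rows})
    term_pos = {t: i for i, t in enumerate(terms)}
    doc_pos = {d: j for j, d in enumerate(docs)}
    matrix = [[0.0] * len(docs) for _ in terms]
    for t, d, w in rows:
        matrix[term_pos[t]][doc_pos[d]] = w
    return terms, docs, matrix

# Hypothetical rows: (term number, document number, weight)
rows = [(1, 1, 0.5), (2, 1, 1.0), (2, 3, 2.0)]
terms, docs, matrix = transactions_to_matrix(rows)
print(terms, docs, matrix)
```

The long format is kept by default because most cells are zero, so the dense matrix only makes sense for small collections.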

hugo_viga
Calcite | Level 5

Hello! Sorry for reviving this post, but I'm new to the community and I'm having some problems with this particular issue concerning the TF-IDF matrix.

The question is: how do I actually use the TF-IDF matrix as the input for the Text Cluster node? Isn't that what is supposed to happen when you run the Text Filter node and generate the transaction table? Does the Text Cluster node do this by default?

My original post with this issue and pictures can be found here:

Thanks in advance!
