TF-IDF in SAS Text Miner

New Contributor
Posts: 2
TF-IDF in SAS Text Miner

I have been looking under the hood at the SAS Text Mining nodes expecting to find a TF-IDF (Term Frequency - Inverse Document Frequency) matrix, but with no luck. Does anyone know if SAS creates one and stores it somewhere?

Any pointers would be great.



All Replies
SAS Employee
Posts: 12

Re: TF-IDF in SAS Text Miner

Absolutely, it does. SAS Text Miner lets you set the TF part of the TF-IDF weighting to log, binary, or none, and the IDF part to entropy, inverse document frequency, mutual information, or none. The transaction table that comes out of the Text Filter node includes the TF-IDF weightings for each parent term.
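To make the two parts of the weighting concrete, here is a minimal Python sketch of how a local (TF) weight and a global (IDF) weight combine into one number per term-document pair. It uses common textbook formulas, log2(count + 1) for the log option and log2(n_docs / doc_freq) + 1 for inverse document frequency; these are assumptions and may not match SAS Text Miner's exact definitions. The function name `tfidf_weights` and the sample documents are hypothetical, and the entropy and mutual information options are left out.

```python
import math
from collections import Counter


def tfidf_weights(docs, tf="log", idf="idf"):
    """Return {(doc_index, term): weight} for a list of tokenized documents.

    tf:  "log" -> log2(count + 1), "binary" -> 1, "none" -> raw count
    idf: "idf" -> log2(n_docs / doc_freq) + 1, "none" -> 1
    These formulas are textbook assumptions, not SAS's documented ones.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for c in counts for term in c)

    def local_weight(count):
        if tf == "log":
            return math.log2(count + 1)
        if tf == "binary":
            return 1.0
        return float(count)

    def global_weight(term):
        if idf == "idf":
            return math.log2(n_docs / doc_freq[term]) + 1
        return 1.0

    return {
        (i, term): local_weight(count) * global_weight(term)
        for i, c in enumerate(counts)
        for term, count in c.items()
    }


# Hypothetical two-document corpus, already tokenized.
docs = [["ball", "bat", "ball"], ["ball", "glove"]]
weights = tfidf_weights(docs)
```

Each key of `weights` corresponds to one row of a transaction-style table: a document, a term, and the weight for that pair.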

James A. Cox, Ph.D.

Text Mining Software Development Manager

SAS Institute, Inc.

The Power to Know

I wondered why the baseball kept getting bigger. Then it hit me.

New Contributor
Posts: 2

Re: TF-IDF in SAS Text Miner

Thanks, James, for the response. Any idea where I can find the actual matrix as a SAS data set? Is it stored under any particular node?

Solution
‎02-19-2015 01:53 PM
SAS Employee
Posts: 12

Re: TF-IDF in SAS Text Miner

Yep, it is stored in the emws folder for that workspace under the <nodename>_transaction data set. So if this is the first Text Filter node in the first diagram for the project, it would be emws1.textfilter_transaction.

But the easiest way is just to go to the exported data property for the Text Filter node, and explore the transaction table there.
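If you want a conventional document-by-term matrix rather than a transaction table, the table has to be pivoted. The Python sketch below assumes the table has been exported in long format, one row per term-document pair, as (term number, document number, weight) tuples. That layout, the function name `to_dense_matrix`, and the sample rows are all assumptions for illustration, not taken from the actual data set.

```python
def to_dense_matrix(rows):
    """Pivot (term_id, doc_id, weight) rows into a document-by-term matrix.

    Returns the sorted document ids, the sorted term ids, and a list of
    lists in which matrix[i][j] is the weight of term j in document i.
    Pairs missing from the rows are filled with 0.0.
    """
    doc_ids = sorted({doc for _, doc, _ in rows})
    term_ids = sorted({term for term, _, _ in rows})
    doc_pos = {d: i for i, d in enumerate(doc_ids)}
    term_pos = {t: j for j, t in enumerate(term_ids)}

    matrix = [[0.0] * len(term_ids) for _ in doc_ids]
    for term, doc, weight in rows:
        matrix[doc_pos[doc]][term_pos[term]] = weight
    return doc_ids, term_ids, matrix


# Hypothetical rows: (term number, document number, weight).
rows = [(1, 1, 1.585), (2, 1, 2.0), (1, 2, 1.0), (3, 2, 2.0)]
doc_ids, term_ids, matrix = to_dense_matrix(rows)
```

A dense matrix like this is only practical for small collections; with many documents and terms, most cells are zero and the long format is far more compact.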

New Contributor
Posts: 2

Re: TF-IDF in SAS Text Miner

Hello! Sorry for reviving this post, but I'm new to the community and I'm having some problems with this particular issue concerning the TF-IDF matrix.

The question is: how do I actually use the TF-IDF matrix as the input for the Text Cluster node? I mean, isn't that what is supposed to happen when you run the Text Filter node and generate the transaction table? Does the Text Cluster node do this by default?

My original post with this issue and pictures can be found here:

Thanks in advance!

☑ This topic is SOLVED.
