
How to get TFIDF table from Text Parser

New Contributor
Posts: 4


I am currently using SAS EM 12.1. I would like to get the term frequency-inverse document frequency (TF-IDF) table from the Text Parser.

I can see that the exported transaction data set is that table. It has three columns: a term index column, a document index column, and the weight of that term in that document. However, each term is represented by an index, not the actual word.

Is there a lookup table that maps each term's index to the actual word?

SAS Employee
Posts: 12

Re: How to get TFIDF table from Text Parser


You are exactly right: the transaction table is the TF-IDF table. If you want to see it as term|role combinations, you can do something like the following in a program or in a SAS Code node (this assumes the first diagram and the first Text Filter node on that diagram):

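/* Text Filter node tables: first diagram (EMWS1), first Text Filter node */
%let filternode_name=emws1.textfilternode;

/* Name of the view to create */
%let viewname=<whatever data set you want to create>;

proc sql noprint;
   /* Label each row of the transaction (TF-IDF) table with its term|role */
   create view &viewname as
      select ktrim(term) || '|' || role as _item_, b.*
      from &filternode_name._term_strings as a,
           &filternode_name._out_parent as b
      where b._termnum_ = a.key;
quit;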
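To try the same join without an Enterprise Miner workspace, here is a minimal, self-contained Base SAS sketch on toy data. The tables WORK.TERM_STRINGS and WORK.TRANSACTION, the columns _DOCUMENT_ and _WEIGHT_, and all terms and weights are hypothetical stand-ins for the Text Filter output, not its actual names or contents; only KEY, TERM, ROLE and _TERMNUM_ mirror the code above.

```sas
/* Hypothetical lookup table: one row per term index (KEY) with its TERM and ROLE */
data work.term_strings;
   infile datalines dsd;
   length term $20 role $10;
   input key term $ role $;
   datalines;
1,model,Noun
2,score,Verb
3,svm,Abbr
;
run;

/* Hypothetical sparse TF-IDF table: one row per (term index, document) pair */
data work.transaction;
   input _termnum_ _document_ _weight_;
   datalines;
1 1 0.42
3 1 0.91
2 2 0.35
1 2 0.17
;
run;

/* Replace each term index with a readable term|role label */
proc sql;
   create table work.tfidf_labeled as
      select ktrim(a.term) || '|' || ktrim(a.role) as _item_ length=32,
             b._document_, b._weight_
      from work.term_strings as a, work.transaction as b
      where b._termnum_ = a.key
      order by _document_, _item_;
quit;

proc print data=work.tfidf_labeled noobs;
run;
```

PROC PRINT lists four rows, one per nonzero weight, with labels such as model|Noun in place of the term index.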

New Contributor
Posts: 4

Re: How to get TFIDF table from Text Parser

Great. I used PROC CONTENTS and found many more data sets.

But I have a follow-up question. I got good classification results with an SVM model that uses the TF-IDF matrix as its input variables.

Now I need to score a new data set that goes through the same parsing and filtering. However, I do not see a way to get the TF-IDF matrix for the score data set, which the SVM would then use, because the Text Filter node outputs only one transaction data set. Is this doable?

[Attachment: tfidf.jpg]

SAS Employee
Posts: 12

Re: How to get TFIDF table from Text Parser

The <nodename>_validout and <nodename>_testout tables contain the TF-IDF weightings for the validation and test sets, respectively.
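Once the score weights are in a sparse table, an SVM that expects one input column per term needs them pivoted into a wide document-by-term matrix with the same columns as the training data. A minimal sketch, assuming hypothetical sparse tables WORK.TRAIN_TFIDF and WORK.SCORE_TFIDF with columns _DOCUMENT_, _TERMNUM_ and _WEIGHT_ (stand-ins, not the actual names of the EM tables). The column list comes from the training terms, so absent terms get weight 0 and terms never seen in training are dropped.

```sas
/* Hypothetical sparse training and score tables: (_document_, _termnum_, _weight_) */
data work.train_tfidf;
   input _document_ _termnum_ _weight_;
   datalines;
1 1 0.42
1 3 0.91
2 2 0.35
;
run;

data work.score_tfidf;
   input _document_ _termnum_ _weight_;
   datalines;
10 1 0.50
10 4 0.80
11 3 0.25
;
run;

proc sql;
   /* Every (score document, training term) pair */
   create table work.grid as
      select d._document_, t._termnum_
      from (select distinct _document_ from work.score_tfidf) as d,
           (select distinct _termnum_ from work.train_tfidf) as t;

   /* Attach score weights: absent terms get 0, and terms unseen in
      training (term 4 here) are dropped because they are not in the grid */
   create table work.score_long as
      select g._document_, g._termnum_,
             coalesce(s._weight_, 0) as _weight_
      from work.grid as g
           left join work.score_tfidf as s
           on s._document_ = g._document_ and s._termnum_ = g._termnum_
      order by g._document_, g._termnum_;
quit;

/* Pivot to one row per document and one column per training term (TERM1, TERM2, ...) */
proc transpose data=work.score_long out=work.score_wide(drop=_name_) prefix=term;
   by _document_;
   id _termnum_;
   var _weight_;
run;
```

Applying the same grid-and-transpose steps to the training table (training documents crossed with training terms) gives matching TERM1, TERM2, ... columns for model fitting.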

New Contributor
Posts: 2

Re: How to get TFIDF table from Text Parser / Filter

None of the data sets seem to be the same in SAS EM 13.1. Any hints on where the links between the nodes are? I can see several possibilities, but I have never worked with SAS at this level.

Jacob

New Contributor
Posts: 2

Re: How to get TFIDF table from Text Parser / Filter

Actually, the answer was in the small picture attached to one of the previous messages. The TF-IDF matrix, in its sparse representation, can be found in the TRANSACTION data set returned by the Text Filter node, provided the weights have been set to TF-IDF.

Jacob
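For reference, the textbook TF-IDF weight of term t in document d is shown below, where tf_{t,d} is the number of times t occurs in d, N is the number of documents, and df_t is the number of documents containing t. This is the generic form only: the Text Filter node may combine separate frequency- and term-weighting settings, so its exact formula (log base, offsets) can differ.

```latex
\mathrm{tfidf}_{t,d} = \mathrm{tf}_{t,d} \cdot \log\frac{N}{\mathrm{df}_t}
```

For example, with the natural log, a term that occurs 3 times in a document and appears in 2 of 4 documents gets 3 ln 2 ≈ 2.08.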
