eserates
Fluorite | Level 6

I am using PROC HPTMINE. It generates several output tables, such as the SVD matrices, DOCPRO, TERMS, PARENT, TOPICS, etc.

I joined some of these tables, using the term cutoff rate, to find the topic assigned to each document.

However, I did not get the same results as Text Miner does in E-Miner.

Can anyone tell me how to assign a document to a particular text topic? There must be a formula that uses thresholds to do so.

Please help.

Thanks

6 REPLIES
RussAlbright
SAS Employee

Here is a quick summary:

 


The U factor is number-of-terms by number-of-topics. For each column (topic), calculate the mean and standard deviation of the absolute values of the entries. I believe the default cutoff is 1 standard deviation above the mean. Set every entry whose absolute value is below that cutoff to zero. Now re-form the document projections from your updated U. Then repeat the procedure on that result; this time you will be applying it to documents.

 

Russ
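The truncation step described above can be sketched as follows. This is a minimal illustration in plain Python rather than SAS, so the arithmetic is easy to follow; it is not the code Enterprise Miner runs. The function name `truncate_columns`, the toy matrix `U`, and the use of the sample standard deviation are all assumptions, and the 1-standard-deviation default is taken from the hedged statement in the reply.

```python
from statistics import mean, stdev

def truncate_columns(matrix, n_std=1.0):
    """Zero every entry whose absolute value is below the column cutoff.

    The cutoff for each column is mean + n_std * std of the column's
    absolute values. Sample std (n - 1) is an assumption here.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    cutoffs = []
    for j in range(n_cols):
        abs_col = [abs(matrix[i][j]) for i in range(n_rows)]
        cutoffs.append(mean(abs_col) + n_std * stdev(abs_col))
    truncated = [
        [v if abs(v) >= cutoffs[j] else 0.0 for j, v in enumerate(row)]
        for row in matrix
    ]
    return truncated, cutoffs

# Hypothetical toy U: 5 terms (rows) x 2 topics (columns)
U = [
    [0.90, 0.05],
    [0.10, -0.80],
    [-0.05, 0.10],
    [0.08, 0.02],
    [0.02, 0.07],
]
U_trunc, term_cutoffs = truncate_columns(U)
for row in U_trunc:
    print(row)
```

In this toy example only the single dominant term in each topic survives; every other entry is set to zero. The same function could be applied to a documents-by-topics matrix for the second pass.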


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

eserates
Fluorite | Level 6

Thank you. I will try this and let you know if it works.

eserates
Fluorite | Level 6

Hi again, and thanks for your last response. It definitely helped me make progress. Although this is a good step towards finding the topics assigned to documents, I still cannot match the topics assigned by Text Miner in E-Miner.

 

Instead of using the U matrix for the calculations you mentioned above, I used the DOCPRO output from HPTMINE, since I understood DOCPRO to be the projection of the documents through the U matrix. Do you think that was OK?

 

Second question: the TOPICS data set lists term cutoff rates. Can I use those rates with the V matrix, that is, compare them against the values in V? Or do those cutoff rates need to be compared to some other values?

Thanks for your help.

RussAlbright
SAS Employee

You have to truncate the U matrix using the technique I described, then re-form the DOCPRO data set. PROC HPTMINE does not do this. The process has quite a few steps and may be a challenge to re-implement. Have you considered just saving out the SAS code from your flow? Depending on what you're trying to accomplish, this code will allow you to submit the whole flow programmatically.

 

You would apply the term cutoffs to U, not V. U is number of terms by number of topics. Then you re-form DOCPRO and apply the document cutoffs to DOCPRO.
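A rough sketch of the last two steps, again in plain Python for illustration only. It assumes a weighted term-by-document matrix `A` (terms in rows), a U matrix that has already been truncated with the term cutoffs, and one document cutoff per topic. The names `reform_docpro` and `assign_topics`, the toy values, and the comparison on absolute values are assumptions; any normalization that Enterprise Miner applies to the projections is not shown and would need to be confirmed from the exported flow code.

```python
def reform_docpro(A, U_trunc):
    """Re-form document projections as A^T * U_trunc (documents x topics).

    A is terms x documents, U_trunc is terms x topics.
    No normalization is applied (an assumption of this sketch).
    """
    n_terms, n_docs = len(A), len(A[0])
    n_topics = len(U_trunc[0])
    return [
        [sum(A[t][d] * U_trunc[t][k] for t in range(n_terms))
         for k in range(n_topics)]
        for d in range(n_docs)
    ]

def assign_topics(docpro, doc_cutoffs):
    """For each document, list the topics whose |projection| meets the cutoff."""
    return [
        [k for k, v in enumerate(row) if abs(v) >= doc_cutoffs[k]]
        for row in docpro
    ]

# Hypothetical toy data: 3 terms x 3 documents
A = [
    [2.0, 0.0, 1.0],
    [0.0, 3.0, 1.0],
    [1.0, 1.0, 0.0],
]
# Hypothetical U after term cutoffs were applied: 3 terms x 2 topics
U_trunc = [
    [0.9, 0.0],
    [0.0, -0.8],
    [0.0, 0.0],
]
doc_cutoffs = [1.0, 1.0]  # hypothetical per-topic document cutoffs

docpro = reform_docpro(A, U_trunc)
topics_per_doc = assign_topics(docpro, doc_cutoffs)
print(topics_per_doc)
```

Here the first document lands in topic 0, the second in topic 1, and the third in neither, because both of its projections fall below the cutoffs. A document can belong to several topics or to none under this rule.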



eserates
Fluorite | Level 6

Hi,

Last time you mentioned the mprint option, but that did not give me much idea about the code used by Miner; there were many macros called. Are you talking about using the SAS Code node in E-Miner, which needs to be connected to the Text Topic node?

I have not tried that before. Can you please tell me how to save the flow code, or where to find instructions for doing so?

Thanks

RussAlbright
SAS Employee

I am talking about built-in macros that are called to do the parts of the computation that the procedure does not do. If you right-click on a node in your flow and choose "Export path to sas code", you can save the code that is run when your flow runs. If you look at that code, you will see the names of these macros. Also, if you add

   options mprint;

when you run the path, you will see a printout of many of the macros executing. The actual source of these macros is not otherwise visible.

Russ 

 




Discussion stats
  • 6 replies
  • 1505 views
  • 2 likes
  • 2 in conversation