I am using PROC HPTMINE. It generates several output data sets, such as the SVD matrices, docpro, terms, parent, topics, etc.
I joined some of these tables to find the topic assigned to each document, using the term cutoff rate.
However, I did not get the same results as the Text Miner does in E-Miner.
Can anyone tell me how to assign a document to a particular text topic? There must be a formula using thresholds to do so.
Please help.
Thanks
Here is a quick summary:
The U factor is number-of-terms by number-of-topics. For each column (topic), calculate the mean and standard deviation of the absolute values of its entries. I believe the default cutoff is 1 standard deviation above the mean. Set every entry whose absolute value is below that cutoff to zero. Now re-form the document projections from your updated U. Then repeat the procedure on that result; this time you will be applying it to documents rather than terms.
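The term-truncation step can be sketched as follows. This is only an illustration in plain Python (not SAS, and not the code Enterprise Miner runs): the function names, the tiny 4-term by 2-topic `U` matrix, and the choice of the sample standard deviation are all assumptions made for the example.

```python
from statistics import mean, stdev


def column_cutoffs(matrix, n_std=1.0):
    """Per-column cutoff: mean + n_std * std of the absolute values.

    Assumption: the sample standard deviation is used; the real
    implementation may differ.
    """
    cutoffs = []
    for col in zip(*matrix):  # iterate over columns (topics)
        abs_col = [abs(v) for v in col]
        cutoffs.append(mean(abs_col) + n_std * stdev(abs_col))
    return cutoffs


def truncate(matrix, cutoffs):
    """Zero every entry whose absolute value is below its column's cutoff."""
    return [
        [v if abs(v) >= c else 0.0 for v, c in zip(row, cutoffs)]
        for row in matrix
    ]


# Hypothetical U matrix: rows are terms, columns are topics.
U = [
    [0.9, 0.1],
    [0.1, -0.8],
    [0.2, 0.1],
    [0.0, 0.2],
]

cutoffs = column_cutoffs(U)
U_trunc = truncate(U, cutoffs)

for row in U_trunc:
    print(row)
```

With these made-up numbers only the strongest term survives in each topic column; every other entry becomes zero.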
Russ
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
Thank you. I will try this and let you know if it works.
Hi again, thanks for your last response. It definitely helped me make progress. Although this is a good step towards finding the topics assigned to documents, I still cannot match the topics assigned by the Text Miner in E-Miner.
Instead of using the U matrix for the calculations you mentioned above, I used the DOCPRO output from HPTMINE, since this U matrix is the projection of terms onto documents. Do you think that was OK?
My second question: the TOPICS data set has term cutoff rates in the list. Can I use those rates in conjunction with the V matrix, by checking them against the values in the V matrix? Or do those cutoff rates need to be compared to some other values?
Thanks for your help.
You have to truncate the U matrix using the technique I described and then re-form the docpro data set. PROC HPTMINE does not do this. The process has quite a few steps and may be a challenge to re-implement. Have you considered just saving out the SAS code from your flow? Depending on what you're trying to accomplish, this code will allow you to submit the whole flow programmatically.
You would apply the term cutoffs to U, not V. U is number of terms by number of topics. Then you re-form docpro and apply the document cutoffs to docpro.
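The order of operations described above (term cutoffs on U, re-form docpro, document cutoffs on docpro) might look like this in a minimal plain-Python sketch. Everything here is hypothetical: the weighted document-by-term matrix, the cutoff values, and the function names are invented for illustration, and the projection is a bare matrix product with no normalization, which may not match what Enterprise Miner does internally.

```python
def apply_cutoffs(matrix, cutoffs):
    """Zero entries whose absolute value is below the column's cutoff."""
    return [
        [v if abs(v) >= c else 0.0 for v, c in zip(row, cutoffs)]
        for row in matrix
    ]


def project(doc_term, u):
    """(docs x terms) times (terms x topics) -> (docs x topics)."""
    return [
        [sum(w * x for w, x in zip(doc_row, topic_col)) for topic_col in zip(*u)]
        for doc_row in doc_term
    ]


def assign_topics(docpro, doc_cutoffs):
    """For each document, list the topic indices that pass the cutoff."""
    return [
        [k for k, (v, c) in enumerate(zip(row, doc_cutoffs)) if abs(v) >= c]
        for row in docpro
    ]


# Hypothetical inputs: U is terms x topics; doc_term is docs x terms (weighted).
U = [[0.9, 0.1], [0.1, -0.8], [0.2, 0.1], [0.0, 0.2]]
doc_term = [
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.5, 0.5, 1.0, 0.0],
]
term_cutoffs = [0.5, 0.5]  # assumed values, one per topic
doc_cutoffs = [0.6, 0.6]   # assumed values, one per topic

U_trunc = apply_cutoffs(U, term_cutoffs)   # step 1: term cutoffs on U
docpro = project(doc_term, U_trunc)        # step 2: re-form docpro
assigned = assign_topics(docpro, doc_cutoffs)  # step 3: document cutoffs

print(assigned)
```

In this toy data the first document lands in topic 0, the second in topic 1, and the third in neither, because its projections fall below both document cutoffs.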
Hi,
Last time you mentioned the mprint option, but that did not give me much idea about the code used by E-Miner; many macros were called. Are you talking about using the SAS Code node in E-Miner, which needs to be connected to the Text Topic node?
I have not tried that before. Can you please tell me how, or where to find instructions on, saving the flow code?
Thanks
I am talking about built-in macros that are called to do the parts of the computation that the procedure does not do. If you right-click on a node in your flow and choose "Export path to sas code", you can save the code that is run when your flow runs. If you look at that code, you will see the names of these macros. Also, if you add
options mprint;
when you run the path, you will see a printout of many of the macros executing. The actual source of these macros is not visible otherwise.
Russ