Hi,
I have a SAS Text Topic node with a list of user-defined topics, and its result set contains the documents assigned to each of those topics. After looking through the data, I found that I need a condition that uses the negation (~) operator to get more accurate results.

For example, I have a topic called TOPIC_SODA, and the term that identifies it is SODA, so the result set contains documents with the term SODA in them. Some of those documents contain SODA together with another term, SCOTCH, and I don't want those documents to appear under TOPIC_SODA. So I decided to use the condition SODA & ~SCOTCH in the term classification. Unfortunately, I was not able to use this type of condition in the TT user topic declaration.

Is there a node I can use to feed a condition-based user-defined topic to the Text Topic node? I need exactly the result of the condition above as a custom topic, and I want to use it in the Text Topic node to filter the documents. I've attached snippets that show my result set and my failed approach.
Since you have already created your topics, you could create one more called TOPIC_SCOTCH. Then connect a SAS Code node after your TT node and use a DATA step to create the new topic:
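data &em_export_train;
	set &em_import_data;
	/* Soda = 1: document is in TOPIC_SODA (textTopic_1) but not in TOPIC_SCOTCH (textTopic_2) */
	if textTopic_1 and not textTopic_2 then Soda = 1;
	else Soda = 0;  /* set 0 explicitly so non-matching documents are not left missing */
run;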
Now run the node and you will be able to filter your data by Soda.
Here the code assumes that TOPIC_SODA is a label for textTopic_1 and TOPIC_SCOTCH is a label for textTopic_2. To see the actual correspondence between labels and variable names, you can select the TT node, choose Exported Data, then Properties. Click on the Variables tab and select Label. This will show you the names of the topic variables that you should use in your Data Step code.
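To check the rule on something smaller than a real project, here is a minimal, self-contained sketch. The table work.tt_export, its doc_id column and the four sample rows are hypothetical stand-ins for the TT node's exported data; only the textTopic_1/textTopic_2 names and the TOPIC_SODA/TOPIC_SCOTCH labels come from the post above. PROC CONTENTS lists the name/label correspondence in code (an alternative to the Properties dialog), and the second DATA step applies the same SODA & ~SCOTCH condition.

```sas
/* Hypothetical stand-in for the data exported by the TT node: one row per */
/* document, with 0/1 topic flags labeled as in the thread.                 */
data work.tt_export;
	input doc_id textTopic_1 textTopic_2;
	label textTopic_1 = "TOPIC_SODA"
	      textTopic_2 = "TOPIC_SCOTCH";
	datalines;
1 1 0
2 1 1
3 0 1
4 0 0
;
run;

/* Step 1: list variable names next to their labels (VARNUM keeps the  */
/* original column order). In a SAS Code node you would point DATA= at */
/* &em_import_data instead of the toy table.                           */
proc contents data=work.tt_export varnum;
run;

/* Step 2: apply SODA & ~SCOTCH. Only a document flagged for TOPIC_SODA */
/* and not for TOPIC_SCOTCH gets Soda = 1.                              */
data work.tt_flagged;
	set work.tt_export;
	if textTopic_1 and not textTopic_2 then Soda = 1;
	else Soda = 0;
run;

/* Expected: Soda = 1 for doc_id 1 only; doc 2 (soda + scotch), */
/* doc 3 (scotch only) and doc 4 (neither) get Soda = 0.        */
proc print data=work.tt_flagged noobs;
run;
```

Row 3 (SCOTCH only) gets Soda = 0 because it was never in TOPIC_SODA; row 2 is the only TOPIC_SODA document the rule excludes.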
This tip may provide some perspective. (Regular expressions are another possibility.)
Hope this helps.
Ray
Okay. In this case all the documents that have the keyword SCOTCH will be removed. I need to filter out only the documents in which SCOTCH and SODA appear together.
Also, I'm not clear on how the SAS Code node and the TT node are connected. If the SAS Code node comes after the TT node, how can I see the documents that fall under each topic, the way I can in the TT node? Should I connect another TT node after the SAS Code node to view the documents for each topic, or what should I do in this case?
Once you have the SAS Code node running successfully, examine the data exported from that node. It will contain the document*topics matrix along with any new topics you created in the SAS Code node.
You can hook up additional nodes to the SAS Code node to plot the topics, save the exported data, build predictive models, and so on. But no more TT nodes should be needed.
Hope this helps
Ray
