When SAS Contextual Analysis was first released, it had no way to read in a SAS Enterprise Content Categorization project. Importing these projects was a roadmap item planned for a future release. Starting with SAS Contextual Analysis 13.2, a SAS Enterprise Content Categorization project can be imported.
Step 1 of the Create New Project wizard includes an option to import a SAS Enterprise Content Categorization project, as seen below:
When selecting the SAS Enterprise Content Categorization project, make sure to choose the project file with an extension of .tk2. The project folder needs to be accessible by the Workspace Server that is being used for SAS Contextual Analysis.
Here are some important notes from the documentation regarding the importing of a project:
Reusing existing SAS Enterprise Content Categorization projects is a good way to move the well-established rules in your text projects to newer technology. Combining those rules with traditional SAS Text Miner features, such as different input sources, synonym and stop lists, and topic generation, allows for a more integrated way to analyze the document corpus. One of the biggest advantages of importing a project is that score code can be generated in DS2 format instead of traditional DATA step code.
In this example project, we have extracted 100 web pages that discuss disasters. The imported SAS Enterprise Content Categorization project is fairly simple, with only one category, which deals with earthquakes. Here is what the rule for earthquakes looks like in the project:
Now, in SAS Contextual Analysis, the above rule has been imported into the project. After the project runs, the output is split into four separate windows. The first one is Concepts. If pre-defined concepts are used, or if user-written concepts have been added, any matches for the rules are shown here:
Next, the project shows the terms found in the documents. In this output window, terms that are treated as synonyms are grouped under a parent term (for example, the terms run, running, ran, jog, and sprint can all be combined under the parent term run). A parent term is marked by the folder icon to its left. In addition, terms that are not important (or are not differentiators) for the analysis can be dropped by right-clicking the term in the Kept Terms tab and choosing Drop Term, as seen here:
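Outside the product, the parent-term grouping and the Drop Term action described above can be approximated in a few lines of Python. This is only a minimal sketch of the idea, not how SAS Contextual Analysis implements it: the `SYNONYMS` map, the `DROPPED` set, and the `roll_up_terms` helper are hypothetical names made up for illustration.

```python
from collections import Counter

# Hypothetical synonym list: each variant maps to its parent term.
SYNONYMS = {"running": "run", "ran": "run", "jog": "run", "sprint": "run"}

# Hypothetical drop list: terms excluded from the analysis.
DROPPED = {"the", "a", "to"}


def roll_up_terms(tokens):
    """Count kept terms after mapping each token to its parent term."""
    counts = Counter()
    for token in tokens:
        term = token.lower()
        parent = SYNONYMS.get(term, term)  # fall back to the term itself
        if parent in DROPPED:
            continue  # dropped terms do not contribute to the counts
        counts[parent] += 1
    return counts


tokens = "The dog ran to the park and went for a jog".split()
term_counts = roll_up_terms(tokens)
print(term_counts.most_common(3))
```

Here "ran" and "jog" are both counted under the parent term "run", while "the", "a", and "to" never reach the counts.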
Next, the project shows the output for the created topics. The most descriptive terms for each generated topic indicate what topics were found:
Without diving deep into the individual topics, we get a pretty good idea of what each topic is about from its descriptive terms (the terms are listed in order of significance from left to right, so the earlier a term is listed, the more important it is to the topic):
The final section of output contains the results from the category rules that come from SAS Enterprise Content Categorization. Notice that the earthquake rule is found in 44 documents:
If I look in the Edit Rules section, I can see the rule, which matches what was imported:
If I look in the Documents tab, I can see the documents where matches occurred for the rule:
Although category rules can be entered directly in SAS Contextual Analysis, a previously created SAS Enterprise Content Categorization project can be imported to bring in the simple and complex rules already built. This lets you get more information out of your data more quickly and then apply the findings to future documents.
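As a rough illustration of applying an existing category rule to new documents, the Python sketch below counts the documents matched by a simple term-based rule. Everything in it is an assumption made for illustration: the term list, the `matching_documents` helper, and the sample documents are hypothetical, and this is not the imported earthquake rule, SAS rule syntax, or the DS2 score code that SAS Contextual Analysis generates.

```python
import re

# Hypothetical stand-in for a category rule: match if any listed term appears.
EARTHQUAKE_TERMS = ("earthquake", "aftershock", "seismic")
PATTERN = re.compile(
    r"\b(" + "|".join(EARTHQUAKE_TERMS) + r")s?\b", re.IGNORECASE
)


def matching_documents(documents):
    """Return the ids of documents in which the rule matches at least once."""
    return [doc_id for doc_id, text in documents.items() if PATTERN.search(text)]


# Hypothetical sample documents keyed by document id.
documents = {
    1: "A strong earthquake struck the coast overnight.",
    2: "Flood waters rose after days of heavy rain.",
    3: "Several aftershocks were recorded the next morning.",
}

matched = matching_documents(documents)
print(len(matched), "of", len(documents), "documents match the rule")
```

Running the same function over a new batch of documents is the toy equivalent of scoring future documents with rules that were built once and reused.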