I have been using the Text Parsing and Topic Discovery feature in SAS Viya text analytics, as seen below: I have set it to find the terms used in individual documents (rather than in the whole corpus).

As output it produces two tables: one that indicates which word each term corresponds to, and another that gives the frequency of each term in each document. As an example see below:

I notice that in the table listing the word each term corresponds to (to the right), some of the words have a parent_id, which appears to be the key (term number) of another term that the original term is derived from or is similar to. I know that you can choose to save the document-term frequency matrix (picture to the left) either with or without child terms.

If you save it without child terms, are the frequencies displayed in that matrix the sum of the frequencies of the parent terms plus those of their child terms?

And if you save the document-term frequency matrix with child terms, I see that the child terms are listed separately, each with its own frequency per document. In that case, is the frequency of each term only the frequency of that exact term, without the frequencies of its child terms added on top?
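To make the question concrete, here is a minimal Python sketch of the roll-up I have in mind. It is not SAS code and does not show what SAS actually does. The term numbers, document IDs, and counts are made-up toy data, and the names `parent_of`, `freq_with_children`, and `roll_up` are hypothetical. It also assumes each child points directly at a top-level parent (one level of nesting only).

```python
from collections import defaultdict

# Hypothetical toy data, not real SAS output.
# term number -> parent term number (None means the term has no parent)
parent_of = {1: None, 2: 1, 3: 1, 4: None}

# (document id, term number) -> frequency,
# shaped like a document-term matrix saved WITH child terms
freq_with_children = {
    ("d1", 1): 2,
    ("d1", 2): 1,
    ("d1", 3): 3,
    ("d1", 4): 5,
    ("d2", 2): 4,
}


def roll_up(freqs, parents):
    """Add each child term's count to its parent's count within a document.

    Assumes a single level of parent/child nesting.
    """
    rolled = defaultdict(int)
    for (doc, term), count in freqs.items():
        parent = parents.get(term)
        key = term if parent is None else parent
        rolled[(doc, key)] += count
    return dict(rolled)


rolled = roll_up(freq_with_children, parent_of)
print(rolled)
```

If the matrix saved without child terms matches a roll-up like this computed from the matrix saved with child terms, that would suggest the answer to both questions is yes.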