William29

I have been using the Text Parsing and Topic Discovery task in SAS Viya text analytics, as shown below:

[Image: William29_0-1695697850631.png]

I have set it to find terms within individual documents (rather than across the whole corpus). It produces two output tables: one that shows which term each term number corresponds to, and another that gives the frequency of each term in each document. For example:

[Image: William29_3-1695693457973.png]

In the table that lists the word each term corresponds to (on the right), I notice that some terms have a parent_id, which appears to be the key (term number) of another term that the original term is derived from or is similar to.

I know that you can choose to save the document-term frequency matrix (on the left) either with or without child terms. If you save it without child terms, is each frequency in that matrix the sum of the original term's frequency plus the frequencies of its child terms?

And if you save the document-term frequency matrix with child terms, I see that the child terms are listed separately, each with its own frequency. In that case, is each term's frequency the count of that exact term only, without the frequencies of its child terms added on top?

1 REPLY
PeterChristie
SAS Employee

Hello @William29. Let me see if I understand your question and can provide a helpful response.

If you click the Code tab of the generated SAS Studio task, you will see that the TEXTMINE procedure is used to parse the documents.

As a test, I ran the Text Parsing and Topic Discovery task with additional output tables. Here is the generated code from that test, for reference:

proc textmine data=_tmpcas_._preProcessedData_;
	var Text;
	doc_id __uniqueid__;
	parse stop=_tmpcas_._stoplist_ outparent=CASUSER.table1 
		outterms=CASUSER.table3 outchild=CASUSER.table2;
	svd k=25 numlabels=5 outtopics=CASUSER.tab4Topics svds=_tmpcas_._svds_;
run;

 

 

The SAS Viya Platform Programming Documentation contains useful information that may help if any additional questions come up. See SAS Help Center: PARSE Statement.

The OUTCHILD= data table saves only the kept, representative terms. The child frequencies are not attributed to their corresponding parent (as they are in the OUTPARENT= data table).

 

The OUTPARENT= data table contains only the kept, representative terms, and the child frequencies are attributed to the corresponding parent.
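
To make the difference between the two tables concrete, here is a small Python sketch of the arithmetic. It is not SAS code or SAS output: the term numbers, words, and counts are invented for illustration, and the `parent_of` mapping is a hypothetical stand-in for the parent_id column in the terms table, assuming every term without a parent maps to itself. It sums child-level counts into their parents, which is the relationship between a table where child frequencies are kept separate and one where they are attributed to the parent.

```python
from collections import defaultdict

# Hypothetical terms table: term number -> parent term number.
# Assumption: a term with no parent maps to itself.
# 1 = "run", 2 = "runs", 3 = "running", 4 = "shoe" (invented examples)
parent_of = {1: 1, 2: 1, 3: 1, 4: 4}

# Hypothetical child-level counts: (document, term number) -> count.
# Each term carries only its own frequency; nothing is rolled up yet.
child_counts = {
    ("doc1", 1): 2,
    ("doc1", 2): 1,
    ("doc1", 4): 3,
    ("doc2", 3): 4,
}


def roll_up_to_parents(counts, parent_of):
    """Attribute every term's count to its parent term, per document."""
    rolled = defaultdict(int)
    for (doc, term), n in counts.items():
        rolled[(doc, parent_of[term])] += n
    return dict(rolled)


# Parent-level counts: child frequencies are added to their parent.
parent_counts = roll_up_to_parents(child_counts, parent_of)
print(parent_counts)
```

In this toy data, term 1 in doc1 ends up with 2 + 1 = 3 after the roll-up, while the child-level table keeps the 2 and the 1 as separate rows. The total count per document is the same in both views.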

 

Hope this helps!
