Why is the range of Term Density cut-off in SAS Contextual Analysis from 0.5 to 6?

Hi,

 

Under Topic Properties, during topic modelling in SAS Contextual Analysis (version 14.2), there is an option to adjust the term density.

[Screenshot: Topic Properties.png]

Below is a quote from the SAS Contextual Analysis user guide about this.

 

"Edit topic properties

You can edit the properties that affect all topics. Term density refers to how topics are populated with terms; it is defined by a number between 0.5 and 6 (the default value is 2). When term density is closer to 0.5, topics are more densely populated by terms. When term density is closer to 6, topics are less densely populated by terms. This value affects the number of documents that belong to a topic (for example, having fewer terms in a topic captures fewer documents). Values that you enter are rounded to the nearest integer or half-integer."

My question is, how is the term density calculated?

Term density usually refers to the number of times a term appears in a document as a proportion of the number of words in that document, which would give a value between 0 and 1.

So why do the options range from 0.5 to 6?
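
For reference, the conventional definition described in this question (occurrences of a term divided by the number of words in the document) can be sketched in Python as follows. The function name `term_proportion` and the whitespace tokenization are illustrative assumptions, not anything taken from SAS Contextual Analysis.

```python
def term_proportion(term, document):
    """Share of a document's words that equal `term` (hypothetical helper)."""
    # Naive whitespace tokenization; a real text miner would do more.
    tokens = document.lower().split()
    if not tokens:
        return 0.0
    return tokens.count(term.lower()) / len(tokens)


# Example: "topic" appears 2 times among 6 words.
print(term_proportion("topic", "The topic model groups each topic"))
```

A ratio like this can never leave the interval from 0 to 1, which is why a setting that runs from 0.5 to 6 looks inconsistent with this definition.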

 

Thank you. 

 


Accepted Solutions
03-14-2018 09:58 PM

Re: Why is the range of Term Density cut-off in SAS Contextual Analysis from 0.5 to 6?

I posted this question to SAS Tech Support and received the following reply. I am posting it here for anyone who is interested. Thank you.

 

The term density actually relates to the number of standard deviations above the mean at which the term cutoff is set for a topic. So generally, with the smallest setting (0.5), you might get 40% of your terms above the mean + 0.5 standard deviations. With a value of 6, you would likely get well under 1%.
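
To make the reply above concrete, here is a minimal Python sketch of a cutoff placed a given number of standard deviations above the mean. It only illustrates the idea as Tech Support described it, not SAS's actual implementation: the function name `terms_above_cutoff`, the use of the population standard deviation, and the term weights are all assumptions made up for this example.

```python
import statistics


def terms_above_cutoff(weights, density):
    """Keep terms whose weight exceeds mean + density * standard deviation."""
    values = list(weights.values())
    mean = statistics.mean(values)
    # Population standard deviation is an assumption; the reply does not
    # say which estimator the product uses.
    std = statistics.pstdev(values)
    cutoff = mean + density * std
    return {term: w for term, w in weights.items() if w > cutoff}


# Hypothetical term weights for a single topic (invented numbers).
topic_weights = {
    "claim": 0.92, "policy": 0.61, "insurance": 0.55, "refund": 0.30,
    "email": 0.12, "phone": 0.10, "monday": 0.05, "please": 0.04,
    "thanks": 0.03, "regards": 0.02,
}

for density in (0.5, 2, 6):
    kept = terms_above_cutoff(topic_weights, density)
    print(density, sorted(kept))
```

With these invented weights, raising the setting from 0.5 to 2 to 6 keeps fewer and fewer terms in the topic, which matches the user guide's statement that values closer to 6 give less densely populated topics. The exact share of terms kept at each setting depends on how the weights are distributed.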

 
