Why is the range of Term Density cut-off in SAS Co...

2 weeks ago - last edited 2 weeks ago

Hi,

While working with Topic Properties during topic modelling in SAS Contextual Analysis (version 14.2), I found an option to adjust the term density.

Below is a quote from the SAS CA User guide about this.

"Edit topic properties

You can edit the properties that affect all topics. Term density refers to how topics are populated with terms; it is defined by a number between 0.5 and 6 (the default value is 2). When term density is closer to 0.5, topics are more densely populated by terms. When term density is closer to 6, topics are less densely populated by terms. This value affects the number of documents that belong to a topic (for example, having fewer terms in a topic captures fewer documents). Values that you enter are rounded to the nearest integer or half-integer."

My question is, how is the term density calculated?

Term density usually refers to the number of times a term appears in a document as a proportion of the number of words in that document, which would give a value between 0 and 1.

So why does the option range from 0.5 to 6?
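To make the definition I have in mind concrete, here is a minimal Python sketch. The function name `relative_term_frequency` and the sample sentence are made up for illustration, and this is only the usual textbook ratio, not anything taken from SAS Contextual Analysis.

```python
def relative_term_frequency(term, document):
    """Occurrences of `term` divided by the total number of words in `document`."""
    # Naive whitespace tokenisation, assumed here only to keep the example short.
    words = document.lower().split()
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)


# Hypothetical sample text.
doc = "the parcel arrived late and the parcel was damaged"
print(relative_term_frequency("parcel", doc))
```

Because the count can never exceed the number of words, this ratio always lies between 0 and 1, which is why a range of 0.5 to 6 looked odd to me.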

Thank you.

Accepted Solutions

Solution

a week ago


Posted in reply to BeiJia


I posted this question to SAS Tech Support and received the reply below. I am posting it here for anyone who is interested. Thank you.

Term density is the number of standard deviations above the mean at which the term cutoff for a topic is set. So with the smallest setting (0.5), you might get about 40% of your terms above mean + 0.5 standard deviations. With a value of 6, you would likely get well under 1%.
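The cutoff described above can be sketched in Python as follows. This is only an illustration of the "mean plus k standard deviations" idea, not SAS's actual implementation: the function name `terms_above_cutoff`, the use of the population standard deviation, and the term weights are all assumptions made for the example.

```python
import statistics


def terms_above_cutoff(weights, term_density):
    """Keep terms whose weight exceeds mean + term_density * standard deviation."""
    values = list(weights.values())
    mean = statistics.fmean(values)
    # Population standard deviation is an assumption; SAS may compute it differently.
    stdev = statistics.pstdev(values)
    cutoff = mean + term_density * stdev
    kept = {term: w for term, w in weights.items() if w > cutoff}
    return cutoff, kept


# Hypothetical term weights for a single topic.
weights = {
    "price": 0.9, "refund": 0.7, "delivery": 0.4, "order": 0.3,
    "late": 0.2, "box": 0.1, "the": 0.05, "and": 0.05,
}

for density in (0.5, 2, 6):
    cutoff, kept = terms_above_cutoff(weights, density)
    print(f"term density {density}: cutoff {cutoff:.3f}, terms kept {sorted(kept)}")
```

A higher term density raises the cutoff, so fewer terms survive and the topic is less densely populated, which matches the behaviour the user guide describes.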
