11-26-2016 08:06 AM

How is the root mean squared standard deviation (RMSSTD) calculated for text document clustering? None of the SAS documentation or Help gives the mathematics behind it.

Accepted Solutions

Solution

12-12-2016 11:48 PM

12-06-2016 10:00 PM

rmstd = sqrt(err/((m-1)*K)), unless m = 1 and then the value is 0.
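
Written out as a formula, this might read as follows. This is a sketch of one reading of the reply, assuming err is the sum of squared differences between every component of every document and the cluster mean, as described later in this thread. The symbols x_ij (component j of document i) and x̄_j (component j of the cluster mean) are notation introduced here for illustration, not taken from SAS documentation.

```latex
\mathrm{RMSSTD} =
\begin{cases}
  \sqrt{\dfrac{\mathrm{err}}{(m-1)\,K}}, & m > 1 \\[1ex]
  0, & m = 1
\end{cases}
\qquad\text{where}\qquad
\mathrm{err} = \sum_{i=1}^{m} \sum_{j=1}^{K} \left( x_{ij} - \bar{x}_j \right)^2,
\quad
\bar{x}_j = \frac{1}{m} \sum_{i=1}^{m} x_{ij}
```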

All Replies

4 weeks ago

Dear Russ,

It is a little confusing to me. I am not able to understand "err is the sum of the m*k squared errors". It would be very helpful if you could explain this.

3 weeks ago

Each document is a K-dimensional vector.

Similarly, the mean of the cluster is a K-dimensional vector in which each component is the average of the corresponding component across the m documents.

A document error is the square root of the sum of the squared differences between each of its K components and the corresponding component of the cluster mean.

The RMSSTD is an error for the entire cluster, so to incorporate all documents from the cluster, err becomes the sum of the squared differences for every component of every document. There are m*K components to sum over in this case.

Russ
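
For anyone who wants to check the arithmetic by hand, here is a minimal Python sketch of the calculation as described in this thread. It is not SAS code and not necessarily how SAS implements it. The function name cluster_rmsstd and the list-of-lists input format are assumptions made for illustration.

```python
import math


def cluster_rmsstd(docs):
    """RMSSTD for one cluster, following the formula given in this thread.

    docs is a list of m documents, each a list of K numeric components
    (hypothetical input format; all documents must have the same length).
    """
    m = len(docs)
    K = len(docs[0])

    # A single-document cluster is defined to have RMSSTD 0.
    if m == 1:
        return 0.0

    # Cluster mean: average each of the K components over the m documents.
    mean = [sum(doc[j] for doc in docs) / m for j in range(K)]

    # err: sum of squared differences over all m*K components.
    err = sum((doc[j] - mean[j]) ** 2 for doc in docs for j in range(K))

    return math.sqrt(err / ((m - 1) * K))


# Three documents with K = 2 components each.
docs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(cluster_rmsstd(docs))  # mean is [3, 4], err is 16, so sqrt(16 / (2 * 2)) = 2.0
```

In this small example m = 3 and K = 2, so err is summed over 6 components and divided by (m-1)*K = 4.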