
How is the root mean squared standard deviation (RMSSTD) calculated for text document clustering? No mathematics is given for this in any of the SAS documentation or Help.

1 ACCEPTED SOLUTION

Accepted Solutions
RussAlbright
SAS Employee

If K is the number of dimensions used in the clustering, m is the number of docs in the cluster, and err is the sum of the m*K squared errors, then it looks like it is calculated as

rmstd = sqrt(err/((m-1)*K)), unless m = 1, in which case the value is 0.



3 REPLIES 3

AbhishekVerma1985
Fluorite | Level 6

Dear Russ,

It is a little confusing to me. I am not able to understand "err is the sum of the m*k squared errors". It would be very helpful if you could explain this.

RussAlbright
SAS Employee

Each document is a K-dimensional vector.

Similarly, the mean of the cluster is a K-dimensional vector where each component is the average of the corresponding component across the m documents.

A document's error is the square root of the sum of the squared differences between each of its K components and the corresponding component of the cluster mean.

The RMSSTD is an error for the entire cluster, so to incorporate all documents from the cluster in this err calculation, err becomes the sum of the squared differences for every component of every document (equivalently, the sum of the squared document errors). There are m*K components to sum over in this case.

 

Russ
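
For readers who want to check the arithmetic, here is a minimal sketch of the calculation as described in this thread. It is not SAS code and does not claim to reproduce the SAS implementation. The function name cluster_rmsstd and the list-of-lists input are assumptions made for illustration. It assumes each document in one cluster is already a K-dimensional numeric vector.

```python
import math


def cluster_rmsstd(docs):
    """RMSSTD of one cluster, following the formula quoted in this thread.

    docs is a list of m documents, each a list of K numbers.
    (Hypothetical helper for illustration; not part of any SAS product.)
    """
    m = len(docs)
    if m == 0:
        raise ValueError("cluster must contain at least one document")
    if m == 1:
        # Special case stated in the thread: a single document gives 0.
        return 0.0

    K = len(docs[0])

    # Cluster mean: average each component over the m documents.
    mean = [sum(doc[j] for doc in docs) / m for j in range(K)]

    # err: sum of squared differences over all m*K components.
    err = sum((doc[j] - mean[j]) ** 2 for doc in docs for j in range(K))

    return math.sqrt(err / ((m - 1) * K))


# Three documents in K = 2 dimensions; the cluster mean is [3.0, 4.0].
example = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(cluster_rmsstd(example))  # err = 16, (m-1)*K = 4, so sqrt(4) = 2.0
```

In the example, each of the two outer documents contributes 4 + 4 = 8 to err and the middle document contributes 0, so err = 16 and the result is sqrt(16 / (2 * 2)) = 2.0.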

