
How is the root mean squared standard deviation (RMSSTD) calculated for text document clustering? No mathematics for this is given in any of the SAS documentation or Help.

Accepted Solution
RussAlbright
SAS Employee

If K is the number of dimensions used in the clustering, m is the number of documents in the cluster, and err is the sum of the m*K squared errors, then it looks like it is calculated as

rmsstd = sqrt(err/((m-1)*K)), unless m = 1, in which case the value is 0.
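Written out in full, the same formula might look like the following, using the definition of err given later in this thread. The symbols x_{ij} (component j of document i) and \bar{x}_j (component j of the cluster mean) are notation introduced here for illustration, not taken from SAS documentation.

```latex
\mathrm{RMSSTD}
  = \sqrt{\frac{\sum_{i=1}^{m}\sum_{j=1}^{K}\left(x_{ij}-\bar{x}_j\right)^{2}}{(m-1)\,K}},
\qquad
\bar{x}_j = \frac{1}{m}\sum_{i=1}^{m} x_{ij},
\qquad
\mathrm{RMSSTD} = 0 \ \text{when } m = 1 .
```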



AbhishekVerma1985
Fluorite | Level 6

Dear Russ,

This is a little confusing to me. I am not able to understand "err is the sum of the m*k squared errors". It would be very helpful if you could explain this.

RussAlbright
SAS Employee

Each document is a K dimensional vector.

Similarly, the mean of the cluster is a K dimensional vector where each component is the average of the corresponding component across the m documents.

A document's error is the square root of the sum of the squared differences between each of its K components and the corresponding component of the cluster mean.

The RMSSTD is an error for the entire cluster, so to incorporate all documents from the cluster in this err calculation, err becomes the sum of the squared differences for every component of every document (equivalently, the sum of the squared document errors). There are m*K components to sum over in this case.

 

Russ
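For readers who want to check the arithmetic, here is a minimal Python sketch of the calculation as described above. It is not SAS's implementation. The function name `rmsstd`, the use of NumPy, and the toy data are assumptions made for illustration; the input is assumed to be an m-by-K array holding the documents of a single cluster.

```python
import numpy as np


def rmsstd(docs):
    """RMSSTD of one cluster, following the formula described in this thread.

    docs: array-like of shape (m, K), one row per document in the cluster.
    """
    docs = np.asarray(docs, dtype=float)
    m, k = docs.shape
    if m == 1:
        # A single-document cluster is defined to have RMSSTD 0.
        return 0.0
    # Cluster mean: component-wise average over the m documents.
    centroid = docs.mean(axis=0)
    # err: sum of squared differences over all m*K components.
    err = np.sum((docs - centroid) ** 2)
    return float(np.sqrt(err / ((m - 1) * k)))


# Toy cluster (made-up data): m = 3 documents, K = 2 dimensions.
cluster = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(rmsstd(cluster))  # 2.0
```

In the toy cluster the mean is (3, 4), err is 16, and (m-1)*K is 4, so the result is sqrt(16/4) = 2.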

