How is the root mean squared standard deviation (RMSSTD) calculated for text document clustering? No mathematical definition is given in any of the SAS documentation or Help.
If K is the number of dimensions used in the clustering, m is the number of documents in the cluster, and err is the sum of the m*K squared errors, then it looks like it is calculated as
RMSSTD = sqrt(err/((m-1)*K)), unless m = 1, in which case the value is 0.
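Written out in full, the formula above would read roughly as follows. The symbols x_{ij} (component j of document i) and \bar{x}_j (component j of the cluster mean) are introduced here only for illustration, and the formula reflects the behavior inferred in this reply, not a definition taken from SAS documentation:

```latex
\mathrm{err} = \sum_{i=1}^{m} \sum_{j=1}^{K} \left( x_{ij} - \bar{x}_j \right)^2,
\qquad
\mathrm{RMSSTD} =
\begin{cases}
\sqrt{\dfrac{\mathrm{err}}{(m-1)\,K}} & \text{if } m > 1, \\[2ex]
0 & \text{if } m = 1.
\end{cases}
```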
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
Dear Russ,
It is a little confusing to me. I am not able to understand "err is the sum of the m*k squared errors". It would be very helpful if you could explain this.
Each document is a K-dimensional vector.
Similarly, the mean of the cluster is a K-dimensional vector in which each component is the average of the corresponding component over the m documents.
A document error is the square root of the sum of the squared differences between each of its K components and the corresponding component of the cluster mean.
The RMSSTD is an error for the entire cluster, so to incorporate all documents from the cluster in the err calculation, err becomes the sum of the squared differences for every component of every document (the square root is taken only once, at the end). There are m*K components to sum over in this case.
Russ
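The steps described in this reply can be sketched in a few lines of Python. This is only an illustration of the inferred formula, not SAS code: the function name cluster_rmsstd and the sample vectors are hypothetical, and the actual SAS implementation may differ in its details.

```python
from math import sqrt

def cluster_rmsstd(docs):
    """RMSSTD of one cluster, following the formula inferred in this thread.

    docs is a list of m document vectors, each with K numeric components.
    (Hypothetical helper for illustration; not part of any SAS product.)
    """
    m = len(docs)
    if m == 0:
        raise ValueError("cluster must contain at least one document")
    if m == 1:
        return 0.0  # a single-document cluster is defined to have RMSSTD 0
    k = len(docs[0])
    # Cluster mean: component-wise average over the m documents.
    mean = [sum(doc[j] for doc in docs) / m for j in range(k)]
    # err: squared differences summed over all m*K components.
    err = sum((doc[j] - mean[j]) ** 2 for doc in docs for j in range(k))
    return sqrt(err / ((m - 1) * k))

# Three made-up documents in K = 2 dimensions.
docs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(cluster_rmsstd(docs))  # 2.0
```

In this example the cluster mean is (3, 4), err is 16, and the divisor is (3-1)*2 = 4, so the result is sqrt(16/4) = 2.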