aha123
Obsidian | Level 7

The Text Cluster node in Text Miner performs SVD before clustering. Can anyone tell me the advantage of SVD here?

JohnJPS
Quartz | Level 8

I think this oft-cited paper (http://www.cc.gatech.edu/~vempala/papers/dfkvv.pdf) describes it as well as it can be explained. Basically, they talk about how clustering the SVD-reduced data gives an approximate solution to the clustering problem for the actual dataset, with much better performance. So that performance boost is probably the primary explanation.
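To make the idea concrete, here is a minimal sketch of "reduce with SVD, then cluster". It is not the Text Cluster node's actual implementation. The toy document-by-term matrix `X`, the helper names `svd_project` and `kmeans`, and the choice of plain Lloyd iterations with hand-picked starting centers are all assumptions made for illustration.

```python
import numpy as np

def svd_project(X, k):
    """Coordinates of each row of X in the space of the top-k right singular vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]          # same as X @ Vt[:k].T

def kmeans(Z, centers, n_iter=20):
    """Plain Lloyd iterations starting from the given centers (illustrative only)."""
    centers = centers.copy()
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every center
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):  # leave a center untouched if its cluster is empty
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

# Hypothetical document-by-term counts: docs 0-2 use one vocabulary, docs 3-5 another.
X = np.array([
    [3, 2, 1, 0, 0, 0],
    [2, 3, 0, 0, 0, 0],
    [1, 2, 2, 0, 0, 0],
    [0, 0, 0, 3, 1, 2],
    [0, 0, 0, 2, 2, 1],
    [0, 0, 0, 1, 3, 2],
], dtype=float)

Z = svd_project(X, k=2)                  # 6 terms reduced to 2 SVD dimensions
labels = kmeans(Z, centers=Z[[0, 3]])    # start from one document of each group
print(labels)
```

The clustering step only ever sees `k` columns instead of the full vocabulary, which is where the speed-up comes from on a real term matrix with thousands of terms.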

aha123
Obsidian | Level 7
This paper is excellent. Do you have any papers on how much information is lost after doing SVD? PCA can tell you how much variance is kept in the first x PCs. I wonder if SVD has such a measurement. Also, why not use PCA for text clustering? I have read many articles online and still can't get a clear answer.
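One measurement that plays the same role for SVD is the share of the squared singular values that is kept: for a rank-k truncation, the sum of the first k squared singular values divided by the sum of all of them equals the fraction of the matrix's squared Frobenius norm that the approximation retains. The sketch below assumes a small made-up term-by-document matrix `A`, and `retained_energy` is a hypothetical helper name, not a Text Miner function.

```python
import numpy as np

def retained_energy(A, k):
    """Fraction of the squared Frobenius norm of A kept by its rank-k truncated SVD."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values, largest first
    energy = s ** 2
    return float(energy[:k].sum() / energy.sum())

# Hypothetical term-by-document counts, for illustration only.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
    [1, 0, 0, 1],
], dtype=float)

for k in range(1, 5):
    print(k, round(retained_energy(A, k), 3))
```

One minus this value is the relative squared reconstruction error of the rank-k approximation, so it can be read much like the "variance explained" figure from PCA, except that the data are not centered first.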
JohnJPS
Quartz | Level 8

I dug a little deeper, and this discussion really does a great job, starting with PCA and moving on to SVD: https://www.cs.princeton.edu/picasso/mats/PCA-Tutorial-Intuition_jp.pdf

A briefer Q&A that is quite nice is here: https://www.quora.com/What-is-an-intuitive-explanation-of-the-relation-between-PCA-and-SVD

Hope that helps!
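The relation those links describe can be checked numerically: PCA amounts to an SVD of the column-centered data, with each component's variance equal to the squared singular value divided by n - 1. The sketch below uses random data as a stand-in (an assumption, not text data). A commonly given reason for skipping the centering step with text is that subtracting column means turns a sparse term matrix into a dense one, though that is a general observation and not something confirmed here about Text Miner.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 50 observations (rows) by 4 correlated variables (columns).
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))

Xc = X - X.mean(axis=0)      # PCA centers each column before decomposing
n = Xc.shape[0]

# Route 1: eigenvalues of the sample covariance matrix, sorted largest first.
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]

# Route 2: singular values of the centered data, converted to variances.
s = np.linalg.svd(Xc, compute_uv=False)
var_from_svd = s ** 2 / (n - 1)

print(np.allclose(eigvals, var_from_svd))
```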

 

 

