02-20-2017 09:20 AM
I think this oft-cited paper (http://www.cc.gatech.edu/~vempala/papers/dfkvv.pdf) explains it about as well as it can be explained. Basically, the authors show that clustering the data after projecting it onto the top singular vectors from the SVD gives an approximate solution to the clustering problem on the actual dataset, with much better performance. So that performance boost is probably the primary explanation.
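To make the idea concrete, here is a rough sketch of "reduce with SVD, then cluster". It is not the algorithm from the paper, just an illustration under my own assumptions: NumPy is available, the data is a made-up pair of Gaussian blobs in 50 dimensions, and `kmeans` and `agreement` are hypothetical helper names for a bare-bones Lloyd's iteration and a label comparison.

```python
import numpy as np

def kmeans(points, k, n_iter=20):
    """Bare-bones Lloyd's algorithm with a deterministic farthest-point start."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def agreement(a, b):
    """Fraction of matching labels for two clusters, ignoring which is called 0 or 1."""
    same = np.mean(a == b)
    return max(same, 1.0 - same)

# Toy data (an assumption for illustration): two blobs in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 50)),
               rng.normal(5.0, 1.0, size=(100, 50))])
true_labels = np.repeat([0, 1], 100)

k = 2
# Project the rows of X onto the top-k right singular vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt[:k].T            # shape (200, 2) instead of (200, 50)

labels_full = kmeans(X, k)  # cluster in the original 50-d space
labels_svd = kmeans(Z, k)   # cluster in the 2-d SVD space

print(agreement(labels_full, true_labels), agreement(labels_svd, true_labels))
```

On well-separated data like this, both runs should find the same groups, while the second one only computes distances in 2 dimensions rather than 50.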
02-20-2017 04:30 PM
02-20-2017 04:43 PM - edited 02-20-2017 04:44 PM
I dug a little deeper, and this tutorial does a great job of building the intuition, starting with PCA and moving on to SVD: https://www.cs.princeton.edu/picasso/mats/PCA-Tutorial-Intuition_jp.pdf
A briefer Q&A that is also quite nice is here: https://www.quora.com/What-is-an-intuitive-explanation-of-the-relation-between-PCA-and-SVD
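If you want to check the PCA/SVD relationship yourself, a small numerical sketch along these lines should do it. The random data and variable names are my own assumptions, not taken from either link. It compares PCA done via the covariance matrix against the SVD of the mean-centered data.

```python
import numpy as np

# Made-up correlated data: 200 samples, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
Xc = X - X.mean(axis=0)          # PCA works on mean-centered data
n = Xc.shape[0]

# PCA route: eigen-decomposition of the sample covariance matrix.
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]            # eigh returns ascending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# SVD route: decompose the centered data directly.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Component variances are the squared singular values divided by (n - 1).
variances_match = np.allclose(eigvals, s**2 / (n - 1))

# Principal directions are the right singular vectors, up to a sign flip.
directions_match = np.allclose(np.abs(eigvecs.T @ Vt.T), np.eye(5), atol=1e-6)

# Projected data (PCA scores) can be read off as U * s.
scores_match = np.allclose(U * s, Xc @ Vt.T)

print(variances_match, directions_match, scores_match)
```

The sign ambiguity is why the direction check compares absolute values: an eigenvector and its negation describe the same axis.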
Hope that helps!