09-21-2014 10:56 AM
I'm doing a research project in which I want to analyse a language corpus using similarity metrics. The corpus is divided into speech domains, and I want to (1) measure how closely the already classified documents relate to those pre-determined domains, and (2) measure the statistical differences between the domains themselves. I'm not sure, however, whether SAS Text Miner can do that. I believe the cluster node can give me the Mahalanobis distance between documents and clusters, but, as far as I'm aware, it doesn't allow for supervised classification, so I couldn't supply my existing domain labels. Hierarchical clustering would give a distance measurement between clusters, but it has the same limitation. Content categorisation does allow for supervised classification, but I'm not interested in developing rules for automatic classification, and (as far as I'm aware) it doesn't give similarity metrics.
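To make the two goals concrete, here is a minimal sketch in plain Python rather than SAS of the kind of measurement I mean. Everything in it is an assumption made for illustration: the toy documents, the domain labels, and the helper names `tf_vector`, `cosine` and `centroid` are made up, and cosine similarity to a domain centroid simply stands in for whatever metric Text Miner might offer (it is not the Mahalanobis distance mentioned above).

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Average term weights over all documents in one domain."""
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return {t: c / n for t, c in total.items()}


# Hypothetical toy corpus: (pre-assigned domain, document text).
corpus = [
    ("legal", "the court ruled on the appeal"),
    ("legal", "the judge dismissed the appeal"),
    ("sport", "the team won the match"),
    ("sport", "the striker scored in the match"),
]

# Group document vectors by their existing domain label (no re-clustering).
vectors_by_domain = {}
for domain, text in corpus:
    vectors_by_domain.setdefault(domain, []).append(tf_vector(text))

centroids = {d: centroid(vs) for d, vs in vectors_by_domain.items()}

# Goal (1): similarity of each labelled document to every domain.
doc_to_domain = [
    (domain, text, {d: cosine(tf_vector(text), c) for d, c in centroids.items()})
    for domain, text in corpus
]

# Goal (2): similarity between the two domains themselves.
domain_to_domain = cosine(centroids["legal"], centroids["sport"])

for domain, text, sims in doc_to_domain:
    print(domain, sims)
print("legal vs sport:", domain_to_domain)
```

The point of the sketch is that the domain labels are taken as given and only the distances are computed, which is the combination I can't find in the nodes described above.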
Could anyone tell me whether Text Miner is able to do the statistical analyses that I'm after? Many thanks in advance for your time and help!