Text mining and content categorization

Cluster representatives

Occasional Contributor
Posts: 8

I have been using SAS Enterprise Miner to analyze responses to open-ended questions on customer satisfaction surveys, with a Text Cluster node to group the free-text responses. The cluster descriptions give a set of individual keywords (single words, word stems, and noun groups) that best fit each cluster, but they don't necessarily capture the sentiment of the respondents.

I am looking for a good way to find representative responses for each cluster.

The Text Cluster node computes cluster probabilities that express the likelihood that each response belongs to each of the clusters (about two dozen). Each response is assigned to the cluster with the highest probability. One might expect that probability to be a good measure of whether a response is a good representative, but short responses with only one or two keywords tend to get probabilities of 1.00 (or nearly so), whereas longer, more thoughtful responses (both negative and positive) tend to have lower probability scores.
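
To make the assignment rule concrete, here is a minimal Python sketch, assuming the per-cluster probabilities have been exported for each response. The `Response` class, its fields, and the sample texts are hypothetical, not actual SAS output. It assigns each response to its most probable cluster and prints the word count alongside, which is one quick way to check whether the highest probabilities concentrate in the shortest responses.

```python
from dataclasses import dataclass


@dataclass
class Response:
    text: str
    # Hypothetical export: one probability per cluster, in cluster order.
    cluster_probs: list


def assign_cluster(probs):
    """Return (index of the most probable cluster, its probability)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs[best]


def word_count(text):
    """Crude length measure: number of whitespace-separated words."""
    return len(text.split())


if __name__ == "__main__":
    responses = [
        Response("Great service", [0.01, 0.99, 0.00]),
        Response("The agent was friendly but the wait on hold was far too long",
                 [0.30, 0.45, 0.25]),
    ]
    for r in responses:
        cluster, p = assign_cluster(r.cluster_probs)
        # Print assigned cluster, its probability, and response length side by side.
        print(cluster, round(p, 2), word_count(r.text))
```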

So for each cluster I computed quartiles for the customer's satisfaction level and for response length in words. I eliminated the responses in the top and bottom 25% by length, and then randomly selected one response from each satisfaction quartile within each cluster.
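
The selection procedure described above might be sketched in Python as follows. This is an illustrative reconstruction, not the poster's actual code: the dictionary keys (`cluster`, `satisfaction`, `words`, `text`) are hypothetical, and both sets of quartiles are assumed to be computed on all responses in a cluster before the length filter is applied.

```python
import bisect
import random
import statistics
from collections import defaultdict


def pick_representatives(responses, seed=0):
    """responses: list of dicts with hypothetical keys
    'cluster', 'satisfaction', 'words', 'text'.
    Returns {cluster: {satisfaction quartile (0-3): chosen response}}."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    by_cluster = defaultdict(list)
    for r in responses:
        by_cluster[r["cluster"]].append(r)

    picks = {}
    for cluster, rows in by_cluster.items():
        if len(rows) < 2:
            continue  # quantiles need at least two data points
        # Length quartiles: keep only the middle 50% by word count.
        q1, _, q3 = statistics.quantiles([r["words"] for r in rows], n=4)
        middle = [r for r in rows if q1 <= r["words"] <= q3]
        # Satisfaction quartile cut points for the whole cluster.
        cuts = statistics.quantiles([r["satisfaction"] for r in rows], n=4)
        strata = defaultdict(list)
        for r in middle:
            # bisect_right maps a score to quartile index 0..3.
            strata[bisect.bisect_right(cuts, r["satisfaction"])].append(r)
        # One random response per satisfaction quartile that has candidates.
        picks[cluster] = {q: rng.choice(c) for q, c in sorted(strata.items())}
    return picks
```

With an ordinal satisfaction scale (for example 1 to 5), many responses share the same score, so some quartile cut points may coincide and fewer than four strata may be populated.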

I am just wondering if there is some way to influence the random selection to get something more analytically representative.

Accepted Solution
04-08-2016 02:23 PM
Occasional Contributor
Posts: 8

Re: Cluster representatives

Damien,

We switched from clusters to topics and added some user topics. The text topic "raw" variables, plus a count of the number of topics assigned to each document, provide a way of picking out documents that are highly representative of a given topic.
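
One way to read this approach, sketched in Python under stated assumptions: each document has a raw score per topic, a document counts as assigned to a topic when its raw score meets that topic's cutoff, and the best representatives of a topic are the documents assigned to it that are assigned to the fewest other topics, breaking ties by higher raw score. The input layout and the `topic_representatives` function are hypothetical; this is not how the Text Topic node itself ranks documents.

```python
def topic_representatives(raw_scores, cutoffs, top_n=3):
    """
    raw_scores: {doc_id: [raw score for topic 0, topic 1, ...]} (hypothetical export)
    cutoffs: per-topic thresholds; a document is treated as assigned to a
             topic when its raw score meets that topic's cutoff.
    Returns {topic index: [doc_id, ...]} ranked best-first.
    """
    # Which topics each document is assigned to, and how many.
    assigned = {d: [s >= c for s, c in zip(scores, cutoffs)]
                for d, scores in raw_scores.items()}
    n_topics = {d: sum(flags) for d, flags in assigned.items()}

    reps = {}
    for t in range(len(cutoffs)):
        members = [d for d in raw_scores if assigned[d][t]]
        # Prefer documents tied to few topics (more focused), then higher raw score.
        members.sort(key=lambda d: (n_topics[d], -raw_scores[d][t]))
        reps[t] = members[:top_n]
    return reps


if __name__ == "__main__":
    scores = {"a": [0.9, 0.1], "b": [0.8, 0.7], "c": [0.2, 0.6], "d": [0.95, 0.5]}
    print(topic_representatives(scores, cutoffs=[0.5, 0.5]))
```

Ranking on topic count first favors focused documents over ones that happen to score high on many topics at once; swapping the two sort keys would favor raw score instead.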

Clustering in addition to text topics gave some interesting results, but it did not help to choose cluster or topic representatives.

--

Brian Loe


All Replies
Frequent Contributor
Posts: 135

Re: Cluster representatives

Try the Text Topic node first, then cluster on topics, not terms. If you don't like the unsupervised topics, add some user topics yourself until you feel you are capturing the sentiments well, then cluster on those topics. Works for me and my students.
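
A minimal sketch of "cluster on topics, not terms": treat each document's vector of topic scores as its features and cluster those vectors. For self-containment this uses a small pure-Python k-means; k-means is a swapped-in choice here, not necessarily what the Text Cluster node uses internally, and the score matrix is hypothetical, standing in for exported topic scores.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on lists of floats. Returns (labels, centers)."""
    rng = random.Random(seed)
    # Start from k distinct documents chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each document to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Recompute each center as the mean of its members; keep the old one if empty.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    # Hypothetical topic scores: one row per document, one column per topic.
    topic_scores = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
    labels, _ = kmeans(topic_scores, k=2)
    print(labels)
```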

Frequent Contributor
Posts: 135

Re: Cluster representatives

Another thought: if the documents are rather terse and not typically full sentences, then using single-term topics and clustering on those can sometimes work better.
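
For terse responses, a single-term topic can be approximated as a 0/1 indicator of whether the response contains that term. The sketch below builds such indicator vectors in Python so they could be passed to any clustering routine. The term list is hypothetical, and the simple lowercase tokenization ignores the stemming and synonym handling that Enterprise Miner's text parsing would apply.

```python
import re


def single_term_topic_features(docs, terms):
    """Return one 0/1 vector per document: 1 where the document contains the term."""
    vectors = []
    for doc in docs:
        # Naive tokenization: lowercase runs of letters and apostrophes.
        tokens = set(re.findall(r"[a-z']+", doc.lower()))
        vectors.append([1 if t in tokens else 0 for t in terms])
    return vectors


if __name__ == "__main__":
    docs = ["Slow delivery", "Friendly staff, slow checkout", "price"]
    print(single_term_topic_features(docs, ["slow", "friendly", "price"]))
```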