BrianLoe
Fluorite | Level 6

I have been using SAS Enterprise Miner to analyze responses to open-ended questions on customer satisfaction surveys. I have been using a Text Cluster node to group these free text responses. The cluster descriptions give a set of individual keywords (single words, word stems, and noun groups) that best fit the cluster, but they don't necessarily capture the sentiment of the respondents.

 

I am looking for a good way to find representative responses for each cluster.

 

The Text Cluster node computes, for each response, the probability (or likelihood) that it fits into each of the clusters (about two dozen). Each response is assigned to the cluster with the highest likelihood. One might think that the likelihood would be a good measure of how representative a response is, but short responses with one or two keywords tend to get values of 1.00 (or nearly so), whereas longer, more thoughtful responses (negative and positive) tend to have lower probability scores.

 

So for each cluster I have computed the quartiles of the customer's level of satisfaction and of the response length in words. I have eliminated the responses in the top and bottom 25% for length, and randomly selected one response from each satisfaction quartile within each cluster.
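For readers who want to reproduce this selection outside Enterprise Miner, here is a minimal Python sketch of the procedure described above. It is an illustration under stated assumptions, not the code used in the original analysis: the record layout (dicts with `cluster`, `satisfaction` and `text` keys), the function name `pick_representatives`, and the choice to compute quartiles per cluster with `statistics.quantiles` are all hypothetical.

```python
import random
import statistics
from bisect import bisect_right
from collections import defaultdict


def pick_representatives(responses, seed=0):
    """Pick one mid-length response per satisfaction quartile in each cluster.

    responses: iterable of dicts with the (assumed) keys
    'cluster', 'satisfaction' and 'text'.
    Returns {cluster: {quartile_index (0..3): response}}.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible

    by_cluster = defaultdict(list)
    for r in responses:
        by_cluster[r["cluster"]].append(r)

    picks = {}
    for cluster, rows in by_cluster.items():
        if len(rows) < 2:
            continue  # quartiles need at least two data points

        # Quartile cut points for response length (in words) and satisfaction.
        lengths = [len(r["text"].split()) for r in rows]
        len_q1, _, len_q3 = statistics.quantiles(lengths, n=4)
        sat_cuts = statistics.quantiles([r["satisfaction"] for r in rows], n=4)

        # Drop the shortest and longest 25%, then bin by satisfaction quartile.
        bins = defaultdict(list)
        for r, n_words in zip(rows, lengths):
            if len_q1 <= n_words <= len_q3:
                bins[bisect_right(sat_cuts, r["satisfaction"])].append(r)

        # One uniformly random response from each non-empty quartile bin.
        picks[cluster] = {q: rng.choice(members) for q, members in sorted(bins.items())}
    return picks
```

A quartile bin can end up empty after the length filter, in which case that cluster simply returns fewer than four responses.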

 

I am just wondering if there is some way to influence the random selection to get something more analytically representative.

 

ACCEPTED SOLUTION
BrianLoe
Fluorite | Level 6

Damien,

 

We did switch to topics and added some user topics, as opposed to clusters.  The text topic "raw" variables, plus counting up the number of topics assigned to each document, provide a way of picking out documents that are highly representative of a given topic. 
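As a rough illustration of that idea, the sketch below ranks documents for one topic in plain Python. The post does not say exactly how the raw weights and the topic count were combined, so the ordering here (fewest assigned topics first, then highest raw weight) is an assumption, as are the record layout (`raw` and `flags` dicts keyed by topic name) and the function name `top_documents_for_topic`.

```python
def top_documents_for_topic(docs, topic, k=3):
    """Return up to k documents that look most representative of `topic`.

    docs: list of dicts with the (assumed) keys
      'text'  - the response text
      'raw'   - {topic_name: raw topic weight}
      'flags' - {topic_name: 1 if the topic is assigned to the document, else 0}
    """
    # Only consider documents that are actually assigned to the topic.
    candidates = [d for d in docs if d["flags"].get(topic, 0) == 1]

    # Assumed ranking: documents assigned to fewer topics come first
    # (they are more focused), ties broken by higher raw weight on `topic`.
    candidates.sort(
        key=lambda d: (sum(d["flags"].values()), -d["raw"].get(topic, 0.0))
    )
    return candidates[:k]
```

Swapping the two sort keys would instead favor the strongest raw weight and use the topic count only as a tie-breaker; which ordering reads better probably depends on the data.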

 

Clustering in addition to text topics gave some interesting results, but it did not help to choose cluster or topic representatives.

--

Brian Loe


3 REPLIES
Damien_Mather
Lapis Lazuli | Level 10
Try the topic node first, then cluster on topics, not terms. If you don't like the unsupervised topics, add some user topics yourself until you feel you are capturing the sentiments well, then cluster on those topics. Works for me and my students.
Damien_Mather
Lapis Lazuli | Level 10
Another thought: if the documents are rather terse and not typically full sentences, then using single-term topics and clustering on those can sometimes work better.