thistleandtweed
Fluorite | Level 6

Hello, 

 

I'm working with unstructured text data and need to perform unsupervised multi-class classification on it. I created a term-by-document matrix of my corpus with PROC TEXTMINE, using singular value decomposition (SVD).

 

My approach to classifying this data is to conduct K-Means clustering and then analyse the clusters to segregate the text into pre-defined topics automatically. However, after viewing the score table of PROC FASTCLUS, I am a bit lost as to how to continue with my evaluation of the results. 

 

For reference, this is what my summary table for PROC FASTCLUS looks like right now. Let me know if I need to provide more information about the table.

 

[Screenshot: PROC FASTCLUS summary table (image not included)]

 

Sorry if it's a basic question, and thanks in advance!

 

 

 

ACCEPTED SOLUTION
Rick_SAS
SAS Super FREQ

When you use PROC FASTCLUS, use the OUT= option to create an output data set. The output data set contains all the original observations and some new variables. Among the new variables, the CLUSTER variable specifies the cluster to which each observation is assigned. In your example, the CLUSTER variable will contain the values 1-4. 

 

So, for example, if you want to analyze the text terms that are in the first cluster, you can use a WHERE statement (or clause) such as
WHERE CLUSTER=1;
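
As a rough sketch of the steps above, the code below runs PROC FASTCLUS with the OUT= option and then inspects the first cluster. The data set names (work.doc_svd, work.doc_clusters), the document identifier (doc_id), and the SVD variable names (svd1-svd10) are hypothetical placeholders, not names from the original post; substitute whatever your PROC TEXTMINE step actually produced. MAXCLUSTERS=4 is assumed from the four clusters mentioned above.

```sas
/* Assumption: work.doc_svd holds one row per document, with a       */
/* hypothetical ID variable doc_id and hypothetical SVD projection   */
/* variables svd1-svd10. Replace these with your own names.          */
proc fastclus data=work.doc_svd
              maxclusters=4            /* assumed: four clusters */
              maxiter=100
              out=work.doc_clusters;   /* original rows plus CLUSTER and DISTANCE */
   var svd1-svd10;
run;

/* How many documents were assigned to each cluster? */
proc freq data=work.doc_clusters;
   tables cluster;
run;

/* List some documents in the first cluster. DISTANCE is the     */
/* distance from each observation to its cluster seed.           */
proc print data=work.doc_clusters(obs=20);
   where cluster = 1;
   var doc_id cluster distance;
run;

/* Compare the mean SVD coordinates across clusters. */
proc means data=work.doc_clusters mean;
   class cluster;
   var svd1-svd10;
run;
```

Merging work.doc_clusters back to the original text by the document ID would then let you read the documents in each cluster and decide which topic, if any, the cluster corresponds to.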

 

The documentation for PROC FASTCLUS contains several examples. I suggest starting with the Getting Started example.



thistleandtweed
Fluorite | Level 6

ok, thank you for the reply!
