Hello,
I'm dealing with unstructured text data, and I need to conduct unsupervised multi-class classification of it. I managed to create a term-by-document matrix of my corpus with PROC TEXTMINE, using singular value decomposition (SVD).
My approach to classifying this data is to conduct K-Means clustering and then analyse the clusters to segregate the text into pre-defined topics automatically. However, after viewing the score table of PROC FASTCLUS, I am a bit lost as to how to continue with my evaluation of the results.
For reference, this is what my summary table for PROC FASTCLUS looks like right now. Do let me know if I have to provide more information about my table.
Sorry if it's a basic question, and thanks in advance!
When you use PROC FASTCLUS, use the OUT= option to create an output data set. The output data set contains all the original observations and some new variables. Among the new variables, the CLUSTER variable specifies the cluster to which each observation is assigned. In your example, the CLUSTER variable will contain the values 1-4.
So, for example, if you want to analyze the text terms that are in the first cluster, you can use a WHERE statement (or clause) such as
WHERE CLUSTER=1;
The documentation for PROC FASTCLUS contains several examples. I suggest starting with the Getting Started example.
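As a rough sketch of the steps above, the following shows one way to create the output data set and then look at a single cluster. The data set names (work.docpro, work.clus_out) and the variable list col1-col10 are assumptions for illustration only; substitute the document-projection table and SVD variables that your own PROC TEXTMINE run produced.

```sas
/* Assumption: work.docpro has one row per document, with the SVD      */
/* projections stored in variables col1-col10 (hypothetical names).    */
proc fastclus data=work.docpro maxclusters=4 out=work.clus_out;
   var col1-col10;
run;

/* How many documents were assigned to each cluster? */
proc freq data=work.clus_out;
   tables cluster;
run;

/* Inspect only the documents assigned to the first cluster. */
proc print data=work.clus_out;
   where cluster = 1;
run;
```

The OUT= data set also contains a DISTANCE variable (the distance from each observation to its cluster seed), which can help you pick the most representative documents in each cluster when you try to label the topics.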
ok, thank you for the reply!