mevargasm
Calcite | Level 5

Hi! I'm working with the Model Studio Clustering node to segment a small database of 70 rows and 35 columns. Except for the ID, all columns are interval variables that were previously standardized. My pipeline is extremely simple and looks like this:

                                          mevargasm_0-1648743949472.png

The results from the Clustering node show 5 clusters as the optimal number:

                                          mevargasm_1-1648744012851.png

 

However, when exporting the data (either from the Output Data tab of the node, or by using a Save Data or Score Data node), all rows display a null value in the _CLUSTER_ID_ column:

 

                                           mevargasm_2-1648744185438.png

 

What could be causing the issue?

 

 

3 REPLIES
sbxkoenk
SAS Super FREQ

I have moved this post to the 'Data Mining and Machine Learning' board (where it belongs).

Koen

sbxkoenk
SAS Super FREQ

Hello,

 

I would have to investigate.
What you see is odd and certainly not normal behaviour.

 

Before I try to reproduce this in Model Studio, one remark:

The Model Studio (VDMML) Clustering node is built for big data. I'm not sure it will behave well with only 70 records and 35 variables.

 

If I had to do this myself, I would use a procedure (or a task in SAS Studio).
Procedures that you can use are:

  • PROC FASTCLUS (k-means)
  • PROC CLUSTER (hierarchical clustering)
  • PROC KCLUS (k-means, with the option to estimate the "best" k using the ABC criterion)
  • PROC HPCLUS (high-performance k-means clustering)
  • PROC MODECLUS (finds disjoint clusters of observations from coordinate or distance data by using nonparametric density estimation)
  • Clustering with the Nonparametric Bayes action set (action nonParametricBayes.gmm) in PROC CAS. You can also use the GMM procedure here!
  • PROC MBC for Model-Based Clustering
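
For a table this small, the procedure route could be sketched with PROC FASTCLUS roughly as follows. This is only a minimal sketch: the data set name work.customers and the variable names customer_id and x1-x34 are placeholders (they are not from the original post), and 5 clusters is simply the number Model Studio reported.

```sas
/* Hypothetical names: work.customers, customer_id and x1-x34 stand in */
/* for the real table, its ID column and its 34 interval inputs.       */
proc fastclus data=work.customers
              maxclusters=5         /* number of clusters reported by Model Studio */
              maxiter=100           /* let the cluster seeds be refined iteratively */
              out=work.clustered;   /* input rows plus CLUSTER and DISTANCE */
   var x1-x34;                      /* inputs assumed to be standardized already */
   id customer_id;                  /* identifies observations in listings */
run;

/* Count rows per cluster; MISSING makes any unassigned rows visible. */
proc freq data=work.clustered;
   tables cluster / missing;
run;
```

If the PROC FREQ table shows no missing level for CLUSTER, every row received an assignment, which is the check that failed for _CLUSTER_ID_ in the Model Studio export.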

Good luck,

Koen

mevargasm
Calcite | Level 5
Thank you very much. I ended up using PROC KCLUS because, as far as I can tell, it replicates Model Studio's Clustering node.
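
For anyone landing here later, a call along these lines is roughly what that looks like. Treat it as a sketch under assumptions rather than the exact code that was run: the session name mysess, the libref mycas, the table names and the variable names customer_id and x1-x34 are all placeholders, and it assumes a SAS Viya environment with a CAS server available.

```sas
/* Assumed setup: start a CAS session and point a libref at a caslib. */
cas mysess;
libname mycas cas caslib=casuser;

/* Load the (hypothetical) 70-row table into CAS. */
data mycas.customers;
   set work.customers;
run;

/* k-means with a fixed k of 5, the number Model Studio reported. */
proc kclus data=mycas.customers maxclusters=5;
   input x1-x34;                                     /* standardized interval inputs */
   score out=mycas.scored copyvars=(customer_id);    /* scored table keeps the ID */
run;

/* Confirm that _CLUSTER_ID_ is populated for every row. */
proc freq data=mycas.scored;
   tables _cluster_id_ / missing;
run;
```

The NOC= option of PROC KCLUS can be used instead of a fixed k to estimate the number of clusters with the ABC criterion; see the procedure documentation for its suboptions.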
