mevargasm
Calcite | Level 5

Hi! I'm working with the Model Studio Clustering node to segment a small database of 70 rows and 35 columns. Except for the ID, all columns are interval variables that were previously standardized. My pipeline is extremely simple and looks like this:

[Image: mevargasm_0-1648743949472.png]

The results from the Clustering node show 5 clusters as the optimal number:

[Image: mevargasm_1-1648744012851.png]

 

However, when exporting the data (either from the Output Data tab of the node, or using a Save Data or Score Data node), all rows display a null value in the _CLUSTER_ID_ column:

 

[Image: mevargasm_2-1648744185438.png]

 

What could be causing the issue?

 

 

sbxkoenk
SAS Super FREQ

I have moved this post to the 'Data Mining and Machine Learning' board (where it belongs).

Koen

sbxkoenk
SAS Super FREQ

Hello,

 

I would have to investigate.
What you see is weird and not normal behaviour of course.

 

But before I try to reproduce this in Model Studio, one remark:

The Model Studio (VDMML) Clustering node is built for big data. I'm not sure it will behave well with only 70 records and 35 variables.

 

If I were doing this, I would use a procedure (or a task in SAS Studio).
Procedures that you can use are:

  • PROC FASTCLUS (k-means)
  • PROC CLUSTER (hierarchical clustering)
  • PROC KCLUS (k-means and possibility to find out about "best" k with ABC criterion)
  • PROC HPCLUS (High-Performance k-means clustering)
  • PROC MODECLUS (finds disjoint clusters of observations in coordinate or distance data by using nonparametric density estimation)
  • Clustering with the Nonparametric Bayes action set (action nonParametricBayes.gmm) in PROC CAS; you can also use PROC GMM for this
  • PROC MBC for Model-Based Clustering
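
As a rough illustration of the first option, here is a minimal PROC FASTCLUS sketch for a table of this size. The data set name work.segments, the ID column id, and the input names x1-x34 are hypothetical placeholders (the thread does not give the real names), and MAXCLUSTERS=5 simply reuses the number of clusters reported by the Clustering node.

```sas
/* Sketch only: work.segments, id and x1-x34 are assumed names,  */
/* not taken from the original post.                             */
proc fastclus data=work.segments
              maxclusters=5        /* cluster count reported by the node    */
              maxiter=100          /* default is 1 iteration, so raise it   */
              out=work.clustered;  /* adds CLUSTER and DISTANCE to each row */
   var x1-x34;                     /* the already-standardized inputs       */
   id id;
run;

/* Check that every row received a cluster assignment */
proc freq data=work.clustered;
   tables cluster / missing;
run;
```

If the PROC FREQ output shows no missing CLUSTER values, the assignments can be joined back to the original table by the ID column.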

Good luck,

Koen

mevargasm
Calcite | Level 5
Thank you very much. I ended up using PROC KCLUS because, as far as I can tell, it replicates Model Studio's Clustering node.
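
For readers who want to try the same route, a PROC KCLUS call might look roughly like the sketch below. This is not the code actually used in this thread: the session name mysess, the table name segments, the ID column id, and the inputs x1-x34 are all assumed placeholders, and the ABC settings are illustrative rather than the node's defaults.

```sas
/* Sketch only: all table, column and session names are assumed. */
cas mysess;                          /* start a CAS session            */
libname mycas cas caslib=casuser;    /* libref pointing at a caslib    */

data mycas.segments;                 /* load the small table into CAS  */
   set work.segments;
run;

proc kclus data=mycas.segments
           maxclusters=10                    /* upper bound for the search */
           noc=abc(b=10 minclusters=2);      /* let ABC pick the "best" k  */
   input x1-x34 / level=interval;            /* already-standardized inputs */
   score out=mycas.scored copyvars=(id);     /* writes _CLUSTER_ID_ per row */
run;
```

The scored table mycas.scored should then carry a populated _CLUSTER_ID_ for each row, which is the column that came back null from the Model Studio export.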
