Trouble with Clustering Node in Model Studio

mevargasm · Posted 03-31-2022 12:30 PM

Hi! I'm working with the Model Studio Clustering node to segment a small database of 70 rows and 35 columns. Except for the ID, all columns are interval variables that were previously standardized. My pipeline is extremely simple and looks like this:

The results from the Clustering node show 5 clusters as the optimal number:

However, when exporting the data (either from the Output Data tab of the node, or by using a Save Data or Score Data node), all rows display a null value in the _CLUSTER_ID_ column:

What could be causing the issue?

sbxkoenk · Posted 03-31-2022 02:42 PM

I have moved this post to the 'Data Mining and Machine Learning' board (where it belongs).

Koen

sbxkoenk · Posted 03-31-2022 02:54 PM

Hello,

I would have to investigate.
What you see is odd and certainly not normal behaviour.

But before I try to reproduce it in Model Studio, one remark:

The Model Studio VDMML clustering is built for big data. I'm not sure it will behave well with only 70 records and 35 variables.

In your place, I would do this with a procedure (or with a task in SAS Studio).
Procedures that you can use are:

PROC FASTCLUS (k-means)
PROC CLUSTER (hierarchical clustering)
PROC KCLUS (k-means and possibility to find out about "best" k with ABC criterion)
PROC HPCLUS (High-Performance k-means clustering)
PROC MODECLUS (finds disjoint clusters of observations with coordinate or distance data by using nonparametric density estimation)
PROC GMM, or the Nonparametric Bayes Action Set (action nonParametricBayes.gmm) in PROC CAS
PROC MBC (model-based clustering)
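
For a table this small, a k-means run with PROC FASTCLUS might look like the sketch below. The data set name work.segment_input and the variable names id and x1-x35 are placeholders, not names from the original post, and maxclusters=5 simply mirrors the five clusters that the Clustering node reported. Treat it as a starting point under those assumptions, not a replica of what Model Studio does internally.

```sas
/* Minimal sketch. WORK.SEGMENT_INPUT, ID, and X1-X35 are hypothetical names; */
/* replace them with the actual table, ID column, and standardized inputs.    */
proc fastclus data=work.segment_input
              maxclusters=5      /* number of clusters suggested by the node */
              maxiter=100        /* allow the k-means iterations to converge */
              out=work.segment_scored;
   var x1-x35;                   /* the 35 standardized interval variables   */
   id id;                        /* row identifier, not used for clustering  */
run;

/* The OUT= data set carries a CLUSTER variable. The MISSING option makes */
/* any rows without a cluster assignment show up in the frequency table.  */
proc freq data=work.segment_scored;
   tables cluster / missing;
run;
```

If the frequency table shows a missing level, rows with missing values in the VAR variables would be the first thing to check.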

Good luck,

Koen

mevargasm · Posted 04-04-2022 09:08 AM

Thank you very much. I ended up using PROC KCLUS because, as far as I can tell, it replicates Model Studio's Clustering node.
