Client segmentation algorithm in banking using SAS EG/EM

KJazem · Posted 05-19-2023 03:02 PM

I want to implement a segmentation methodology for a bank's business banking clients (SMEs, MSBs, etc.). The data they have includes: client-level data (client industry, current status (active/inactive), the branch where they opened their accounts, etc.), product holding information (products held, product activation date/tenure, interest and fee income over the last 2 years, etc.), bank-to-bank transactions, POS billing, and more. The products include POS, payment gateways, credit, debit and prepaid cards, fixed deposit accounts, interest-bearing accounts, insurance accounts, trade finance (letters of credit and letters of guarantee), etc.

The client has both SAS EG and SAS EM. From anyone's experience here, what would be the best clustering technique for this use case? I have very little experience with SAS EM; am I correct in assuming it supports the most common clustering algorithms (k-means, SOMs, hierarchical, etc.)? Note that retail customers are completely excluded from this use case.

Any guidance would be appreciated. Please move this thread if it doesn't fit here.

GuyTreepwood · Posted 05-20-2023 04:38 AM

Hello,

For SAS EM, the Cluster node should do k-means and hierarchical clustering, using the Centroid and Ward options, respectively, under the Clustering Method menu. For SOM, there is the SOM/Kohonen node.

You can find the Cluster node documentation here: https://documentation.sas.com/doc/en/emref/14.3/n1vjatb74dundbn12d2ecb09juak.htm

and the SOM/Kohonen here: https://documentation.sas.com/doc/en/emref/14.3/n0978xngiafo2ln1mpj80trq36qk.htm

You can also perform hierarchical and k-means clustering in EG using PROC CLUSTER, setting the METHOD= option to either Centroid or Ward.

Hope this helps.

sbxkoenk · Posted 05-20-2023 07:51 AM

Hello @KJazem ,

Unfortunately, I must contradict @GuyTreepwood.
The CLUSTER node in SAS Enterprise Miner does NOT do full-fledged hierarchical clustering on all observations (for big data, that would be an extremely challenging task). Hierarchical clustering in the EM CLUSTER node is only an intermediate step to estimate the "best" number of clusters.

The Cluster node in Enterprise Miner (latest version is 15.2) performs K-MEANS clustering!

Hierarchical clustering is just an intermediate step to determine the best number of clusters.

This is how the CLUSTER node (in the Explore group) works when you do not change the defaults:

1) k-means is run with k=50 (the preliminary maximum).
2) The 50 multivariate mean vectors are then clustered with Ward's agglomerative hierarchical clustering method.
3) The best number of clusters is then determined (minimum=2, final maximum=20). Let's say the best is 8!
4) Finally, k-means is run again on the full data set with k=8.
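The default two-stage procedure described above can be sketched in plain Python. This is a minimal illustration, not the Enterprise Miner implementation: the defaults k_prelim=50, k_min=2 and k_max=20 mirror the description, but the deterministic max-min seeding, the rule for picking the best k (the largest ratio between successive Ward merge costs, standing in for the node's own selection criterion) and seeding the final k-means with the merged Ward centroids are all assumptions, and every function name is hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_seeds(data, k):
    """Deterministic max-min seeding: each new seed is the point farthest from the seeds so far."""
    seeds = [data[0]]
    while len(seeds) < k:
        seeds.append(max(data, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds


def kmeans(data, seeds, iters=100):
    """Lloyd's k-means starting from the given seed centroids."""
    centers = [list(s) for s in seeds]
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in data]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(data, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            new_centers.append([sum(c) / len(members) for c in zip(*members)] if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def ward_merge_costs(weights, centers):
    """Agglomerate weighted centroids with Ward's criterion.

    costs[g] is the Ward cost of the merge that takes g clusters down to g - 1;
    snapshots[g] holds the centroids present at level g.
    """
    clusters = [(w, list(c)) for w, c in zip(weights, centers)]
    costs, snapshots = {}, {len(clusters): [c for _, c in clusters]}
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (wi, ci), (wj, cj) = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if i and j merge.
                cost = wi * wj / (wi + wj) * sq_dist(ci, cj)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (wi, ci), (wj, cj) = clusters[i], clusters[j]
        merged = (wi + wj, [(wi * a + wj * b) / (wi + wj) for a, b in zip(ci, cj)])
        costs[len(clusters)] = cost
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
        snapshots[len(clusters)] = [c for _, c in clusters]
    return costs, snapshots


def two_stage_cluster(data, k_prelim=50, k_min=2, k_max=20):
    # Stage 1: preliminary k-means with a generous k.
    centers, labels = kmeans(data, farthest_point_seeds(data, min(k_prelim, len(data))))
    weights = [labels.count(j) for j in range(len(centers))]
    kept = [(w, c) for w, c in zip(weights, centers) if w > 0]  # drop empty clusters
    # Stage 2: Ward clustering of the preliminary centroids, weighted by cluster size.
    costs, snapshots = ward_merge_costs([w for w, _ in kept], [c for _, c in kept])
    # Pick g where merging g -> g-1 costs far more than merging g+1 -> g.
    candidates = range(k_min, min(k_max, len(kept) - 1) + 1)
    best_k = max(candidates, key=lambda g: costs[g] / max(costs[g + 1], 1e-12))
    # Stage 3: rerun k-means on the full data with the chosen k.
    final_centers, final_labels = kmeans(data, snapshots[best_k])
    return best_k, final_centers, final_labels


# Hypothetical toy data: three well-separated 3x3 grids of points.
data = [(cx + dx, cy + dy)
        for cx, cy in [(0, 0), (10, 0), (0, 10)]
        for dx in (-0.5, 0, 0.5) for dy in (-0.5, 0, 0.5)]
k, centers, labels = two_stage_cluster(data, k_prelim=6)
print(k)
```

The point of the design is that the expensive hierarchical step only ever sees k_prelim centroids, never the full set of clients.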

You can also use the "HP Cluster" node in the HPDM group of nodes (HPDM = High-Performance Data Mining).

The "HP Cluster" node is running PROC HPCLUS in the background. The HPCLUS procedure is a high-performance procedure that performs k-means clustering.
And that "HP Cluster" node (PROC HPCLUS) finds the number of clusters (the k) with the aligned box criterion (ABC) method, NOT via a detour through hierarchical clustering.
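ABC belongs to the family of criteria that compare the within-cluster dispersion of the real data with that of reference data sets that have no cluster structure. As a rough illustration of that idea only, here is a sketch of the closely related and simpler gap statistic of Tibshirani, Walther and Hastie. It is not the ABC method: the reference data below are drawn uniformly from a plain axis-aligned bounding box, and the function names, the number of reference sets and the max-min k-means seeding are assumptions.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wss(data, k, iters=100):
    """Lloyd's k-means with max-min seeding; returns the within-cluster sum of squares."""
    centers = [list(data[0])]
    while len(centers) < k:
        centers.append(list(max(data, key=lambda p: min(sq_dist(p, c) for c in centers))))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in data]
        new = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(data, labels) if lab == j]
            new.append([sum(c) / len(members) for c in zip(*members)] if members else old)
        if new == centers:
            break
        centers = new
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))


def gap_statistic(data, k_max=6, n_ref=20, seed=0):
    """Choose k as the smallest k with Gap(k) >= Gap(k+1) - s(k+1)."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*data)]
    highs = [max(col) for col in zip(*data)]
    # Reference data sets: uniform over the axis-aligned bounding box of the data.
    refs = [[[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in data]
            for _ in range(n_ref)]
    gaps, errs = [], []
    for k in range(1, k_max + 2):
        ref_logs = [math.log(kmeans_wss(ref, k)) for ref in refs]
        mean_ref = sum(ref_logs) / n_ref
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_ref)
        gaps.append(mean_ref - math.log(kmeans_wss(data, k)))
        errs.append(sd * math.sqrt(1 + 1 / n_ref))
    for k in range(1, k_max + 1):
        # gaps[k - 1] is Gap(k); gaps[k] and errs[k] belong to k + 1.
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k
    return k_max


# Hypothetical toy data: three well-separated 3x3 grids of points.
data = [(cx + dx, cy + dy)
        for cx, cy in [(0, 0), (10, 0), (0, 10)]
        for dx in (-0.5, 0, 0.5) for dy in (-0.5, 0, 0.5)]
print(gap_statistic(data))
```

Unlike the Ward step of the CLUSTER node, this kind of criterion only needs repeated k-means runs, which is why it scales to large data.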

In SAS Viya, PROC HPCLUS evolved into PROC KCLUS.

Via the "Open Source Integration Node" in SAS EM, you can also apply "Spectral Clustering" to your data!

Via the "SAS Code Node" in SAS EM, you can also apply PROC MODECLUS to your data!

PROC MODECLUS finds disjoint clusters of observations with coordinate or distance data by using nonparametric density estimation. It can also perform approximate nonparametric significance tests for the number of clusters.
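The idea of density-based clustering can be illustrated with a small Python sketch. This is a simplified analogue, not the MODECLUS algorithm (which offers several methods and the significance tests mentioned above): density at each observation is estimated by counting neighbours within a fixed radius (a uniform kernel, an assumption here), each observation links to its densest neighbour, and observations whose chains end at the same local density mode form one cluster. The function name `mode_clusters` and the radius value are hypothetical.

```python
import math


def mode_clusters(data, radius):
    """Group points by climbing a uniform-kernel density estimate to local modes."""
    n = len(data)
    # Neighbourhood of each point: all points within `radius` (including itself).
    nbrs = [[j for j in range(n) if math.dist(data[i], data[j]) <= radius]
            for i in range(n)]
    # Density estimate: neighbour count (a uniform kernel of fixed radius).
    density = [len(nb) for nb in nbrs]
    # Each point links to its densest neighbour; ties go to the lowest index,
    # so links never form cycles and every chain ends at a local mode.
    parent = [max(nb, key=lambda j: (density[j], -j)) for nb in nbrs]

    def mode_of(i):
        while parent[i] != i:
            i = parent[i]
        return i

    # Points that reach the same mode form one cluster; relabel modes 0, 1, 2, ...
    ids = {}
    return [ids.setdefault(mode_of(i), len(ids)) for i in range(n)]


# Hypothetical toy data: three well-separated 3x3 grids of points.
data = [(cx + dx, cy + dy)
        for cx, cy in [(0, 0), (10, 0), (0, 10)]
        for dx in (-0.5, 0, 0.5) for dy in (-0.5, 0, 0.5)]
print(mode_clusters(data, radius=1.0))
```

Note that the number of clusters is not specified up front; it falls out of the density estimate, and the radius acts as the smoothing parameter.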

Good luck,

Koen

sbxkoenk · Posted 05-20-2023 08:01 AM

In addition to my previous reply, one more note:

In my opinion, the best way to assess the quality of your clustering solution is the silhouette coefficient.

(you do ultimately want heterogeneity between clusters and homogeneity within clusters)
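For reference, the usual (Rousseeuw) definition of the silhouette of observation i in cluster C_i, for a distance d such as Euclidean distance, makes that trade-off explicit: a(i) measures homogeneity within the observation's own cluster, and b(i) measures separation from the nearest other cluster.

$$
a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\, j \neq i} d(i, j), \qquad
b(i) = \min_{C \neq C_i} \frac{1}{|C|} \sum_{j \in C} d(i, j),
$$

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \in [-1, 1], \qquad
S = \frac{1}{n} \sum_{i=1}^{n} s(i),
$$

with s(i) = 0 by convention when |C_i| = 1. Values of S near 1 indicate compact, well-separated clusters; values near 0 or below indicate overlapping or misassigned clusters.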

Here are three useful articles and blog posts:

Paper 3409-2019: "How to Evaluate Different Clustering Results?" Ralph Abbey, SAS Institute Inc.
https://support.sas.com/resources/papers/proceedings19/3409-2019.pdf

"What is the silhouette statistic in cluster analysis?" Rick Wicklin, The DO Loop, May 15, 2023.
https://blogs.sas.com/content/iml/2023/05/15/silhouette-statistic-cluster.html

"Compute the silhouette statistic in SAS" Rick Wicklin, The DO Loop, May 17, 2023.
https://blogs.sas.com/content/iml/2023/05/17/compute-silhouette-sas.html

If you do not have SAS/IML (PROC IML) in your license, you will need to calculate the silhouette coefficient with a macro that uses PROC DISTANCE, PROC MEANS, and some DATA steps.
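The computation such a macro performs can be sketched in Python; the language, the function name `silhouette` and the example data are illustrative assumptions, not taken from the references. The steps mirror that split: build the full pairwise distance matrix (the role PROC DISTANCE plays), average each observation's distances to every cluster (the role of PROC MEANS), then combine a(i) and b(i) per observation (what the DATA steps would do).

```python
import math


def silhouette(data, labels):
    """Mean silhouette coefficient of a hard clustering, using Euclidean distance."""
    n = len(data)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    # Step 1 (PROC DISTANCE role): full pairwise distance matrix.
    dist = [[math.dist(p, q) for q in data] for p in data]
    size = {c: labels.count(c) for c in clusters}
    scores = []
    for i in range(n):
        # Step 2 (PROC MEANS role): summed distance from i to each cluster.
        totals = dict.fromkeys(clusters, 0.0)
        for j in range(n):
            totals[labels[j]] += dist[i][j]
        own = labels[i]
        if size[own] == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = totals[own] / (size[own] - 1)  # d(i, i) = 0 is in the sum but not counted
        b = min(totals[c] / size[c] for c in clusters if c != own)
        # Step 3 (DATA step role): combine a(i) and b(i).
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


points = [(0,), (1,), (10,), (11,)]
print(silhouette(points, [0, 0, 1, 1]))  # about 0.90: compact, well separated
print(silhouette(points, [0, 1, 0, 1]))  # about -0.45: clusters mix distant points
```

The full distance matrix costs O(n^2) memory, so on tens of thousands of clients it is common to compute the coefficient on a random sample.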

Good luck,

Koen

KJazem · Posted 05-20-2023 02:18 PM

These are very helpful, thank you for the references. A couple of follow-up questions: 1) Would you say k-means clustering works best for customer segmentation? We have many features, so I just want to see which works best (k-means, SOM, etc.). 2) Is the silhouette coefficient the best metric for evaluating any clustering algorithm, or specifically k-means?

Thanks for the help!

sbxkoenk · Posted 05-20-2023 09:32 PM

Hello,

@KJazem wrote:
1) Would you say K-means clustering works best with customer segmentation? We have many features so just want to see which works best - K-means, SOM, etc. and
2) Is the Silhouette coefficient the best metric to evaluate any clustering algorithm or specifically K-means?

1) Hierarchical clustering (as done with PROC CLUSTER) is generally superior to k-means disjoint clustering, but with tens of thousands of customers and many features, the calculations can take many hours to finish.
Also, you might need to transform the data before clustering (the same applies to k-means, by the way).
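As one simple illustration of such a transformation (a sketch under assumptions, not a recommendation from this thread): monetary features such as fee and interest income are often heavily right-skewed, so a log1p transform followed by z-score standardization keeps one large-valued feature from dominating Euclidean distances. The function `standardize` and the example columns are hypothetical, and log1p assumes non-negative amounts.

```python
import math


def standardize(rows, log_cols=()):
    """Return z-scored copies of rows; columns listed in log_cols get log1p first."""
    cols = [list(c) for c in zip(*rows)]
    for k in log_cols:
        # log1p compresses large values; assumes non-negative amounts.
        cols[k] = [math.log1p(v) for v in cols[k]]
    out_cols = []
    for col in cols:
        m = sum(col) / len(col)
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        # Constant columns carry no clustering information; map them to 0.
        out_cols.append([(v - m) / sd if sd > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*out_cols)]


# Hypothetical clients: [tenure in years, fee income over the last 2 years]
clients = [[2, 150.0], [5, 90000.0], [8, 1200.0], [3, 400.0]]
z = standardize(clients, log_cols=[1])
print(z)
```

After this step every column has mean 0 and standard deviation 1, so tenure and fee income contribute on comparable scales.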

For example, you can use the ACECLUS procedure to obtain approximate estimates of the pooled within-cluster covariance matrix and to compute canonical variables for subsequent analysis. You use PROC ACECLUS to preprocess data before you cluster it by using the CLUSTER procedure.
PROC CLUSTER offers many clustering methods (ultrametric and others) that you can try out.

2) The silhouette coefficient is the best metric for evaluating any clustering solution, no matter which algorithm produced it.

Koen

sbxkoenk · Posted 05-22-2023 12:17 PM

Hello @KJazem ,

For inspiration, you can also look here:

https://www.lexjansen.com/search/searchresults.php?q=%22customer%20segmentation%22

[[

SAS Tip: Learn lexjansen.com

https://communities.sas.com/t5/SAS-Tips-from-the-Community/SAS-Tip-Learn-lexjansen-com/td-p/436336

]]

Koen
