topic Weighting the variables to be used in Clustering in SAS Data Science

Weighting the variables to be used in Clustering

husseinmazaar — Fri, 16 Mar 2018 18:55:11 GMT

Dear all,

I need to segment customers based on their monthly sales behavior. I have 7 variables for clustering, but some are more important than others and should carry more weight. For example, one variable, Total_sales_per_month, should have a weight of 40%, that is, contribute 40% to the clustering and the resulting decisions.

How can I apply this in SAS Enterprise Miner or with SAS procedures? Is this approach correct, or is there another machine learning method built on the same idea?

Please help me resolve this and meet the business need.

Re: Weighting the variables to be used in Clustering

MikeStockstill — Mon, 26 Mar 2018 19:11:34 GMT

Hello husseinmazaar-

When computing distances, variables that have a larger variance have a greater contribution to the distance computation. For that reason, variables are often standardized so that all variables have equal importance.

In your case, you might choose to standardize all variables except total_sales_per_month so that they have equal variance, and then assign a different and larger variance to total_sales_per_month.
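As a rough guide to how much larger that variance should be, here is a sketch under one assumed reading of "40% weight": the variable's share of the expected squared Euclidean distance between two customers. The symbols p (number of variables), m (multiplier applied to the standardized total_sales_per_month) and w (target share) are introduced here for illustration and are not from the original question.

```latex
% Assumption: z_1, ..., z_p are standardized to unit variance, and z_1 is
% then multiplied by m. For two independently drawn customers a and b,
% E[(x_a - x_b)^2] = 2 Var(x), so
\[
  \mathbb{E}\left[d^2\right] = 2\left(m^2 + (p - 1)\right),
  \qquad
  w = \frac{m^2}{m^2 + (p - 1)}
  \quad\Longrightarrow\quad
  m = \sqrt{\frac{w\,(p - 1)}{1 - w}} .
\]
% With p = 7 and w = 0.4:
\[
  m = \sqrt{\frac{0.4 \times 6}{0.6}} = \sqrt{4} = 2 .
\]
```

Under this reading, giving total_sales_per_month a standard deviation of 2 while the other six variables keep a standard deviation of 1 corresponds to a 40% share. Other definitions of "weight" would give a different multiplier.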

Run PROC STDIZE (a SAS/STAT procedure) twice for the scenario that you describe. Use the first run to standardize all variables to have the same variance (standard deviation, scale). Use the second run, with only total_sales_per_month in the VAR statement, to specify a MULT= value so that this one variable has a larger scale. MULT= multiplies the standardized values by a constant.

https://support.sas.com/documentation/onlinedoc/stat/

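When you run your cluster analysis, be sure to turn OFF any automatic standardization so that your custom standardization is used. In the SAS Enterprise Miner Cluster node, this means setting the Internal Standardization property to None.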
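The steps above could be sketched roughly as follows. This is a minimal illustration, not a tested recipe: the data set names (work.customers, work.cust_std, work.cust_wtd, work.cust_clus), the placeholder variables var2-var7, the multiplier of 2 and the choice of 5 clusters are all assumptions. PROC FASTCLUS is used for the clustering step only because it does not standardize its inputs on its own.

```sas
/* Step 1: standardize all seven clustering variables to mean 0, std 1. */
/* var2-var7 are placeholder names for the other six variables.         */
proc stdize data=work.customers out=work.cust_std method=std;
   var total_sales_per_month var2-var7;
run;

/* Step 2: rescale only total_sales_per_month.                          */
/* MULT=2 multiplies the standardized values by 2, so this variable     */
/* ends up with a standard deviation of 2 (illustrative value).         */
proc stdize data=work.cust_std out=work.cust_wtd method=std mult=2;
   var total_sales_per_month;
run;

/* Step 3: cluster on the custom-scaled variables.                      */
/* PROC FASTCLUS applies no standardization, so the scaling from        */
/* steps 1 and 2 is used as is. MAXCLUSTERS=5 is an arbitrary choice.   */
proc fastclus data=work.cust_wtd maxclusters=5 out=work.cust_clus;
   var total_sales_per_month var2-var7;
run;
```

If a variable's weight is taken to be its share of the summed variances, a multiplier of 2 with seven variables gives 4 / (4 + 6) = 40% for total_sales_per_month. A different definition of weight would call for a different MULT= value.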

Have a great week!

Re: Weighting the variables to be used in Clustering

husseinmazaar — Thu, 29 Mar 2018 08:21:28 GMT

Thanks so much.