Dear all,
I need to segment customers based on their monthly sales behavior. I have 7 variables for clustering, but some are more important than others and should carry more weight. For example, the variable Total_sales_per_month should have a 40% weight, so that it contributes 40% to the clustering.
How can I apply this in SAS Enterprise Miner or with SAS procedures? Is this approach correct, or is there another machine learning method that achieves the same idea?
Please help me resolve this so that I can meet the business needs.
Hello husseinmazaar-
When computing distances, variables that have a larger variance have a greater contribution to the distance computation. For that reason, variables are often standardized so that all variables have equal importance.
In your case, you might choose to standardize all variables except total_sales_per_month so that they have equal variance, and then assign a different and larger variance to total_sales_per_month.
Run PROC STDIZE (a SAS/STAT procedure) twice for the scenario that you describe. Use one run to standardize all variables to have the same variance (standard deviation, scale). Use the second run to specify a MULT= value for just the total_sales_per_month variable so that it has a larger scale.
https://support.sas.com/documentation/onlinedoc/stat/
When you run your cluster analysis, be sure to turn OFF any automatic standardization so that your custom standardization is used.
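As a rough sketch of the two PROC STDIZE runs followed by a cluster analysis, something like the code below might work. The data set names (WORK.CUSTOMERS, WORK.STEP1, WORK.WEIGHTED, WORK.SEGMENTS), the six placeholder variables var1-var6, and MAXCLUSTERS=4 are all assumptions for illustration. MULT=2 comes from one possible reading of "40% weight": if the other six variables each have variance 1 and the distance is squared Euclidean, total_sales_per_month needs variance v with v / (v + 6) = 0.4, so v = 4 and its standard deviation is 2. A different definition of weight would give a different multiplier.

```sas
/* Sketch only: data set names and var1-var6 are hypothetical placeholders. */

/* Run 1: standardize all seven inputs to mean 0, standard deviation 1. */
proc stdize data=work.customers out=work.step1 method=std;
   var total_sales_per_month var1-var6;
run;

/* Run 2: rescale only total_sales_per_month.                          */
/* MULT=2 gives it standard deviation 2 (variance 4), which is 40% of  */
/* the total variance 4 + 6 under the assumption described above.      */
proc stdize data=work.step1 out=work.weighted method=std mult=2;
   var total_sales_per_month;
run;

/* Cluster on the custom-scaled values. PROC FASTCLUS uses the values  */
/* as given and does not standardize them itself.                      */
proc fastclus data=work.weighted maxclusters=4 out=work.segments;
   var total_sales_per_month var1-var6;
run;
```

If you cluster in Enterprise Miner instead, I believe the setting to turn off is the Cluster node's Internal Standardization property (set it to None), but please check that against your version.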
Have a great week!
Thanks so much.