husseinmazaar
Quartz | Level 8

Dear all,

 

I need to segment customers based on their monthly sales behavior. I have 7 variables for clustering, but some of them are more important than others and should carry more weight. For example, a variable called Total_sales_per_month should have a weight of 40%, meaning it should contribute 40% to the clustering decisions.

 

How can I apply this in SAS Enterprise Miner or with SAS procedures? Is this approach correct, or is there another machine learning method based on the same idea?

 

Please help me resolve this and meet the business need.

Accepted Solutions
MikeStockstill
SAS Employee

Hello husseinmazaar-

 

When computing distances, variables that have a larger variance have a greater contribution to the distance computation.  For that reason, variables are often standardized so that all variables have equal importance.
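
To make the point about variance concrete, here is a small sketch in Python (not SAS, and not part of the original reply). The data and the variable names `sales` and `visits` are made up for illustration. It relies on the fact that, for squared Euclidean distance, each variable's average contribution over all ordered pairs of observations is twice its population variance, so the contribution shares equal the variance shares.

```python
import random
import statistics

random.seed(0)

# Hypothetical data: two variables on very different scales.
sales = [random.gauss(50_000, 10_000) for _ in range(500)]  # large variance
visits = [random.gauss(10, 2) for _ in range(500)]          # small variance


def distance_share(columns):
    """Share of the mean squared Euclidean distance due to each column.

    Averaged over all ordered pairs of observations, a column contributes
    2 * (population variance), so the shares are simply variance shares.
    """
    variances = [statistics.pvariance(col) for col in columns]
    total = sum(variances)
    return [v / total for v in variances]


def standardize(col):
    """Center to mean 0 and scale to standard deviation 1."""
    mean = statistics.fmean(col)
    sd = statistics.pstdev(col)
    return [(x - mean) / sd for x in col]


raw_share = distance_share([sales, visits])
std_share = distance_share([standardize(sales), standardize(visits)])

print("raw shares:         ", raw_share)
print("standardized shares:", std_share)
```

On the raw data the `sales` column accounts for nearly all of the distance; after standardization the two columns contribute equally.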

 

In your case, you might choose to standardize all variables except total_sales_per_month so that they have equal variance, and then assign a different and larger variance to total_sales_per_month.

 

Run PROC STDIZE (a SAS/STAT procedure) twice for the scenario that you describe.  Use one run to standardize all variables to have the same variance (standard deviation, scale).  Use the second run to specify a MULT= value for just the total_sales_per_month variable so that it has a larger scale.
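
As a rough illustration of how a multiplier could be chosen, the Python sketch below (not SAS code) assumes that a "40% weight" means 40% of the total variance, and therefore 40% of the average squared Euclidean distance. That interpretation, the helper name `multiplier_for_share`, and the simulated data are all assumptions made for this example. With six other variables at variance 1, solving m² / (m² + 6) = 0.4 gives m = 2, which is the kind of value that would be passed as MULT= in the second run.

```python
import math
import random
import statistics


def multiplier_for_share(target_share, n_other):
    """Multiplier for one standardized variable so that it supplies
    `target_share` of the total variance when each of the `n_other`
    remaining variables has variance 1.

    Solves m**2 / (m**2 + n_other) = target_share for m.
    """
    return math.sqrt(target_share * n_other / (1.0 - target_share))


def standardize(col, mult=1.0):
    """Center to mean 0, scale to standard deviation 1, then multiply by mult."""
    mean = statistics.fmean(col)
    sd = statistics.pstdev(col)
    return [mult * (x - mean) / sd for x in col]


random.seed(1)
# Hypothetical data: 7 variables; the first stands in for total_sales_per_month.
raw = [[random.gauss(100 * (k + 1), 5 * (k + 1)) for _ in range(300)]
       for k in range(7)]

m = multiplier_for_share(0.40, n_other=6)  # 40% weight, 6 other variables
scaled = [standardize(raw[0], mult=m)] + [standardize(col) for col in raw[1:]]

variances = [statistics.pvariance(col) for col in scaled]
sales_share = variances[0] / sum(variances)
print("multiplier:", m)
print("share of total variance:", sales_share)
```

If more than one variable needs a custom weight, the same idea applies, but the multipliers then have to be solved for jointly so that the shares add up to 100%.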

 

https://support.sas.com/documentation/onlinedoc/stat/

 

When you run your cluster analysis, be sure to turn OFF any automatic standardization so that your custom standardization is used.
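
The reason this matters can be shown with a few lines of Python (again only an illustration with made-up numbers, not SAS code): if a column that was deliberately scaled up is standardized a second time, it returns to standard deviation 1 and the custom weight is lost.

```python
import statistics


def standardize(col):
    """Center to mean 0 and scale to standard deviation 1."""
    mean = statistics.fmean(col)
    sd = statistics.pstdev(col)
    return [(x - mean) / sd for x in col]


# Hypothetical column that was standardized and then multiplied by 2.
weighted = [2.0 * z for z in standardize([12.0, 15.0, 9.0, 20.0, 14.0, 11.0])]
sd_before = statistics.pstdev(weighted)

# A later automatic standardization step rescales it again.
restandardized = standardize(weighted)
sd_after = statistics.pstdev(restandardized)

print("std dev with custom weight:     ", sd_before)
print("std dev after re-standardizing: ", sd_after)
```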

 

Have a great week!
