BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
husseinmazaar
Quartz | Level 8

Dears,

 

Kindly note that I need to segment the customers based on behavioral monthly sales and I have 7 variables for clustering and segmenting the customers but some variables are higher than others on importance and weights.This means that if I have Variable called Total_sales_per_month and  this variable has 40% weights (to contribute with 40% in clustering and decisions).

 

I need to know how to apply this in SAS Enterprise miner or sas procedures, does this method correctly or there another way in machine learning with the same idea .

 

Please support me to resolve and satisfy the business needs.  

1 ACCEPTED SOLUTION

Accepted Solutions
MikeStockstill
SAS Employee

Hello husseinmazaar-

 

When computing distances, variables that have a larger variance have a greater contribution to the distance computation.  For that reason, variables are often standardized so that all variables have equal importance.

 

In your case, you might choose to standardize all variables except total_sales_per_month so that they have equal variance, and then assign a different and larger variance to total_sales_per_month.

 

Run PROC STDIZE (a SAS/STAT procedure) twice for the scenario that you describe.  Use one run to standardize all variables to have the same variance (standard deviation, scale).  Use the second run to specify a MULT= value for just the total_sales_per_month variable so that it has a larger scale.

 

https://support.sas.com/documentation/onlinedoc/stat/

 

When you run your cluster analysis, be sure to turn OFF any automatic standardization so that your custom standardization is used.

 

Have a great week!

View solution in original post

2 REPLIES 2
MikeStockstill
SAS Employee

Hello husseinmazaar-

 

When computing distances, variables that have a larger variance have a greater contribution to the distance computation.  For that reason, variables are often standardized so that all variables have equal importance.

 

In your case, you might choose to standardize all variables except total_sales_per_month so that they have equal variance, and then assign a different and larger variance to total_sales_per_month.

 

Run PROC STDIZE (a SAS/STAT procedure) twice for the scenario that you describe.  Use one run to standardize all variables to have the same variance (standard deviation, scale).  Use the second run to specify a MULT= value for just the total_sales_per_month variable so that it has a larger scale.

 

https://support.sas.com/documentation/onlinedoc/stat/

 

When you run your cluster analysis, be sure to turn OFF any automatic standardization so that your custom standardization is used.

 

Have a great week!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 1442 views
  • 2 likes
  • 2 in conversation