Demand-Based Store Clustering by Example with SAS Enterprise Miner: Part 1

2 Likes

Along the same lines as the Learn by Example with SAS® Enterprise Miner series, this tip explains how to apply the cluster analysis technique to a real-world example using process flow diagrams.

In general, the clustering technique is used to perform observation grouping, which can be used to segment databases. Clustering places objects into groups, or clusters, suggested by the data. The objects in each cluster tend to be similar to each other in some sense, and objects in different clusters tend to be dissimilar. If obvious clusters or groupings could be developed prior to the analysis, then the clustering analysis could be performed by simply sorting the data.

For more information on how clustering works, please watch this video:

Why demand mix store clustering? Here's a retail example.

Approximately 20 percent of a retailer’s categories generate 80 percent of its revenue. Historically, companies have relied on imperfect volume-based store clustering or geographically-based approaches which limit their ability to focus assortments for local customers. Moreover, research shows that stores with the same volume or geographic proximity may have very different sales patterns and customer profiles. Traditional clustering methods do not sufficiently identify localized trends.

This suggested approach benefits retailers by using statistical analysis to identify all significant and distinct buying patterns in a market. This method is purposeful and based primarily on demand and secondarily (in a separate step) defines their demographic profile. It is important to note that by separating these steps, you ensure that clusters are created purely on customer demand of products. This concept will increase sales, margins and inventory productivity by aligning local assortments with local demand.

What data is required?

Historical sales data at store & sku level. You can use unit, margin or revenue dollar sales for a particular product category.
Product descriptive data (i.e. Color, Fabric, Collar, Silhouette, Price Range, etc.)
Store Attributes – Format, Climate, Size, Demographic (Population, Household Income, Average Age of Household)

How to prep the data

It is necessary to transpose the sales data by product ID and use the product names as labels. This allows you to better understand the product descriptions in charts within SAS Enterprise Miner. Also, for the clustering node to perform without errors the null values should be replaced with zeros. The following code demonstrates how this might work.

 /*-------------------------------------------------------------------*/
 /*       Store Product Clustering Example                            */
 /*       Copyright(c) December 2015                                  */
 /*       by SAS Institute Inc., Cary, NC, USA                        */
 /*-------------------------------------------------------------------*/
 /* This material is provided "as is" by SAS Institute Inc.  There    */
 /* are no warranties, expressed or implied, as to merchantability or */
 /* fitness for a particular purpose regarding the materials or code  */
 /* contained herein. The Institute is not responsible for errors     */
 /* in this material as it now exists or will exist, nor does the     */
 /* Institute provide technical support for it.                       */
 /*-------------------------------------------------------------------*/
 /* First sort data by store number, note:  the use of a TEST libref */
 PROC SORT;
	DATA=TEST.SALES_DATA(KEEP=sales_units PROD_NAME PROD_ID store_number);
	OUT=TEST.SORTTempTableSorted;
	BY store_number;
 RUN;
/* Next, transpose the data and apply product name as labels */
PROC TRANSPOSE DATA=TEST.SALES_DATA2;
	OUT=TEST.SALES_DATA;
	BY store_number;
	ID PROD_ID ;
	idlabel PROD_NAME;
	VAR sales_units;
RUN; 
QUIT;
/* Also, its important to Loop through data and replace null values with zeros */
data TEST.SALES_DATA2;
   set TEST.SALES_DATA2;
   array change _numeric_;
         do over change;
         if change=. then change=0;
         end;
run;

Additionally, you might consider removing outliers as they could appear as individual clusters. Also, you might want to consider standardizing the data so that the distances are of equal practical importance. The Transform Variables node in SAS Enterprise Miner could be used to standardize the variables.

Descriptive statistics

Now that we have created pure demand-based clusters, the next step is to quantitatively describe the main features of the collection of information on each cluster using new attributes. This example leverages both the StatExplore and Decision Tree nodes in SAS Enterprise Miner.

In the above diagram, we join the new cluster label data from Enterprise Miner Diagram 1 and merge this with additional sales, product, store and demographic information. When incorporating additional data we can better understand each cluster. By using the SAS Code Node you can set each individual cluster label as the “Target” variable.That way we can use the Decision Tree and StatExplore node to find the variables that best describe each cluster.

This concludes Part 1 of this Tech Tip. In the second installment, we will discuss options regarding how to further investigate the product and store level attributes in a more balanced and useful way.

fox-nicole · ‎01-08-2018

Was the Part 2 of this series ever published? If so, I'm unable to locate it.