Demystifying Support-Lift: The Math Behind Meaningful Association Rules

Association rules are assessed using key evaluation metrics such as support, confidence, and lift, which help determine the strength and usefulness of the discovered patterns. For a deeper understanding of how association rules are generated using the Frequent Pattern Growth algorithm in SAS Viya, you may refer to the accompanying post.

In simple market basket analysis, significant and potentially valuable associations can go unnoticed when rules are generated solely from transaction or point-of-sale data. It is therefore sometimes useful to combine transaction data with an item taxonomy. The MBANALYSIS procedure in SAS Viya supports taxonomy data and generates rules at multiple levels of the taxonomy.
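To make the idea concrete, here is a minimal Python sketch of one common way to use a taxonomy. Each basket is extended with the immediate parent of every item in it, and support is then the percentage of customers whose extended basket contains the whole item set. The names `PARENT`, `BASKETS`, `extend_with_parents` and `support`, and the three toy baskets, are hypothetical and are not taken from the ProdSales data. The sketch does not claim to reproduce how PROC MBANALYSIS works internally.

```python
# Hypothetical toy data for illustration only (not the casuser.ProdSales table).
# Maps each item (lowest taxonomy level) to its immediate parent.
PARENT = {
    "Breakfast": "Frozen Foods",
    "Dinner": "Frozen Foods",
    "Cheddar Cheese": "Cheese",
}

# Hypothetical baskets, one per customer.
BASKETS = {
    "c1": {"Breakfast", "Cheddar Cheese"},
    "c2": {"Dinner"},
    "c3": {"Dinner", "Cheddar Cheese"},
}


def extend_with_parents(basket, parent):
    """Add the immediate parent of each item, so the basket also 'contains' its categories."""
    return set(basket) | {parent[item] for item in basket if item in parent}


def support(itemset, baskets, parent):
    """Percentage of customers whose extended basket contains every item in itemset."""
    wanted = set(itemset)
    hits = sum(1 for basket in baskets.values()
               if wanted <= extend_with_parents(basket, parent))
    return 100.0 * hits / len(baskets)


print(round(support({"Breakfast", "Cheddar Cheese"}, BASKETS, PARENT), 2))  # 33.33
print(round(support({"Frozen Foods", "Cheese"}, BASKETS, PARENT), 2))       # 66.67
```

In this toy data, (Breakfast, Cheddar Cheese) appears for only one of three customers, while the parent-level item set (Frozen Foods, Cheese) appears for two. A pattern that falls below a support threshold at the item level can therefore still surface at a higher level of the taxonomy.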

Another common challenge in simple association rule mining is the generation of a large number of rules, many of which are obvious or uninteresting. This is a well-known trade-off: a high support threshold yields fewer rules, but they often lack novelty or insight, whereas a low support threshold yields an overwhelming number of rules that domain experts must sift through to find the few of real value. To address this, the MBANALYSIS procedure calculates a metric called 'Support Lift', which helps highlight more meaningful rules. It measures how much a rule's actual support deviates from its expected support, which is derived from the supports of the parents of the items in the rule. A greater deviation suggests a higher degree of "surprise", indicating that the rule may reveal a less obvious and potentially more interesting pattern. Note that Support Lift is calculated only when hierarchy data is specified.

In this post, I will focus on explaining how 'Support Lift' is calculated. While this metric is automatically computed by the software (specifically, PROC MBANALYSIS in SAS Viya), I will walk through the steps as if performing the calculation manually to aid understanding.

To better understand the calculations, let’s consider an example from a grocery store setting. For simplicity, we'll assume that the lowest level of the product taxonomy corresponds to the actual items found in customers' shopping baskets. The item taxonomy for this example is shown below.


And here is a snapshot of the transaction data:

For SAS code examples, access to the transaction data, and details on incorporating hierarchy data, refer to my accompanying post on how association rules are generated with the MBANALYSIS procedure.

The following statements run the association analysis on the casuser.ProdSales data table:

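proc mbanalysis data=casuser.ProdSales items=3 support=1;
   /* OUT= item sets, OUTFREQ= item frequencies, OUTRULE= rules (incl. SUPLIFT) */
   output out=casuser.out outfreq=casuser.outfreq outrule=casuser.outrule;
   customer Customer;   /* identifies each customer (basket) */
   target Item;         /* the purchased item */
   /* taxonomy tables; needed for Support Lift to be computed */
   hierarchy data=casuser.BaseLevel casuser.ParentLevel;
run;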

Running the code produces several output tables. A snapshot of the OUTRULE table, sorted by Lift, is displayed below:

Table 1

Now, let’s see how 'Support Lift' (SUPLIFT) is calculated.

Calculation of Support-Lift

'Support Lift' is computed as the relative deviation of the actual support of the rule's items, LHS (left-hand side) and RHS (right-hand side) items taken together, from their estimated support.

Suppose the rule items are (A, B, C, ..., K), and suppose that (A^, B^, ..., G^) are the immediate ancestors of the subset A, ..., G of those items, while the remaining items H, ..., K are kept as they are. Then the Estimated Support EST_SUP(A, B, C, ..., K) is calculated as follows:

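$$\mathrm{EST\_SUP}(A,B,C,\ldots,K) = \mathrm{SUP}(\hat{A},\hat{B},\ldots,\hat{G},H,\ldots,K) \times \frac{\mathrm{SUP}(A)}{\mathrm{SUP}(\hat{A})} \times \frac{\mathrm{SUP}(B)}{\mathrm{SUP}(\hat{B})} \times \cdots \times \frac{\mathrm{SUP}(G)}{\mathrm{SUP}(\hat{G})}$$

where SUP() is the actual support and $\hat{A}, \ldots, \hat{G}$ are the ancestors A^, ..., G^.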

And 'Support Lift' is calculated as:

$$\mathrm{SUP\_LIFT}(A,B,C,\ldots,K) = \frac{\mathrm{SUP}(A,B,C,\ldots,K)}{\mathrm{EST\_SUP}(A,B,C,\ldots,K)} - 1$$

The larger the 'Support Lift', the more the rule's actual support exceeds what its parent items predict, and the more significant the rule. If the 'Support Lift' is close to zero, the rule carries no extra information and can be replaced by a rule that contains the parent items.
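The two formulas above can be sketched in Python as follows. This is a minimal illustration, not the MBANALYSIS implementation: the helper names `est_sup` and `sup_lift` are hypothetical, supports are assumed to be looked up in a dictionary keyed by frozensets of item names, and the numbers in the usage lines (items `A` and `H`, parent `P`) are made up to show the case where the actual support equals the estimate, which gives a 'Support Lift' of zero.

```python
from math import prod


def est_sup(items, parent, sup):
    """EST_SUP for a rule's item set (LHS and RHS items together).

    items:  the rule's items, e.g. ["A", "H"]
    parent: dict mapping an item to its immediate ancestor; items with no entry
            are kept as they are (the H, ..., K items in the formula)
    sup:    dict mapping a frozenset of items to its actual support
    """
    # Swap every item that has an ancestor for that ancestor. Using a set means
    # two items with the same parent contribute that parent only once.
    ancestor_set = frozenset(parent.get(i, i) for i in items)
    # One SUP(X) / SUP(X^) factor for each item X that was replaced.
    factors = (sup[frozenset({i})] / sup[frozenset({parent[i]})]
               for i in items if i in parent)
    return sup[ancestor_set] * prod(factors)


def sup_lift(items, parent, sup):
    """SUP_LIFT = SUP(items) / EST_SUP(items) - 1."""
    return sup[frozenset(items)] / est_sup(items, parent, sup) - 1


# Hypothetical numbers: item A has parent P; item H is not replaced.
parent = {"A": "P"}
sup = {
    frozenset({"P", "H"}): 40.0,
    frozenset({"A"}): 20.0,
    frozenset({"P"}): 50.0,
    frozenset({"A", "H"}): 16.0,
}
print(est_sup(["A", "H"], parent, sup))   # 16.0
print(sup_lift(["A", "H"], parent, sup))  # 0.0
```

Representing item sets as sets means that when two rule items share a parent, that parent is counted only once in the ancestor item set, which is what happens with Frozen Foods in the example below.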

Let’s look at the first rule, “Breakfast & Dinner ⇒ Cheddar Cheese”, from the OUTRULE table in our ongoing example and break down the values presented. For this rule, the SUPLIFT value is shown as 3.5. We’ll now walk through the manual steps to see how this value is obtained. To calculate 'Support Lift', we first need the Estimated Support (EST_SUP), and for that we need the immediate ancestors of Breakfast, Dinner and Cheddar Cheese. From the taxonomy diagram, the immediate ancestor of both Breakfast and Dinner is Frozen Foods, and that of Cheddar Cheese is Cheese. Because Breakfast and Dinner share the same parent, the ancestor item set contains Frozen Foods only once and reduces to (Frozen Foods, Cheese). The formula for Estimated Support therefore simplifies as follows:

EST_SUP(Breakfast, Dinner, Cheddar Cheese) = SUP(Ancestor of Breakfast, Ancestor of Dinner, Ancestor of Cheddar Cheese) * (SUP(Breakfast)/SUP(Ancestor of Breakfast)) * (SUP(Dinner)/SUP(Ancestor of Dinner)) * (SUP(Cheddar Cheese)/SUP(Ancestor of Cheddar Cheese))

= SUP(Frozen Foods, Cheese) * (SUP(Breakfast)/SUP(Frozen Foods)) * (SUP(Dinner)/SUP(Frozen Foods)) * (SUP(Cheddar Cheese)/SUP(Cheese))

= 100 * (33.33/100) * (66.67/100) * (33.33/100) = 7.406
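The same arithmetic can be checked with a short Python sketch. The support values are the ones quoted in this walkthrough. The dictionary layout and the variable names (`PARENT`, `SUP`, `rule_items`, `ancestors`, `est_sup`) are illustrative assumptions.

```python
from math import prod

# Immediate ancestors, read off the taxonomy diagram in this example.
PARENT = {"Breakfast": "Frozen Foods", "Dinner": "Frozen Foods", "Cheddar Cheese": "Cheese"}

# Actual supports (percent) quoted in this walkthrough.
SUP = {
    frozenset({"Frozen Foods", "Cheese"}): 100.0,
    frozenset({"Breakfast"}): 33.33,
    frozenset({"Dinner"}): 66.67,
    frozenset({"Cheddar Cheese"}): 33.33,
    frozenset({"Frozen Foods"}): 100.0,
    frozenset({"Cheese"}): 100.0,
}

rule_items = ["Breakfast", "Dinner", "Cheddar Cheese"]

# Ancestor item set: Frozen Foods appears twice, but the set keeps it once.
ancestors = frozenset(PARENT[i] for i in rule_items)

# SUP(ancestors) times one SUP(X)/SUP(X^) factor per rule item.
est_sup = SUP[ancestors] * prod(
    SUP[frozenset({i})] / SUP[frozenset({PARENT[i]})] for i in rule_items
)
print(sorted(ancestors))   # ['Cheese', 'Frozen Foods']
print(round(est_sup, 3))   # 7.406
```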

You might be wondering where these values came from. All of them are available in the output tables generated by the MBANALYSIS procedure. For example, the OUT table (a portion of which is shown below) provides the support for the item set (Frozen Foods, Cheese), which is listed as 100.

To find the supports of the individual items required by the formula, refer to the OUTFREQ table. Here is an excerpt from the OUTFREQ table:

Finally, the 'Support Lift' of the rule Breakfast & Dinner ⇒ Cheddar Cheese can be calculated as:

SUP_LIFT(Breakfast, Dinner, Cheddar Cheese) = {SUP(Breakfast, Dinner, Cheddar Cheese) / EST_SUP(Breakfast, Dinner, Cheddar Cheese)} - 1

The actual support SUP(Breakfast, Dinner, Cheddar Cheese) = 33.33 is already available in Table 1, so:

SUP_LIFT = (33.33 / 7.406) - 1 = 3.5

This matches the 'Support Lift' value of 3.5 listed for this rule in the OUTRULE table (Table 1).
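A final check of the last step, as a Python sketch with hypothetical variable names, using the actual support from Table 1 and the estimated support of 7.406 computed above:

```python
# Values from this walkthrough (percent).
actual_sup = 33.33   # SUP(Breakfast, Dinner, Cheddar Cheese), from Table 1
est_sup = 7.406      # EST_SUP(Breakfast, Dinner, Cheddar Cheese)

ratio = actual_sup / est_sup   # how many times the estimate the actual support is
sup_lift = ratio - 1           # SUP_LIFT = SUP / EST_SUP - 1

print(round(ratio, 2))     # 4.5
print(round(sup_lift, 1))  # 3.5
```

Because 'Support Lift' is simply this ratio minus one, a value of 3.5 means the item set occurs 4.5 times as often as the taxonomy-based estimate.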

The substantial difference between the actual support (33.33%) and the estimated support (7.4%) indicates that Breakfast, Dinner and Cheddar Cheese co-occur about 4.5 times as often as the taxonomy-based estimate predicts. This large gap implies that the association between these items goes beyond what the taxonomy alone would suggest.

I hope this effort to simplify the calculations helps make your learning journey smoother.


Find more articles from SAS Global Enablement and Learning here.
