Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- SAS Communities Library
- /
- Association Discovery — the Apriori algorithm

Options

- RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Report Inappropriate Content

- Article History
- RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Report Inappropriate Content

Views
4,536

Association discovery is commonly called ** Market Basket Analysis (MBA)**. MBA is widely used by grocery stores, banks and telecommunications among others. Its results are used to optimise store layouts, design product bundles, plan coupon offers, choose appropriate specials and choose attached mailing in direct marketing. MBA helps us to understand what items are likely to be purchased together. On-line transaction processing systems often provide the data sources for association discovery.

People who buy Toothpaste also tend to buy a toothbrush, right? The marketing team at retail stores should target customers who buy toothpaste and toothbrush also provide an offer to them so that customer buys a third item example mouthwash. If a customer buys toothpaste and toothbrush and sees a discount offer on mouthwash they will be encouraged to spend extra and buy the mouthwash and this is what market analysis is all about. It helps us to understand what items are likely to be purchased together. On-line transaction processing systems often provide the data sources for association discovery.

Typically, a transaction is a single customer purchase, and the items are the things that were bought. Association discovery is the identification of items that occur together in a given event or record. Association rules highlight frequent patterns of associations or causal structures among sets of items or objects in transaction databases. Association discovery rules are based on frequency counts of the number of times items occur alone and in combination in the database. They are expressed as “if item A is part of an event, then item B is also part of the event, X percent of the time.” Thus an association rule is a statement of the form (item set A) ⇒ (item set B).

Example: Customer buys toothpaste (Item A) then the chances of toothbrush (item b) being picked by the customer under the same transaction ID. One thing needs to understand here, this is not a casualty rather it is a co-occurrence pattern.

Above toothpaste is a baby example. If we take real retail stores and they have more than thousands of items. Just imagine how much revenue they can make by using this algorithm with the right placement of items. MBA is a popular algorithm which helps the business make a profit. The above A and B rule were created for two items. It is difficult to create a rule for more than 1000 items that’s where the Associate discovery and apriori algorithm comes to the picture. Let’s see how this algorithm works?

An association rule is written A => B where A is the *antecedent *and B is the *consequent*. Both sides of an association rule can contain more than one item. Techniques used in Association discovery are borrowed from probability and statistics. ** Support, confidence and Lift** are three important evaluation criteria of association discovery.

The level of support is how frequently the combination occurs in the market basket (database). Support is the percentage of baskets (or transactions) that contain both A and B of the association, i.e. % of baskets where the rule is true

Support(A => B) = P(A ∩ B)

This is the probability of the consequent if it was independent of the antecedent. Expected confidence is thus the percentage of occurrences containing B

Expected confidence (A => B) = P(B)

The strength of an association is defined by its confidence factor, which is the percentage of cases in which a consequent appears given that the antecedent has occurred. Confidence is the percentage of baskets having A that also contain B, i.e. % of baskets containing B among those containing A. Note: Confidence(A => B) ≠ Confidence(B => A).

Confidence(A => B) = P(B | A)

Lift

Lift is equal to the confidence factor divided by the expected confidence. Lift is a factor by which the likelihood of consequent increases given an antecedent. Expected confidence is equal to the number of consequent transactions divided by the total number of transactions. Lift is the ratio of the likelihood of finding B in a basket known to contain A, to the likelihood of finding B in any random basket.

**Example: Shoes and Socks**

If a customer buys shoes, then 10% of the time he also buys socks. This example rule has a left-hand side (antecedent) and a right-hand side (consequent). Shoes are the antecedent item and socks is the consequent item.

Apriori algorithm, a classic algorithm, is useful in mining frequent itemsets and relevant association rules. Usually, this algorithm works on a database containing a large number of transactions.

*k-itemset** *: a set of *k *items.

e.g. {beer, diapers, juice} is a 3-itemset; {cheese} is a 1-itemset; {honey, ice-cream} is a 2-itemset

** Support**: an itemset has support, say, 10% if 10% of the records in the database contain those items.

** Minimum support**: The Apriori algorithm starts a specified minimum level of support, and focuses on itemsets with at least this level.

Consider a lattice containing all possible combinations of only 5 products:

**A** = apples, **B**= beer, **C** = cider, **D** = diapers & **E** = earbuds

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

Data Literacy is for **all**, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.

Article Labels

Article Tags