
Tip: Association Discovery Using SAS Enterprise Miner

Started 07-02-2015
Modified 10-06-2015

This tip is a simple introduction to association analysis using SAS Enterprise Miner. Association analysis has recently become one of the most frequently asked-about topics on our data mining community site. I hope this tip clarifies some points and helps you understand how association discovery rules are built.

 

 

Brief Description of Association Discovery:

 

Association discovery, also known as market basket analysis, is the identification of items that occur together in a given event or record. The databases used for online transaction processing systems often provide the data sources for association discovery. Association discovery rules are based on the number of times items occur alone and in combination in the transaction records. Associations can be written in the form A -> B, where A (the left-hand side) is called the antecedent and B (the right-hand side) is called the consequent. Both sides of an association can contain more than one item. Identifying credible associations between items can help the business analyst make decisions such as when to distribute coupons, when to put a product on sale, or how to present items in store displays.
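To make "occur alone and in combination" concrete, here is a minimal Python sketch of the three statistics most commonly reported for a rule A -> B: support, confidence, and lift. It uses the standard textbook definitions on a small hypothetical set of transactions (not the SAMPSIO.MSEQ data), and the function names are my own. SAS Enterprise Miner reports comparable statistics, but this is not its implementation.

```python
# Hypothetical toy transactions; each set is one basket of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's overall support (1.0 suggests independence)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Rule: bread -> milk
rule_support = support({"bread", "milk"}, transactions)         # 2 of 4 baskets
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # 2 of the 3 bread baskets
rule_lift = lift({"bread"}, {"milk"}, transactions)              # (2/3) / (3/4)

print(f"support={rule_support:.2f} confidence={rule_confidence:.2f} lift={rule_lift:.2f}")
```

In this toy data the lift is below 1, which would suggest that bread buyers are slightly less likely than average to buy milk, even though the confidence looks fairly high on its own.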

 

Example:

 

Let's use the MSEQ data set in the SAS library SAMPSIO to identify associations between different customer actions by creating rules. These rules will then be used to make recommendations (that is, to predict future actions) for each customer. The first 12 observations of the MSEQ data are shown below. They show the actions that customers take at various times.

 

img1.png

The following SAS Enterprise Miner flow diagram analyzes the SAMPSIO.MSEQ data set.

 

SAS Enterprise Miner Flow Diagram

img2.png

You can run this flow diagram in SAS Enterprise Miner. To do so, save the attached XML file, create a new Enterprise Miner project, click Diagrams, and choose "Import Diagram from XML". The properties specified in each node are described below.

 

Market Sequences Node:

 

The data source is created from SAMPSIO.MSEQ and renamed Market Sequences. The ACTION variable takes the Target role, and the CUSTOMER variable takes the ID role. Note that the TIME variable is dropped from the analysis.

 

Association Node:

 

Association:

  • Maximum Items property: 2 (indicates the maximum size of the item set to be considered in an association)
  • Minimum Confidence Level: 50 (specifies the minimum confidence level to be used to generate a rule)
  • Support Percentage: 10 (specifies the minimum transaction frequency to support an association)

Rules:

  • Export Rule by ID: Yes
  • Recommendation: Yes
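The following Python sketch shows how thresholds like the ones above could filter candidate rules. It runs on hypothetical (customer, action) rows and limits item sets to two items, so every rule has one antecedent and one consequent. Two things are assumptions on my part: support is taken as a percentage of all customers, and the threshold comparisons are inclusive. The Association node's exact definitions may differ, so check the node documentation before relying on this.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical (customer, action) rows, mimicking the ID and Target roles.
rows = [
    (1, "new_baby"), (1, "new_car"),
    (2, "new_baby"), (2, "new_car"), (2, "new_home"),
    (3, "new_baby"),
    (4, "new_home"), (4, "open_loan"),
]

MIN_SUPPORT_PCT = 10      # assumed meaning: percent of customers
MIN_CONFIDENCE_PCT = 50   # assumed inclusive threshold

# Group actions into one item set per customer.
baskets = defaultdict(set)
for customer, action in rows:
    baskets[customer].add(action)

def support_pct(items):
    """Percent of customers whose item set contains all the given items."""
    return 100 * sum(items <= b for b in baskets.values()) / len(baskets)

all_items = sorted(set().union(*baskets.values()))

# With a maximum item set size of 2, each rule is a single item -> single item.
rules = []
for lhs, rhs in permutations(all_items, 2):
    pair_support = support_pct({lhs, rhs})
    if pair_support < MIN_SUPPORT_PCT:
        continue
    conf = 100 * pair_support / support_pct({lhs})
    if conf >= MIN_CONFIDENCE_PCT:
        rules.append((lhs, rhs, pair_support, conf))

for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support={sup:.1f}% confidence={conf:.1f}%")
```

Note that new_baby -> new_home is dropped (its confidence is about 33%), while the reverse rule new_home -> new_baby is kept, because confidence depends on which side is the antecedent.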

 

After you run the Association node, you can view its results by right-clicking the node and selecting Results. In the Results window, open the Rules Table by selecting View >> Rules >> Rules Table. The Rules Table (shown below) contains all the rules that were created, along with their statistics.

 

 

img3.png

 

Notice that Transpose Rule (last column in the Rules Table) contains a value of 1 for all the rules. This implies that all the rules will be used for recommendations. Suppose you are only interested in rules for which the consequent is either buying a new home or opening a new loan. You can interactively choose the related rules by performing the following steps:

  • Close the Results window.
  • Click the three dots for the Rules property (the third row under Train properties) of the Association node to open the Rules Selector table.
  • Highlight all the rows in the table and set the Transpose Value to NO to indicate that none of the created rules are desired.
  • Click the Right Hand of Rule column name to alphabetize the list of consequents.
  • Highlight the rules whose Right Hand of Rule is either new_home or open_loan and set the Transpose Value to YES to indicate them as the desired rules.
  • Click OK to continue. Now you can view the updated results by right-clicking the Association node and selecting Results.
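The selection steps above amount to resetting a flag on every rule and then switching it back on for the consequents of interest. Here is a small Python sketch of that logic. The rule list and the field names (lhs, rhs, transpose) are hypothetical stand-ins for the Rules Selector table, not its actual column names.

```python
# Hypothetical rules, all initially flagged for use in recommendations.
rules = [
    {"lhs": "new_baby", "rhs": "new_car", "transpose": True},
    {"lhs": "new_car", "rhs": "new_home", "transpose": True},
    {"lhs": "new_home", "rhs": "open_loan", "transpose": True},
    {"lhs": "open_loan", "rhs": "new_home", "transpose": True},
]

wanted_consequents = {"new_home", "open_loan"}

# Step 1: mark every rule as not desired.
for rule in rules:
    rule["transpose"] = False

# Step 2: sort by consequent (like clicking the column header), then
# switch the flag back on for the consequents of interest.
rules.sort(key=lambda rule: rule["rhs"])
for rule in rules:
    if rule["rhs"] in wanted_consequents:
        rule["transpose"] = True

selected = [(rule["lhs"], rule["rhs"]) for rule in rules if rule["transpose"]]
print(selected)
```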

 

Score Node:

 

The Score node uses the model created by the Association node to score existing data. Here, the Score node uses the rules created by the Association node to recommend items to customers. In the output data set of the Score node, each row represents a customer and each column is a binary variable for one rule. For each rule, a customer is assigned a recommendation value of 1 or 0. If a customer already has both the antecedent and the consequent of a rule, the corresponding rule variable takes a value of 0 (rule not recommended). However, if the customer has the antecedent of a rule but not the consequent, the rule variable takes a value of 1 (rule recommended).
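The recommendation logic described above can be sketched in a few lines of Python. The customers and rules are hypothetical, and the function name is my own. One case is not covered in the description: a customer who does not have the antecedent at all. The sketch assumes that such a customer gets 0.

```python
# Hypothetical customers and the actions each has already taken.
customer_actions = {
    1: {"new_baby"},
    2: {"new_baby", "new_car"},
    3: {"new_home"},
}

# Hypothetical rules as (antecedent, consequent) pairs.
rules = [("new_baby", "new_car"), ("new_home", "open_loan")]

def score_customer(actions, rules):
    """Return one 0/1 recommendation flag per rule.

    1: antecedent present, consequent absent (recommend the consequent).
    0: customer already has both, or (assumption) lacks the antecedent.
    """
    return {
        f"{lhs} -> {rhs}": int(lhs in actions and rhs not in actions)
        for lhs, rhs in rules
    }

scores = {cust: score_customer(acts, rules) for cust, acts in customer_actions.items()}
for cust, flags in scores.items():
    print(cust, flags)
```

Customer 1 is recommended new_car, customer 2 already has it and gets no recommendation, and customer 3 is recommended open_loan.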

 

SAS Code Node:

 

The SAS Code node enables you to incorporate your own SAS code into SAS Enterprise Miner process flow diagrams. In this flow, the SAS Code node simplifies the output data set generated by the Score node and yields the following table:

 

img4.png

 

Alternative Considerations:

 

The Association node also enables you to perform sequence discovery. Sequence discovery goes one step further than association discovery by taking into account the time of the actions. For example, a hypothetical sequence rule for this analysis could be "25% of the customers who have a new baby will buy a new car in the next month." You can perform sequence discovery for this analysis by using the TIME variable.
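To illustrate how taking time into account changes the picture, the Python sketch below compares a plain association confidence with a simple ordered version on hypothetical (customer, time, action) rows. The ordered version counts a customer only if the consequent occurs after the first occurrence of the antecedent. This is a simplified illustration of the idea, not the algorithm used by the Association node. It also ignores time windows such as "in the next month".

```python
from collections import defaultdict

# Hypothetical (customer, time, action) rows.
rows = [
    (1, 1, "new_baby"), (1, 2, "new_car"),
    (2, 1, "new_car"), (2, 3, "new_baby"),
    (3, 1, "new_baby"),
    (4, 2, "new_baby"), (4, 5, "new_car"),
]

# Collect each customer's history as a list of (time, action) pairs.
history = defaultdict(list)
for customer, time, action in rows:
    history[customer].append((time, action))

def association_confidence(first, then):
    """Share of customers with `first` who also have `then`, in any order."""
    with_first = [h for h in history.values() if any(a == first for _, a in h)]
    both = [h for h in with_first if any(a == then for _, a in h)]
    return len(both) / len(with_first)

def sequence_confidence(first, then):
    """Share of customers with `first` who have `then` strictly later in time."""
    with_first = 0
    followed = 0
    for h in history.values():
        first_times = [t for t, a in h if a == first]
        if not first_times:
            continue
        with_first += 1
        start = min(first_times)
        if any(a == then and t > start for t, a in h):
            followed += 1
    return followed / with_first

print(association_confidence("new_baby", "new_car"))  # order ignored
print(sequence_confidence("new_baby", "new_car"))     # order respected
```

Customer 2 bought the car before the baby arrived, so that customer supports the association rule but not the sequence rule.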

 

You could also perform this analysis by using the Market Basket node. This node does not enable you to do sequence discovery, but it can use the taxonomy data to generate rules at multiple levels. For more information, see the Market Basket node documentation that is accessible through the Help in SAS Enterprise Miner.

Comments

Thanks for the very good and detailed instructions! Is there a way to schedule association model scoring, or at least recalculation of the association rules? For example, I want to calculate new recommendation rules once a month, or as soon as a new product arrives in our warehouse and new sales appear in the shopping statistics.

