This tip is a simple introduction to association analysis using SAS Enterprise Miner. Recently, association analysis has become one of the most frequent topics of user questions on our data mining community site. I hope this tip clarifies some points and helps you understand how association discovery rules are built.
Brief Description of Association Discovery:
Association discovery, also known as market basket analysis, is the identification of items that occur together in a given event or record. The databases used for online transaction processing systems often provide the data sources for association discovery. Association discovery rules are based on the number of times items occur alone and in combination in the transaction records. Associations can be written in the form A -> B, where A (the left-hand side) is called the antecedent and B (the right-hand side) is called the consequent. Both sides of an association can contain more than one item. Identifying credible associations between items can help the business analyst make decisions such as when to distribute coupons, when to put a product on sale, or how to present items in store displays.
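To make the counting concrete, here is a minimal Python sketch (not SAS Enterprise Miner code) of the three statistics commonly reported for a rule A -> B: support, confidence, and lift. The baskets and item names are hypothetical, and the exact definitions that the Association node reports may differ in detail.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical baskets: one set of items per transaction (illustrative only).
baskets: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(items: Iterable[str], baskets: List[FrozenSet[str]]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    wanted = frozenset(items)
    return sum(wanted <= basket for basket in baskets) / len(baskets)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               baskets: List[FrozenSet[str]]) -> float:
    """Of the baskets that contain A, the fraction that also contain B."""
    a, b = frozenset(antecedent), frozenset(consequent)
    support_a = support(a, baskets)
    return support(a | b, baskets) / support_a if support_a else 0.0


def lift(antecedent: Iterable[str], consequent: Iterable[str],
         baskets: List[FrozenSet[str]]) -> float:
    """Confidence of A -> B divided by how often B occurs overall."""
    support_b = support(consequent, baskets)
    return confidence(antecedent, consequent, baskets) / support_b if support_b else 0.0


# Statistics for the hypothetical rule bread -> milk.
rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
rule_lift = lift({"bread"}, {"milk"}, baskets)
```

A lift above 1 suggests that the antecedent and consequent occur together more often than would be expected if they were independent; a lift below 1 suggests the opposite.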
Let's use the MSEQ data set, in the SAS library SAMPSIO, to identify associations between different customer actions by creating rules. These rules will then be used to make recommendations (that is, to predict future actions) for each customer. Below you see the first 12 observations in the MSEQ data. They show how customers take different actions at various times.
The following SAS Enterprise Miner flow diagram analyzes the SAMPSIO.MSEQ data set.
SAS Enterprise Miner Flow Diagram
You can run this flow diagram in SAS Enterprise Miner. To do so, save the attached XML file, create a new Enterprise Miner project, click Diagrams, and choose "Import Diagram from XML". The following properties are specified in each node.
Market Sequences Node:
The data source is created from SAMPSIO.MSEQ and renamed Market Sequences. The ACTION variable takes the Target role, and the CUSTOMER variable takes the ID role. Note that the TIME variable is dropped from the analysis.
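Conceptually, these role assignments turn the transaction rows into one basket of actions per customer, with the order of the actions ignored. The following Python sketch illustrates that idea only. The rows and action labels are hypothetical (they are not taken from SAMPSIO.MSEQ), and this is not how the node is implemented internally.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical rows in (CUSTOMER, ACTION, TIME) form; not real MSEQ records.
rows: List[Tuple[int, str, int]] = [
    (1, "new_baby", 1),
    (1, "new_car", 2),
    (2, "new_home", 1),
    (2, "new_loan", 3),
    (2, "new_home", 4),
]


def build_baskets(rows: List[Tuple[int, str, int]]) -> Dict[int, Set[str]]:
    """Group actions by customer (the ID role) and ignore the time column."""
    baskets: Dict[int, Set[str]] = defaultdict(set)
    for customer, action, _time in rows:  # TIME is dropped, as in the tip
        baskets[customer].add(action)
    return dict(baskets)


baskets = build_baskets(rows)
```

Because each basket is a set, a repeated action (customer 2 has `new_home` twice) is counted once per customer.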
Association Node:
After you run the Association node, you can view its results by right-clicking the node and selecting Results. In the Results window, you can view the Rules Table by selecting View >> Rules >> Rules Table. The Rules Table (shown below) contains all the created rules along with their related statistics.
Notice that Transpose Rule (last column in the Rules Table) contains a value of 1 for all the rules. This implies that all the rules will be used for recommendations. Suppose you are only interested in rules for which the consequent is either buying a new home or opening a new loan. You can interactively choose the related rules by performing the following steps:
Score Node:
The Score node uses the model created by the Association node to score existing data. Here, the Score node uses the rules created by the Association node to recommend items to the customers. In the output data set of the Score node, the columns are binary variables, one for each rule, and the rows represent customers. For each rule, a customer is assigned a recommendation value of 1 or 0. If a customer already has both the antecedent and the consequent of a rule, then the corresponding rule variable takes a value of 0 (rule not recommended). However, if the customer has the antecedent of a rule but not the consequent, then the rule variable takes a value of 1 (rule recommended).
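The recommendation logic described above can be sketched in a few lines of Python. This is an illustration, not the score code that SAS Enterprise Miner generates. The rule and the action labels are hypothetical, and two points are assumptions because the description does not cover them: a customer who lacks the antecedent gets 0, and a multi-item consequent counts as present only when all of its items are present.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequent)


def rule_name(rule: Rule) -> str:
    """Readable label such as 'new_baby -> new_car'."""
    antecedent, consequent = rule
    return " & ".join(sorted(antecedent)) + " -> " + " & ".join(sorted(consequent))


def recommend(customer_items: Set[str], rules: List[Rule]) -> Dict[str, int]:
    """Return one 0/1 flag per rule for a single customer."""
    flags: Dict[str, int] = {}
    for rule in rules:
        antecedent, consequent = rule
        has_antecedent = antecedent <= customer_items
        # Assumption: the consequent is "present" only if all its items are.
        has_consequent = consequent <= customer_items
        # 1 = antecedent present, consequent missing; 0 otherwise.
        # Assumption: customers without the antecedent also get 0.
        flags[rule_name(rule)] = int(has_antecedent and not has_consequent)
    return flags


# Hypothetical rule: customers with a new baby tend to buy a new car.
rules: List[Rule] = [(frozenset({"new_baby"}), frozenset({"new_car"}))]

flags_missing_consequent = recommend({"new_baby"}, rules)
flags_has_both = recommend({"new_baby", "new_car"}, rules)
flags_no_antecedent = recommend({"new_home"}, rules)
```

Applying `recommend` to every customer produces the layout described above: one row per customer and one binary column per rule.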
SAS Code Node:
The SAS Code node enables you to incorporate your own SAS code into SAS Enterprise Miner process flow diagrams. Here, the SAS Code node simplifies the output data set generated by the score code and yields the following table:
The Association node also enables you to perform sequence discovery. Sequence discovery goes one step further than association discovery by taking into account the time of the actions. For example, a hypothetical sequence rule for this analysis could be “25% of the customers who have a new baby will buy a new car in the next month”. You can perform sequence discovery for this analysis by using the TIME variable.
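The difference from plain association discovery is that order and elapsed time now matter. The Python sketch below shows one simple way to measure a sequence rule such as "new baby, then new car within one time unit". The rows, the action labels, and the `max_gap` parameter are hypothetical, and the Association node's own sequence statistics may be defined differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rows in (CUSTOMER, ACTION, TIME) form; not real MSEQ records.
rows: List[Tuple[int, str, int]] = [
    (1, "new_baby", 1), (1, "new_car", 2),   # car one time unit after baby
    (2, "new_baby", 1), (2, "new_car", 5),   # car too long after baby
    (3, "new_baby", 3),                      # no car at all
    (4, "new_car", 1), (4, "new_baby", 2),   # car before baby
]


def sequence_confidence(rows: List[Tuple[int, str, int]],
                        first: str, then: str, max_gap: int) -> float:
    """Among customers with `first`, the fraction for whom `then`
    occurs strictly later and at most `max_gap` time units later."""
    times: Dict[int, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for customer, action, time in rows:
        times[customer][action].append(time)

    with_first = 0
    followed = 0
    for actions in times.values():
        if first not in actions:
            continue
        with_first += 1
        later_times = actions.get(then, [])
        if any(0 < t2 - t1 <= max_gap
               for t1 in actions[first] for t2 in later_times):
            followed += 1
    return followed / with_first if with_first else 0.0


within_one = sequence_confidence(rows, "new_baby", "new_car", max_gap=1)
within_ten = sequence_confidence(rows, "new_baby", "new_car", max_gap=10)
```

With these made-up rows, only customer 1 satisfies the rule when `max_gap=1`, so one in four customers with a new baby counts as following the sequence.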
You could also perform this analysis by using the Market Basket node. This node does not enable you to do sequence discovery, but it can use the taxonomy data to generate rules at multiple levels. For more information, see the Market Basket node documentation that is accessible through the Help in SAS Enterprise Miner.