In my previous post, I discussed how to perform association rule mining on a transaction database using the MBANALYSIS procedure. As a next step you would be interested in using this model to score the new data. Note that the only scoring recipe that is obtained from the MBANALYSIS procedure is in the form of analytic store table. This analytic store can then be used with the ASTORE procedure to score new data.
In this post, I will walk you through the steps that can simplify and streamline the scoring process. The scoring process using PROC ASTORE is not straightforward and requires some data pre-processing to obtain desired results. Refer to the code below extracted from my previous post that is used to run the association analysis on the transaction data set ProdSales.
proc mbanalysis data=casuser.ProdSales items=2 support=1;
output out=casuser.out outfreq=casuser.outfreq outrule=casuser.outrule;
customer Customer;
target Item;
hierarchy data = casuser.BaseLevel casuser.ParentLevel;
savestate rstore=casuser.StateArm;
run;
Notice the RSTORE= option in the SAVESTATE statement that creates analytic store for the model and saves it as a binary object in a data table named StateArm. I will be using this analytic store in the ASTORE procedure to score new data. The demo to follow assumes that the model has been created successfully using above code and the analytic store has been saved. Recall that this analysis resulted in 128 rules. The information about the rules is produced below for reference:
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
cas;
caslib _ALL_ assign;
data casuser.Transaction;
length Customer $ 10 Item $ 20 ;
retain Customer;
customer='A';
Item='Low Fat Milk'; output;
Item='Chicken'; output;
Item='Cake'; output;
Item='Frozen Pizza'; output;
Item='Ice Cream'; output;
Item='Curd'; output;
customer='B';
Item='Chips'; output;
Item='Swiss Cheese'; output;
Item='Frozen Pizza'; output;
Item='Pancakes'; output;
customer='C';
Item='Skim Milk'; output;
Item='Cheddar Cheese'; output;
Item='Oatmeal'; output;
Item='Cake'; output;
Customer='D';
Item='Frozen Pizza'; output;
Item='Chicken'; output;
Item='Ice Cream';output;
Item='Eggs'; output;
run;
The picture below shows the snapshot of Transaction table.
Now this transaction table cannot be used in its current form for the ASTORE procedure. Why is that? Because the transactional data table is in long format and PROC ASTORE accepts input in wide format. So, the input data format must be converted from long to wide before the data are provided as input to PROC ASTORE. To do this, I use TRANSPOSE procedure but do not forget to sort the data before transposing it.
proc sort data=CASUSER.TRANSACTION out=WORK.SORTed;
by Customer;
run;
proc transpose data=WORK.SORTed out=CASUSER.Transpose1 prefix=Item_;
var Item;
by Customer;
run;
The successful completion of run produces a table in wide format as below
proc astore;
score data=casuser.Transpose1
out=casuser.scored1
rstore=casuser.StateArm;
quit;
The PROC ASTORE statement invokes the procedure and does not require any options. The SCORE statement enables you to score the data using the previously trained model. The DATA= casuser.Transpose1 in SCORE statement names the input data table for PROC ASTORE to use. The OUT= casuser.scored1 specifies the output data table. The RSTORE=casuser.StateArm specifies the table that contains the analytic store used for scoring the data. The successful execution of the code generates a scored table in a wide format similar to the following:
The scoring model appears to list all 128 rule indices of the association rules as columns (id1 through id128), assigning the corresponding "RULEID" values from the rule set to them.
proc sort data=CASUSER.SCORED1 out=WORK.TableSorted;
by Customer;
run;
proc transpose data=WORK.TableSorted out=casuser.Scored_Transaction (drop=_LABEL_ _NAME_ rename=(RuleID1=RuleID)) prefix=RuleID ;
var id1-id128;
by Customer;
run;
The resulting table now contains two columns and 512 rows (128*4=512), i.e. 128 rules times 4, the number of customers.
All 128 rules are mapped to each customer in the transaction data. For any rule that applies based on the items in the basket, the scoring algorithm assigns the corresponding RuleID value. If a rule does not apply to a customer based on the items in their basket, the algorithm returns a value of zero.
For example, in first row the RuleID column contains a value 1, which means it corresponds to the RuleID 1 that represents a rule “Dessert --> Cake”. You can make use of these rules for recommendations (cross-sell/ up-sell) knowing the items already purchased by the customers. Of course, these recommendations should be based on metrics like support, lift, support-lift and confidence because not all rules are useful.
You will see several zero values in the RuleID column for customer A. This indicates that many of the 128 rules, generated by the previously trained model, do not match given the items in customer A’s basket. In row 130, for customer B, the RuleID column contains the value 4, indicating it corresponds to RuleID 4, which represents the rule "Breakfast → Dinner."
A similar interpretation can be applied to the other entries in the scored table.
Note: - You may want to reference the OutRule table which is reproduced below for convenience.
You can utilize the scored table and the values in the RuleID column to identify the appropriate rule for making product recommendations. When there are multiple rules with the same item on the LHS, prioritize the recommendations based on metrics such as lift, support-lift and confidence.
Find more articles from SAS Global Enablement and Learning here.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.