I've got the table
Sales code | Date | Product Code |
AAAAAAAAA | 01/01/2021 | DEFGH102 |
AAAAAAAAA | 01/01/2021 | IJKLM202 |
AAAAAAAAA | 01/01/2021 | NUJL303 |
BBBBBBBBB | 01/02/2021 | ZXCVBN123 |
BBBBBBBBB | 01/02/2021 | ASDFGH123 |
I would like to do market basket analysis. I've tried:
proc mbanalysis data=cas.mydata
pctsupport=1;
customer 'sales code'n;
target 'product code'n;
output out=cas.mba_result;
run;
But it returns an empty table. This is the first time I've tried MBA in SAS. What could be the error here?
Hello,
In SAS Studio Tasks, go to SAS VIYA MACHINE LEARNING > Unsupervised Learning > Market Basket Analysis
There you can generate the code with point-and-click.
Here's your code:
libname mycas cas caslib='casuser';
data work.mba;
   length Sales_code $ 10 Date $ 10 Product_Code $ 10;
   infile cards delimiter='|';
   input Sales_code $ Date $ Product_Code $;
   RealDate = input(Date, ddmmyy10.); /* assuming European date (in)format */
   format RealDate date9.;
cards;
AAAAAAAAA|01/01/2021|DEFGH102
AAAAAAAAA|01/01/2021|IJKLM202
AAAAAAAAA|01/01/2021|NUJL303
BBBBBBBBB|01/02/2021|ZXCVBN123
BBBBBBBBB|01/02/2021|ASDFGH123
;
run;
data mycas.mba; set work.mba; run;
ods noproctitle;
proc mbanalysis data=mycas.MBA conf=50 pctsupport=1;
   target Product_Code;
   customer Sales_code;
   output out=mycas.frequent_item_sets_table
          outfreq=mycas.unique_frequent_items_table
          outrule=mycas.rule_table;
   savestate rstore=mycas.scoring_model;
run;
/* end of program */
Koen
Thanks. The scoring_model table shows some weird text in _state_, and I'm not sure how to interpret it. Is there any chance I can use Enterprise Guide to visualize the MBA results (for example, with a bubble plot)?
Also, in the rule_table when the result is:
LHS | RHS | SUPPORT | CONF | LIFT | ITEM1 | ITEM2 | ITEM3 | RULE |
2 | 1 | 1 | 85 | 1.2 | A | B | C | A & B ==> C |
Does this mean that if A & B are bought, then with 85% confidence C will be purchased too? How can we elaborate Support and Lift number here?
Hello,
Can you access SAS VIYA with your Enterprise Guide?
If so, you can of course visualize the MBA results the way you want.
SAS Studio Flows may be an alternative to Enterprise Guide, though.
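As a rough sketch of such a visualization, a bubble plot of the rules could look like the code below. It assumes your session can reach the mycas library from the program above, and that the rule table has the columns SUPPORT, CONF, LIFT and RULE as in the excerpt you posted. The table name work.rules is just a placeholder.

```sas
/* Sketch only: bubble plot of the rules written by OUTRULE=.           */
/* Column names are assumed from the rule_table excerpt in this thread; */
/* adjust them if your table differs.                                   */
data work.rules;                 /* hypothetical local copy of the CAS table */
   set mycas.rule_table;
run;

proc sgplot data=work.rules;
   /* one bubble per rule: position = support/confidence, size = lift */
   bubble x=support y=conf size=lift / datalabel=rule transparency=0.4;
   xaxis label="Support";
   yaxis label="Confidence";
run;
```

With many rules the labels will overlap, so it may help to subset the table (for example, to rules with a lift above 1) before plotting.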
You are right about "85% confidence C will be being purchased too", but I like the lift measure better, because the rule ^(A & B) ==> C (which says NOT (A & B) ==> C) can have an even higher confidence than 85%.
So I sometimes find confidence misleading.
What do you mean by:
> How can we elaborate Support and Lift number here?
?
Cheers,
Koen
No, that's not what it means.
But both item sets (left and right of ═►) are positively associated; that is definitely the case!
Read here:
The interpretation of the implication (═►) in association rules is precarious. High confidence and support does not imply cause and effect. The rule is not necessarily interesting. The two items (A&B and C) might not even be correlated. The term confidence is not related to the statistical usage; therefore, there is no repeated sampling interpretation.
That's why you should be cautious about a confidence of 0.83 (83%), for example.
The confidence of the "negated" rule might even be higher.
[ If (A) ═► (B) is the rule, then I call (NOT A) ═► (B) the negated rule. ]
Illustration with an example:
Consider the association rule (A) ═► (B). Suppose this rule has 50% support and 83% confidence. Based on these two measures, it might be considered a strong rule. However, it is possible that those without (A) are even more likely to have (B) (for example, 87.5%). In that case A and B are in fact negatively correlated.
The lift of the rule A ═► B is the confidence of the rule divided by the expected confidence, assuming that the item sets are independent. The lift can be interpreted as a general measure of association between the two item sets. Values greater than 1 indicate positive correlation; values equal to 1 indicate zero correlation; and values less than 1 indicate negative correlation. Note that lift is symmetric. That is, the lift of the rule A ═► B is the same as the lift of the rule B ═► A.
Support and lift are symmetric, confidence is not!
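To make the example concrete, here is a small sketch that reproduces those numbers in a DATA step. The basket counts are hypothetical; I picked them only so that they give 50% support, about 83% confidence, and 87.5% confidence for the negated rule. All variable names are made up for illustration.

```sas
/* Hypothetical counts for 100 baskets, chosen to match the example above */
data work.lift_demo;
   n_total  = 100;   /* all baskets                 */
   n_a_b    = 50;    /* baskets with both A and B   */
   n_a_only = 10;    /* baskets with A but not B    */
   n_b_only = 35;    /* baskets with B but not A    */
                     /* the remaining 5 have neither */

   support     = n_a_b / n_total;                          /* 50/100         */
   conf_a_b    = n_a_b / (n_a_b + n_a_only);               /* 50/60          */
   conf_nota_b = n_b_only / (n_total - n_a_b - n_a_only);  /* 35/40          */
   exp_conf_b  = (n_a_b + n_b_only) / n_total;             /* P(B) = 85/100  */
   lift_a_b    = conf_a_b / exp_conf_b;                    /* lift of A => B */

   /* same calculation for the reversed rule B => A */
   conf_b_a    = n_a_b / (n_a_b + n_b_only);               /* 50/85          */
   exp_conf_a  = (n_a_b + n_a_only) / n_total;             /* P(A) = 60/100  */
   lift_b_a    = conf_b_a / exp_conf_a;                    /* lift of B => A */
run;

proc print data=work.lift_demo noobs;
   var support conf_a_b conf_nota_b conf_b_a lift_a_b lift_b_a;
   format support conf_a_b conf_nota_b conf_b_a percent8.1 lift_a_b lift_b_a 6.3;
run;
```

With these counts the confidence of A ═► B is 83.3% and that of B ═► A is 58.8%, yet both rules have the same lift of about 0.98. A lift below 1 is what reveals the negative association, even though the confidence of A ═► B looks high.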
Hope this clarifies things a bit,
Koen