vietlinh12hoa
Obsidian | Level 7

I've got the table 

 

Sales code   Date         Product Code
AAAAAAAAA    01/01/2021   DEFGH102
AAAAAAAAA    01/01/2021   IJKLM202
AAAAAAAAA    01/01/2021   NUJL303
BBBBBBBBB    01/02/2021   ZXCVBN123
BBBBBBBBB    01/02/2021   ASDFGH123

 

I would like to do market basket analysis. I've tried:

 

proc mbanalysis data=cas.mydata
   pctsupport=1;
   customer 'sales code'n;
   target 'product code'n;
   output out=cas.mba_result;
run;

But it returns an empty table. This is the first time I've tried MBA in SAS. What could be the error here?

6 REPLIES
sbxkoenk
SAS Super FREQ

Hello,

 

In SAS Studio Tasks, go to SAS VIYA MACHINE LEARNING > Unsupervised Learning > Market Basket Analysis

There you can generate the code by point-and-click.

 

Here's your code:

libname mycas cas caslib='casuser';

data work.mba;
LENGTH Sales_code $ 10 Date $ 10 Product_Code $ 10;
infile cards delimiter='|';
input Sales_code $ Date $ Product_Code $;
RealDate=input(Date,ddmmyy10.); /* assuming European date (in)format */
format RealDate date9.;
cards;
AAAAAAAAA|01/01/2021|DEFGH102
AAAAAAAAA|01/01/2021|IJKLM202
AAAAAAAAA|01/01/2021|NUJL303
BBBBBBBBB|01/02/2021|ZXCVBN123
BBBBBBBBB|01/02/2021|ASDFGH123
;
run;

data mycas.mba; set work.mba; run;

ods noproctitle;

proc mbanalysis data=mycas.MBA conf=50 pctsupport=1;
   target Product_Code;
   customer Sales_code;
   output out=mycas.frequent_item_sets_table
          outfreq=mycas.unique_frequent_items_table
          outrule=mycas.rule_table;
   savestate rstore=mycas.scoring_model;
run;
/* end of program */

Koen

vietlinh12hoa
Obsidian | Level 7

Thanks. The scoring_model table contains some strange text in _state_, and I'm not sure how to interpret it. Is there any chance I can use Enterprise Guide to visualize the MBA results (for example, with a bubble plot)?

 

Also, in the rule_table when the result is:

LHS  RHS  SUPPORT  CONF  LIFT  ITEM1  ITEM2  ITEM3  RULE
2    1    1        85    1.2   A      B      C      A & B ==> C

Does this mean if A & B are being bought then 85% confidence C will be being purchased too? How can we elaborate Support and Lift number here?

sbxkoenk
SAS Super FREQ

Hello,

 

Can you access SAS VIYA with your Enterprise Guide?

If so, you can of course visualize the MBA results the way you want.

SAS Studio Flows may be an alternative though to your EGuide.

 

You are right about "85% confidence C will be being purchased too", but I like the lift measure better.

That is because the rule ^(A & B) ==> C (which says NOT (A & B) ==> C) can have an even higher confidence than 85%.

So I sometimes find confidence misleading.

 

What do you mean by:

> How can we elaborate Support and Lift number here?

?

 

Cheers,

Koen

vietlinh12hoa
Obsidian | Level 7
I mean: if Lift = 1.2 in the output, does this mean that when A & B are present, C is 20% more likely to be purchased?
sbxkoenk
SAS Super FREQ

No.

That's not what it means. But both item sets (left and right of ═►) are positively associated; that is definitely the case!

 

Read here:

The interpretation of the implication (═►) in association rules is precarious. High confidence and support does not imply cause and effect. The rule is not necessarily interesting. The two items (A&B and C) might not even be correlated. The term confidence is not related to the statistical usage; therefore, there is no repeated sampling interpretation.

 

That's why you should be cautious with a confidence of, for example, 0.83 (83%).

The confidence of the "negated" rule might even be higher.
[ If (A) ═► (B) is the rule, then I call (NOT A) ═► (B) the negated rule. ]

Illustration with an example: 

Consider the association rule (A) ═► (B). This rule has for example 50% support and 83% confidence. Based on these two measures, this might be considered a strong rule. On the contrary, it's possible that those without (A) are even more likely to have (B) (for example 87.5%). A and B are thus in fact negatively correlated.
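
The numbers in this illustration can be checked by hand. This is only a sketch, assuming P(A) = 0.60, a value not stated above but chosen because 0.50 / 0.60 ≈ 0.83 reproduces the quoted confidence:

```latex
\begin{align*}
\text{support}(A \Rightarrow B) &= P(A \cap B) = 0.50 \\
\text{confidence}(A \Rightarrow B) &= \frac{P(A \cap B)}{P(A)} = \frac{0.50}{0.60} \approx 0.833 \\
P(\lnot A \cap B) &= P(B \mid \lnot A)\,P(\lnot A) = 0.875 \times 0.40 = 0.35 \\
P(B) &= P(A \cap B) + P(\lnot A \cap B) = 0.50 + 0.35 = 0.85 \\
\text{lift}(A \Rightarrow B) &= \frac{\text{confidence}(A \Rightarrow B)}{P(B)} \approx \frac{0.833}{0.85} \approx 0.98
\end{align*}
```

Under that assumption, B already occurs in 85% of all baskets, so a confidence of 83% is slightly below the baseline and the lift falls just under 1.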

 

The lift of the rule A ═► B is the confidence of the rule divided by the expected confidence, assuming that the item sets are independent. The lift can be interpreted as a general measure of association between the two item sets. Values greater than 1 indicate positive correlation; values equal to 1 indicate zero correlation; and values less than 1 indicate negative correlation. Note that lift is symmetric. That is, the lift of the rule A ═► B is the same as the lift of the rule B ═► A.
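
Written out, the definition and its symmetry look like this. The second line applies it to the rule quoted earlier in the thread (CONF = 85, LIFT = 1.2), assuming CONF is reported in percent and LIFT is computed as defined here:

```latex
\begin{align*}
\text{lift}(A \Rightarrow B)
  &= \frac{\text{confidence}(A \Rightarrow B)}{P(B)}
   = \frac{P(A \cap B)}{P(A)\,P(B)}
   = \text{lift}(B \Rightarrow A) \\
\text{expected confidence}
  &= \frac{\text{confidence}}{\text{lift}}
   = \frac{85\%}{1.2} \approx 70.8\%
\end{align*}
```

Under that assumption, C would appear in roughly 70.8% of all baskets, against 85% of the baskets that contain both A and B. The ratio describes association only, not cause and effect.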

 

Support and lift are symmetric, confidence is not!

 

Hope this clarifies things a bit,

Koen

vietlinh12hoa
Obsidian | Level 7
Thanks for the clarification. I will need to research MBA more deeply. So in business terms (when talking to business people), can we say that C tends to be purchased intentionally when A & B are both present?

This also means LIFT and SUPPORT should be the main metrics: LIFT > 1 indicates the correlation, while SUPPORT above a threshold means the rule is backed by a significant number of transactions. Confidence is less important; it only needs to exceed some small threshold.
