hehe
Calcite | Level 5

Hi there,

I am performing a market basket analysis using PROC ASSOC and PROC SEQUENCE. My data contains information on user sessions (session_id): pages visited (page_type), time spent on each page (time) and money spent on each page (sum).

data sample;
  infile datalines dsd truncover;
  input session_id :13. page_type :$4. sum :8. time :6. n :2.;
datalines4;
30001,6001,10,0.1,1
30001,6001,1,0.4,2
30001,6005,7,3,3
30002,6002,34,0.2,1
30003,6002,2,12,1
30003,6003,5,0.7,2
30003,6002,0.55,3,3
30003,6005,0.9,3,4
;;;;
run;

The dataset produced by PROC SEQUENCE contains only the generated rules (every sequence of visited pages that occurs in the data) and statistics such as count and support, which I do not need at the moment.

Capture1.PNG

However, I need to calculate statistics for each sequence, such as the total time spent (time) and the total money spent (sum). That is the problem.

Is there a way to do so?

Thank you in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
DougWielenga
SAS Employee

Please note that direct use of the ASSOC and SEQUENCE procedures, which are used by the Association node in SAS Enterprise Miner, is not supported.  The only supported approaches to Market Basket Analysis are the following:

   1.  The Market Basket Analysis Node in SAS Enterprise Miner

   2.  The MBANALYSIS procedure available via Visual Data Mining and Machine Learning on SAS Viya

 

For more detail on visualizing your data using SAS Visual Data Mining and Machine Learning, check out the blog at https://blogs.sas.com/content/sgf/2018/01/17/visualizing-the-results-of-a-market-basket-analysis-in-...

 

Having said that, Market Basket Analysis methods such as Association and Sequence Analysis concern themselves only with the occurrence of an event.  It does not matter whether an event happened fifty times at a specific time point or only once.  Nor does it matter whether the pattern happened fifty times in a transaction, or what other variables are included in the data, since they will be ignored.
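To make the "occurrence only" point concrete, here is a minimal sketch. It is not the internal logic of any SAS procedure; it only shows what the sample data looks like once repeated visits and the extra variables are discarded. The output dataset name events is an assumption made for this illustration.

```sas
/* Sample data from the question */
data sample;
    input session_id $ page_type $ sum time n;
cards;
30001 6001 10.00 0.1 1
30001 6001 1.00 0.4 2
30001 6005 7.00 3.0 3
30002 6002 34.00 0.2 1
30003 6002 2.00 12.0 1
30003 6003 5.00 0.7 2
30003 6002 0.55 3.0 3
30003 6005 0.90 3.0 4
;
run;

/* Keep one row per session/page pair and drop sum, time and n.  */
/* EVENTS is a hypothetical name; NODUPKEY removes repeat visits. */
proc sort data=sample out=events(keep=session_id page_type) nodupkey;
    by session_id page_type;
run;

proc print data=events;
    title 'distinct session/page occurrences';
run;

title;
```

With this sample, the two visits to page 6001 in session 30001 and the two visits to page 6002 in session 30003 each collapse to a single row, and the amounts in sum and time are gone.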

 

What you are asking for relates to rolling up your data and summarizing certain amounts, which can be done with the SUMMARY procedure in many cases.  Here is an example using the first few lines of your data:

 

/*** BEGIN SAS CODE ***/

/* Read the sample data; session_id and page_type are character */
data sample;
    input session_id $ page_type $ sum time n;
cards;
30001 6001 10.00 0.1 1
30001 6001 1.00 0.4 2
30001 6005 7.00 3.0 3
30002 6002 34.00 0.2 1
30003 6002 2.00 12.0 1
30003 6003 5.00 0.7 2
30003 6002 0.55 3.0 3
30003 6005 0.90 3.0 4
;
run;

proc print data=sample;
    title 'input data';
run;

/* Total sum and time per session; BY requires data sorted by session_id */
proc summary data=sample sum;
    by session_id;
    var sum time;
    output out=rollup sum=tot_sum tot_time;
run;

proc print data=rollup;
    title 'output from SUMMARY procedure';
run;

/* Clear the title */
title;

/*** END SAS CODE ***/

 

which generates the output below my signature. 
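If the totals are needed per sequence of pages rather than per session, one possible extension is sketched below. It assumes that a "sequence" means the complete ordered path of a session, which is not the same thing as the sub-sequences a sequence analysis reports. The names session_path, path, path_totals and n_sessions are made up for this illustration.

```sas
/* Sample data from the question */
data sample;
    input session_id $ page_type $ sum time n;
cards;
30001 6001 10.00 0.1 1
30001 6001 1.00 0.4 2
30001 6005 7.00 3.0 3
30002 6002 34.00 0.2 1
30003 6002 2.00 12.0 1
30003 6003 5.00 0.7 2
30003 6002 0.55 3.0 3
30003 6005 0.90 3.0 4
;
run;

/* Order the visits within each session by the step counter n */
proc sort data=sample;
    by session_id n;
run;

/* One row per session: the full page path plus session totals */
data session_path;
    set sample;
    by session_id;
    length path $200;
    retain path tot_sum tot_time;
    if first.session_id then do;
        path     = page_type;
        tot_sum  = sum;
        tot_time = time;
    end;
    else do;
        path     = catx(' > ', path, page_type);
        tot_sum  = tot_sum + sum;
        tot_time = tot_time + time;
    end;
    if last.session_id then output;
    keep session_id path tot_sum tot_time;
run;

/* Add up the totals over all sessions that share the same path */
proc summary data=session_path nway;
    class path;
    var tot_sum tot_time;
    output out=path_totals(drop=_type_ rename=(_freq_=n_sessions)) sum=;
run;

proc print data=path_totals;
    title 'totals per full session path';
run;

title;
```

In this small sample every session has a different path, so path_totals simply has one row per session. With real data, sessions that follow the same path would be added together and n_sessions would count them.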

 

Hope this helps!

Doug

 

SUMMARY_procedure_results.JPG

 
