Finding Pattern

I have a dataset that records every click made on a website in a single column. The data contains more than 1 million rows and 17,000 different patterns. I want to find the patterns that repeat across the whole dataset, and I also want the average time spent on each click within each pattern. I have written SAS code that groups the clicks into patterns and computes the time difference between consecutive clicks, but the output is not in the form I want. For example, my code currently produces this output:

Clicks    Group    Time(Seconds)

A         1        6
B         1        2
C         1        0
D         2        12
E         2        5
F         2        0
A         3        9
B         3        6
C         3        0
H         4        8
I         4        9
J         4        0

Output expected:

Clicks    AverageTime         Count

ABC       A-7.5,B-4,C-0       2
DEF       D-12,E-5,F-0        1
HIJ       H-8,I-9,J-0         1

Are we guaranteed that every group contains exactly 3 clicks?


Here's an approach that does the heavy lifting. You can easily expand it if more than 3 clicks per pattern are allowed:


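data want;
   length pattern $ 100;
   array duration {3};
   recnum = 0;
   /* Read all rows of one group per DATA step iteration. */
   /* The BY statement requires HAVE to be sorted by GROUP. */
   do until (last.group);
      set have;
      by group;
      /* Build the pattern string, e.g. A|B|C */
      if first.group then pattern = clicks;
      else pattern = catx('|', pattern, clicks);
      /* Store the time of the Nth click in durationN */
      recnum + 1;
      duration{recnum} = time;
   end;
   /* One row per group is written here by the implicit OUTPUT */
   keep group pattern duration1-duration3;
run;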



This will at least give you the pieces to work with.  The rest of the programming would use simple steps like PROC FREQ and PROC MEANS.
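
As a rough sketch of that remaining step, assuming the WANT dataset has one row per group with PATTERN and DURATION1-DURATION3, a single PROC MEANS can produce both the average time per click position and the number of times each pattern occurs. The output dataset name SUMMARY and the renamed variable COUNT are made up here for illustration:

```sas
/* Sketch only: assumes WANT holds one row per group, */
/* with PATTERN and DURATION1-DURATION3.              */
proc means data=want noprint nway;
   class pattern;                     /* one output row per distinct pattern */
   var duration1-duration3;
   /* _FREQ_ is the number of groups that share the pattern */
   output out=summary (drop=_type_ rename=(_freq_=count))
          mean= / autoname;           /* names such as duration1_Mean */
run;
```

With the sample data above, the row for pattern A|B|C should show a count of 2 and means of 7.5, 4 and 0. If the per-click averages are needed in a single text column such as A-7.5,B-4,C-0, a short DATA step over SUMMARY could concatenate them.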

