I have a dataset that stores every click made on the website in one column. I want to find the patterns that repeat across the whole dataset, which contains more than 1 million rows and 17,000 different patterns. I also want to know the average time spent on each click for each pattern. I have written SAS code that groups the clicks into patterns and finds the time difference between consecutive clicks, but the output is not in the form I want. For example, my code currently produces this output:
Clicks Group Time(Seconds)
A 1 6
B 1 2
C 1 0
D 2 12
E 2 5
F 2 0
A 3 9
B 3 6
C 3 0
H 4 8
I 4 9
J 4 0
Output expected:
Clicks AverageTime Count
ABC A-7.5,B-4,C-0 2
DEF D-12,E-5,F-0 1
HIJ H-8,I-9,J-0 1
Are we guaranteed that every group contains exactly 3 clicks?
Here's an approach that does the heavy lifting. You can easily expand it if more than 3 clicks per pattern are allowed:
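data want;
    length pattern $ 100;
    /* creates duration1-duration3, reset to missing for each group */
    array duration {3};
    recnum = 0;
    /* read all rows of one group in a single DATA step iteration; */
    /* the BY statement requires HAVE to be sorted by group */
    do until (last.group);
        set have;
        by group;
        if first.group then pattern = clicks;
        else pattern = catx('|', pattern, clicks);
        recnum + 1;
        duration{recnum} = time;
    end;
    /* one output row per group */
    keep group pattern duration1-duration3;
run;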
This will at least give you the pieces to work with. The rest of the programming would use simple steps like PROC FREQ and PROC MEANS.
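As a rough sketch of that remaining step, assuming every group really does contain exactly 3 clicks and that WANT was created by the step above, something like the following could work. The dataset names SUMMARY and REPORT and the variables avg1-avg3, count, and AverageTime are my own illustrative choices, not anything from the original post.

```sas
/* Average each click position's duration per pattern.      */
/* _FREQ_ is the number of groups that share the pattern.   */
proc means data=want noprint nway;
    class pattern;
    var duration1-duration3;
    output out=summary (drop=_type_ rename=(_freq_=count))
           mean=avg1-avg3;
run;

/* Optional: build a "click-average" string per pattern.    */
/* Assumes '|' is the delimiter used when building PATTERN. */
data report;
    set summary;
    length AverageTime $ 200;
    array avg {3} avg1-avg3;
    do i = 1 to 3;
        AverageTime = catx(',', AverageTime,
                           catx('-', scan(pattern, i, '|'), avg{i}));
    end;
    drop i avg1-avg3;
run;
```

PROC MEANS alone already gives the count and the three averages per pattern; the second step only reshapes them into the comma-separated layout shown in the expected output. If groups can have more than 3 clicks, the array sizes and loop bounds would need to grow to match.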