Hi All,
I'm working with website tracking data. and my dataset looks like the following:
Member ID | Website ID | Website URL | Page View Start Time |
3304 | 6747 | www.google.com | 01JUN2012:05:32:00 |
3304 | 6747 | www.google.com | 01JUN2012:05:32:00 |
3304 | 6990 | en.wikipedia.org | 01JUN2012:05:32:00 |
3304 | 6747 | www.google.com | 01JUN2012:05:39:00 |
3304 | 6747 | www.google.com | 01JUN2012:07:28:00 |
3304 | 6747 | www.google.com | 01JUN2012:07:28:00 |
3304 | 7254 | slickdeals.net | 01JUN2012:07:28:00 |
3304 | 47814 | www.wpxi.com | 01JUN2012:07:28:00 |
3304 | 6747 | www.google.com | 01JUN2012:07:29:00 |
3304 | 6747 | www.google.com | 01JUN2012:07:29:00 |
3304 | 6990 | en.wikipedia.org | 01JUN2012:07:29:00 |
5569 | 7254 | slickdeals.net | 01JUN2012:07:29:00 |
5569 | 7254 | slickdeals.net | 01JUN2012:07:30:00 |
5569 | 6747 | www.google.com | 01JUN2012:08:15:00 |
5569 | 6747 | www.google.com | 01JUN2012:08:15:00 |
5569 | 6748 | www.facebook.com | 01JUN2012:08:16:00 |
5569 | 6748 | www.facebook.com | 01JUN2012:08:16:00 |
5569 | 6747 | www.google.com | 01JUN2012:09:14:00 |
5569 | 6747 | www.google.com | 01JUN2012:09:14:00 |
5569 | 6747 | www.google.com | 01JUN2012:09:14:00 |
I want a data set that looks like this:
Member ID | Website ID | Website URL | Page View Start Time | Sequence |
3304 | 6747 | www.google.com | 01JUN2012:05:32:00 | 1 |
3304 | 6747 | www.google.com | 01JUN2012:05:32:00 | 1 |
3304 | 6990 | en.wikipedia.org | 01JUN2012:05:32:00 | 2 |
3304 | 6747 | www.google.com | 01JUN2012:05:39:00 | 3 |
3304 | 6747 | www.google.com | 01JUN2012:07:28:00 | 3 |
3304 | 6747 | www.google.com | 01JUN2012:07:28:00 | 3 |
3304 | 7254 | slickdeals.net | 01JUN2012:07:28:00 | 4 |
3304 | 47814 | www.wpxi.com | 01JUN2012:07:28:00 | 5 |
3304 | 6747 | www.google.com | 01JUN2012:07:29:00 | 6 |
3304 | 6747 | www.google.com | 01JUN2012:07:29:00 | 6 |
3304 | 6990 | en.wikipedia.org | 01JUN2012:07:29:00 | 7 |
5569 | 7254 | slickdeals.net | 01JUN2012:07:29:00 | 1 |
5569 | 7254 | slickdeals.net | 01JUN2012:07:30:00 | 1 |
5569 | 6747 | www.google.com | 01JUN2012:08:15:00 | 2 |
5569 | 6747 | www.google.com | 01JUN2012:08:15:00 | 2 |
5569 | 6748 | www.facebook.com | 01JUN2012:08:16:00 | 3 |
5569 | 6748 | www.facebook.com | 01JUN2012:08:16:00 | 3 |
5569 | 6747 | www.google.com | 01JUN2012:09:14:00 | 4 |
5569 | 6747 | www.google.com | 01JUN2012:09:14:00 | 4 |
5569 | 6747 | www.google.com | 01JUN2012:09:14:00 | 4 |
Where sequence is a counter that resets for every member_id website_id page_view_start_time combination, but does not get reset if the start time changes within the same website.
In the future i would like to build in a a control that would allow me to say if you spent more than 20 minutes idle on google and then clicked on google again then that would reset the sequence counter back to 1.
This is a good one! I've been messing around with first. last. processing all morning but can't quite get the results i want.
Thank you all in advance.
I would like a DoW in a case like this.
/* test data */
data one;
input member site start :anydtdtm19.;
obs = _n_;
cards;
3304 6747 01JUN2012:05:32:00
3304 6747 01JUN2012:05:32:00
3304 6990 01JUN2012:05:32:00
3304 6747 01JUN2012:05:39:00
3304 6747 01JUN2012:07:28:00
3304 6747 01JUN2012:07:28:00
3304 7254 01JUN2012:07:28:00
3304 47814 01JUN2012:07:28:00
3304 6747 01JUN2012:07:29:00
3304 6747 01JUN2012:07:29:00
3304 6990 01JUN2012:07:29:00
5569 7254 01JUN2012:07:29:00
5569 7254 01JUN2012:07:30:00
5569 6747 01JUN2012:08:15:00
5569 6747 01JUN2012:08:15:00
5569 6748 01JUN2012:08:16:00
5569 6748 01JUN2012:08:16:00
5569 6747 01JUN2012:09:14:00
5569 6747 01JUN2012:09:14:00
5569 6747 01JUN2012:09:14:00
1111 1111 01JAN2013:01:00:00 <- new data
1111 1111 01JAN2013:01:20:01
1111 1111 01JAN2013:01:21:00
1111 2222 01JAN2013:01:41:01
;
run;
/* create sequence */
%let timeout = %sysevalf(20*60); /* 20 minutes */
data two;
seq = 0;
/* DoW */
do until (last.member);
set one;
by member site notsorted;
drop lagSite difStart;
lagSite = lag(site);
difStart = dif(start);
seq + (first.site | (lagSite=site & difStart>&timeout.));
output;
end;
run;
proc print data=two noobs;
var member site start seq;
format start datetime.;
run;
/* on lst
member site start seq
3304 6747 01JUN12:05:32:00 1
3304 6747 01JUN12:05:32:00 1
3304 6990 01JUN12:05:32:00 2
3304 6747 01JUN12:05:39:00 3
3304 6747 01JUN12:07:28:00 4
3304 6747 01JUN12:07:28:00 4
3304 7254 01JUN12:07:28:00 5
3304 47814 01JUN12:07:28:00 6
3304 6747 01JUN12:07:29:00 7
3304 6747 01JUN12:07:29:00 7
3304 6990 01JUN12:07:29:00 8
5569 7254 01JUN12:07:29:00 1
5569 7254 01JUN12:07:30:00 1
5569 6747 01JUN12:08:15:00 2
5569 6747 01JUN12:08:15:00 2
5569 6748 01JUN12:08:16:00 3
5569 6748 01JUN12:08:16:00 3
5569 6747 01JUN12:09:14:00 4
5569 6747 01JUN12:09:14:00 4
5569 6747 01JUN12:09:14:00 4
1111 1111 01JAN13:01:00:00 1
1111 1111 01JAN13:01:20:01 2
1111 1111 01JAN13:01:21:00 2
1111 2222 01JAN13:01:41:01 3
*/
Jeffrey,
I didn't include the time check as you indicated that was something you wanted to include later but, for your initial problem, how about something like:
data want;
set have;
by member_ID website_url notsorted;
if first.member_ID then Sequence=0;
if first.member_ID or
Website_URL ne lag(Website_URL)
then Sequence+1;
run;
Using similar tools to what Art suggested:
data want;
set have;
by member_ID website_ID notsorted;
prior_time = lag(page_start_time);
if first.website_ID=0 and (calculated-comparison) then sequence=1;
run;
I'm assuming you know how to use the combination of page_start_time and prior_time to make the proper comparison to see if 20 minutes have passed.
Good luck.
I would like a DoW in a case like this.
/* test data */
data one;
input member site start :anydtdtm19.;
obs = _n_;
cards;
3304 6747 01JUN2012:05:32:00
3304 6747 01JUN2012:05:32:00
3304 6990 01JUN2012:05:32:00
3304 6747 01JUN2012:05:39:00
3304 6747 01JUN2012:07:28:00
3304 6747 01JUN2012:07:28:00
3304 7254 01JUN2012:07:28:00
3304 47814 01JUN2012:07:28:00
3304 6747 01JUN2012:07:29:00
3304 6747 01JUN2012:07:29:00
3304 6990 01JUN2012:07:29:00
5569 7254 01JUN2012:07:29:00
5569 7254 01JUN2012:07:30:00
5569 6747 01JUN2012:08:15:00
5569 6747 01JUN2012:08:15:00
5569 6748 01JUN2012:08:16:00
5569 6748 01JUN2012:08:16:00
5569 6747 01JUN2012:09:14:00
5569 6747 01JUN2012:09:14:00
5569 6747 01JUN2012:09:14:00
1111 1111 01JAN2013:01:00:00 <- new data
1111 1111 01JAN2013:01:20:01
1111 1111 01JAN2013:01:21:00
1111 2222 01JAN2013:01:41:01
;
run;
/* create sequence */
%let timeout = %sysevalf(20*60); /* 20 minutes */
data two;
seq = 0;
/* DoW */
do until (last.member);
set one;
by member site notsorted;
drop lagSite difStart;
lagSite = lag(site);
difStart = dif(start);
seq + (first.site | (lagSite=site & difStart>&timeout.));
output;
end;
run;
proc print data=two noobs;
var member site start seq;
format start datetime.;
run;
/* on lst
member site start seq
3304 6747 01JUN12:05:32:00 1
3304 6747 01JUN12:05:32:00 1
3304 6990 01JUN12:05:32:00 2
3304 6747 01JUN12:05:39:00 3
3304 6747 01JUN12:07:28:00 4
3304 6747 01JUN12:07:28:00 4
3304 7254 01JUN12:07:28:00 5
3304 47814 01JUN12:07:28:00 6
3304 6747 01JUN12:07:29:00 7
3304 6747 01JUN12:07:29:00 7
3304 6990 01JUN12:07:29:00 8
5569 7254 01JUN12:07:29:00 1
5569 7254 01JUN12:07:30:00 1
5569 6747 01JUN12:08:15:00 2
5569 6747 01JUN12:08:15:00 2
5569 6748 01JUN12:08:16:00 3
5569 6748 01JUN12:08:16:00 3
5569 6747 01JUN12:09:14:00 4
5569 6747 01JUN12:09:14:00 4
5569 6747 01JUN12:09:14:00 4
1111 1111 01JAN13:01:00:00 1
1111 1111 01JAN13:01:20:01 2
1111 1111 01JAN13:01:21:00 2
1111 2222 01JAN13:01:41:01 3
*/
A hash variety:
data one;
input member site start :anydtdtm19.;
obs = _n_;
cards;
3304 6747 01JUN2012:05:32:00
3304 6747 01JUN2012:05:32:00
3304 6990 01JUN2012:05:32:00
3304 6747 01JUN2012:05:39:00
3304 6747 01JUN2012:07:28:00
3304 6747 01JUN2012:07:28:00
3304 7254 01JUN2012:07:28:00
3304 47814 01JUN2012:07:28:00
3304 6747 01JUN2012:07:29:00
3304 6747 01JUN2012:07:29:00
3304 6990 01JUN2012:07:29:00
5569 7254 01JUN2012:07:29:00
5569 7254 01JUN2012:07:30:00
5569 6747 01JUN2012:08:15:00
5569 6747 01JUN2012:08:15:00
5569 6748 01JUN2012:08:16:00
5569 6748 01JUN2012:08:16:00
5569 6747 01JUN2012:09:14:00
5569 6747 01JUN2012:09:14:00
5569 6747 01JUN2012:09:14:00
;
run;
data _null_;
if(1=2) then set one;
length sequence 8;
declare hash seq(ordered:"a");
rc1=seq.defineKey("member", "site", "start");
rc2=seq.defineData("sequence");
rc3=seq.defineDone();
declare hash want(ordered:"a");
rc1=want.defineKey("obs","member", "site", "start");
rc2=want.defineData("obs","member", "site", "start","sequence");
rc3=want.defineDone();
sequence=0;
do until(done1);
set one end=done1;
by member site notsorted;
rc1=seq.add();
x=first.site;
if first.member then sequence=0;
sequence=sequence+((rc1=0)*x);
rc2=seq.replace();
end;
call missing(of _all_);
do until(done2);
set one end=done2;
seq.find();
want.add();
end;
want.output(dataset:"sequenced");
stop;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.