DATA Step, Macro, Functions and more

Sequencing Website Data

Accepted Solution Solved
Reply
Contributor
Posts: 62
Accepted Solution

Sequencing Website Data

Hi All,

     I'm working with website tracking data. and my dataset looks like the following:

Member IDWebsite IDWebsite URLPage View Start Time
33046747www.google.com01JUN2012:05:32:00
33046747www.google.com01JUN2012:05:32:00
33046990en.wikipedia.org01JUN2012:05:32:00
33046747www.google.com01JUN2012:05:39:00
33046747www.google.com01JUN2012:07:28:00
33046747www.google.com01JUN2012:07:28:00
33047254slickdeals.net01JUN2012:07:28:00
330447814www.wpxi.com01JUN2012:07:28:00
33046747www.google.com01JUN2012:07:29:00
33046747www.google.com01JUN2012:07:29:00
33046990en.wikipedia.org01JUN2012:07:29:00
55697254slickdeals.net01JUN2012:07:29:00
55697254slickdeals.net01JUN2012:07:30:00
55696747www.google.com01JUN2012:08:15:00
55696747www.google.com01JUN2012:08:15:00
55696748www.facebook.com01JUN2012:08:16:00
55696748www.facebook.com01JUN2012:08:16:00
55696747www.google.com01JUN2012:09:14:00
55696747www.google.com01JUN2012:09:14:00
55696747www.google.com01JUN2012:09:14:00

I want a data set that looks like this:

Member IDWebsite IDWebsite URLPage View Start TimeSequence
33046747www.google.com01JUN2012:05:32:001
33046747www.google.com01JUN2012:05:32:001
33046990en.wikipedia.org01JUN2012:05:32:002
33046747www.google.com01JUN2012:05:39:003
33046747www.google.com01JUN2012:07:28:003
33046747www.google.com01JUN2012:07:28:003
33047254slickdeals.net01JUN2012:07:28:004
330447814www.wpxi.com01JUN2012:07:28:005
33046747www.google.com01JUN2012:07:29:006
33046747www.google.com01JUN2012:07:29:006
33046990en.wikipedia.org01JUN2012:07:29:007
55697254slickdeals.net01JUN2012:07:29:001
55697254slickdeals.net01JUN2012:07:30:001
55696747www.google.com01JUN2012:08:15:002
55696747www.google.com01JUN2012:08:15:002
55696748www.facebook.com01JUN2012:08:16:003
55696748www.facebook.com01JUN2012:08:16:003
55696747www.google.com01JUN2012:09:14:004
55696747www.google.com01JUN2012:09:14:004
55696747www.google.com01JUN2012:09:14:004

Where sequence is a counter that resets for every member_id website_id page_view_start_time combination, but does not get reset if the start time changes within the same website.

In the future i would like to build in a a control that would allow me to say if you spent more than 20 minutes idle on google and then clicked on google again then that would reset the sequence counter back to 1.

This is a good one! I've been messing around with first. last. processing all morning but can't quite get the results i want.

Thank you all in advance.


Accepted Solutions
Solution
‎08-15-2012 05:34 PM
Regular Contributor
Posts: 241

Re: Sequencing Website Data

I would like a DoW in a case like this.

  /* test data */
  data one;
    input member site start :anydtdtm19.;
    obs = _n_;
  cards;
  3304 6747  01JUN2012:05:32:00
  3304 6747  01JUN2012:05:32:00
  3304 6990  01JUN2012:05:32:00
  3304 6747  01JUN2012:05:39:00
  3304 6747  01JUN2012:07:28:00
  3304 6747  01JUN2012:07:28:00
  3304 7254  01JUN2012:07:28:00
  3304 47814 01JUN2012:07:28:00
  3304 6747  01JUN2012:07:29:00
  3304 6747  01JUN2012:07:29:00
  3304 6990  01JUN2012:07:29:00
  5569 7254  01JUN2012:07:29:00
  5569 7254  01JUN2012:07:30:00
  5569 6747  01JUN2012:08:15:00
  5569 6747  01JUN2012:08:15:00
  5569 6748  01JUN2012:08:16:00
  5569 6748  01JUN2012:08:16:00
  5569 6747  01JUN2012:09:14:00
  5569 6747  01JUN2012:09:14:00
  5569 6747  01JUN2012:09:14:00
  1111 1111  01JAN2013:01:00:00  <- new data
  1111 1111  01JAN2013:01:20:01
  1111 1111  01JAN2013:01:21:00
  1111 2222  01JAN2013:01:41:01
  ;
  run;
 
  /* create sequence */
  %let timeout = %sysevalf(20*60); /* 20 minutes */
  data two;
    seq = 0;
    /* DoW */
    do until (last.member);
      set one;
      by member site notsorted;
      drop lagSite difStart;
      lagSite = lag(site);
      difStart = dif(start);
      seq + (first.site | (lagSite=site & difStart>&timeout.));
      output;
    end;
  run;
 
  proc print data=two noobs;
    var member site start seq;
    format start datetime.;
  run;
  /* on lst
  member    site        start          seq
  3304      6747    01JUN12:05:32:00    1
  3304      6747    01JUN12:05:32:00    1
  3304      6990    01JUN12:05:32:00    2
  3304      6747    01JUN12:05:39:00    3
  3304      6747    01JUN12:07:28:00    4
  3304      6747    01JUN12:07:28:00    4
  3304      7254    01JUN12:07:28:00    5
  3304    47814    01JUN12:07:28:00    6
  3304      6747    01JUN12:07:29:00    7
  3304      6747    01JUN12:07:29:00    7
  3304      6990    01JUN12:07:29:00    8
  5569      7254    01JUN12:07:29:00    1
  5569      7254    01JUN12:07:30:00    1
  5569      6747    01JUN12:08:15:00    2
  5569      6747    01JUN12:08:15:00    2
  5569      6748    01JUN12:08:16:00    3
  5569      6748    01JUN12:08:16:00    3
  5569      6747    01JUN12:09:14:00    4
  5569      6747    01JUN12:09:14:00    4
  5569      6747    01JUN12:09:14:00    4
  1111      1111    01JAN13:01:00:00    1
  1111      1111    01JAN13:01:20:01    2
  1111      1111    01JAN13:01:21:00    2
  1111      2222    01JAN13:01:41:01    3
  */

View solution in original post


All Replies
PROC Star
Posts: 7,363

Re: Sequencing Website Data

Jeffrey,

I didn't include the time check as you indicated that was something you wanted to include later but, for your initial problem, how about something like:

data want;

  set have;

  by member_ID website_url notsorted;

  if first.member_ID then Sequence=0;

  if first.member_ID or

    Website_URL ne lag(Website_URL)

    then Sequence+1;

run;

Super User
Posts: 5,081

Re: Sequencing Website Data

Using similar tools to what Art suggested:

data want;

   set have;

   by member_ID website_ID notsorted;

   prior_time = lag(page_start_time);

   if first.website_ID=0 and (calculated-comparison) then sequence=1;

run;

I'm assuming  you know how to use the combination of page_start_time and prior_time to make the proper comparison to see if 20 minutes have passed.

Good luck.

Solution
‎08-15-2012 05:34 PM
Regular Contributor
Posts: 241

Re: Sequencing Website Data

I would like a DoW in a case like this.

  /* test data */
  data one;
    input member site start :anydtdtm19.;
    obs = _n_;
  cards;
  3304 6747  01JUN2012:05:32:00
  3304 6747  01JUN2012:05:32:00
  3304 6990  01JUN2012:05:32:00
  3304 6747  01JUN2012:05:39:00
  3304 6747  01JUN2012:07:28:00
  3304 6747  01JUN2012:07:28:00
  3304 7254  01JUN2012:07:28:00
  3304 47814 01JUN2012:07:28:00
  3304 6747  01JUN2012:07:29:00
  3304 6747  01JUN2012:07:29:00
  3304 6990  01JUN2012:07:29:00
  5569 7254  01JUN2012:07:29:00
  5569 7254  01JUN2012:07:30:00
  5569 6747  01JUN2012:08:15:00
  5569 6747  01JUN2012:08:15:00
  5569 6748  01JUN2012:08:16:00
  5569 6748  01JUN2012:08:16:00
  5569 6747  01JUN2012:09:14:00
  5569 6747  01JUN2012:09:14:00
  5569 6747  01JUN2012:09:14:00
  1111 1111  01JAN2013:01:00:00  <- new data
  1111 1111  01JAN2013:01:20:01
  1111 1111  01JAN2013:01:21:00
  1111 2222  01JAN2013:01:41:01
  ;
  run;
 
  /* create sequence */
  %let timeout = %sysevalf(20*60); /* 20 minutes */
  data two;
    seq = 0;
    /* DoW */
    do until (last.member);
      set one;
      by member site notsorted;
      drop lagSite difStart;
      lagSite = lag(site);
      difStart = dif(start);
      seq + (first.site | (lagSite=site & difStart>&timeout.));
      output;
    end;
  run;
 
  proc print data=two noobs;
    var member site start seq;
    format start datetime.;
  run;
  /* on lst
  member    site        start          seq
  3304      6747    01JUN12:05:32:00    1
  3304      6747    01JUN12:05:32:00    1
  3304      6990    01JUN12:05:32:00    2
  3304      6747    01JUN12:05:39:00    3
  3304      6747    01JUN12:07:28:00    4
  3304      6747    01JUN12:07:28:00    4
  3304      7254    01JUN12:07:28:00    5
  3304    47814    01JUN12:07:28:00    6
  3304      6747    01JUN12:07:29:00    7
  3304      6747    01JUN12:07:29:00    7
  3304      6990    01JUN12:07:29:00    8
  5569      7254    01JUN12:07:29:00    1
  5569      7254    01JUN12:07:30:00    1
  5569      6747    01JUN12:08:15:00    2
  5569      6747    01JUN12:08:15:00    2
  5569      6748    01JUN12:08:16:00    3
  5569      6748    01JUN12:08:16:00    3
  5569      6747    01JUN12:09:14:00    4
  5569      6747    01JUN12:09:14:00    4
  5569      6747    01JUN12:09:14:00    4
  1111      1111    01JAN13:01:00:00    1
  1111      1111    01JAN13:01:20:01    2
  1111      1111    01JAN13:01:21:00    2
  1111      2222    01JAN13:01:41:01    3
  */

Contributor
Posts: 45

Re: Sequencing Website Data

A hash variety:

data one;

    input member site start :anydtdtm19.;

    obs = _n_;

  cards;

  3304 6747  01JUN2012:05:32:00

  3304 6747  01JUN2012:05:32:00

  3304 6990  01JUN2012:05:32:00

  3304 6747  01JUN2012:05:39:00

  3304 6747  01JUN2012:07:28:00

  3304 6747  01JUN2012:07:28:00

  3304 7254  01JUN2012:07:28:00

  3304 47814 01JUN2012:07:28:00

  3304 6747  01JUN2012:07:29:00

  3304 6747  01JUN2012:07:29:00

  3304 6990  01JUN2012:07:29:00

  5569 7254  01JUN2012:07:29:00

  5569 7254  01JUN2012:07:30:00

  5569 6747  01JUN2012:08:15:00

  5569 6747  01JUN2012:08:15:00

  5569 6748  01JUN2012:08:16:00

  5569 6748  01JUN2012:08:16:00

  5569 6747  01JUN2012:09:14:00

  5569 6747  01JUN2012:09:14:00

  5569 6747  01JUN2012:09:14:00

  ;

  run;

  data _null_;

       if(1=2) then set one;

       length sequence 8;

       declare hash seq(ordered:"a");

           rc1=seq.defineKey("member", "site", "start");

                   rc2=seq.defineData("sequence");

           rc3=seq.defineDone();

       declare hash want(ordered:"a");

           rc1=want.defineKey("obs","member", "site", "start");

           rc2=want.defineData("obs","member", "site", "start","sequence");

           rc3=want.defineDone();

       sequence=0;

       do until(done1);

            set one end=done1;

            by member site notsorted;

       rc1=seq.add();

       x=first.site;

            if first.member then sequence=0;

       sequence=sequence+((rc1=0)*x);

       rc2=seq.replace();

       end;

       call missing(of _all_);

       do until(done2);

       set one end=done2;

       seq.find();

       want.add();

       end;

       want.output(dataset:"sequenced");

  stop;

  run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 209 views
  • 6 likes
  • 5 in conversation