BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SAShole
Pyrite | Level 9

Hi All,

     I'm working with website tracking data. and my dataset looks like the following:

Member IDWebsite IDWebsite URLPage View Start Time
33046747www.google.com01JUN2012:05:32:00
33046747www.google.com01JUN2012:05:32:00
33046990en.wikipedia.org01JUN2012:05:32:00
33046747www.google.com01JUN2012:05:39:00
33046747www.google.com01JUN2012:07:28:00
33046747www.google.com01JUN2012:07:28:00
33047254slickdeals.net01JUN2012:07:28:00
330447814www.wpxi.com01JUN2012:07:28:00
33046747www.google.com01JUN2012:07:29:00
33046747www.google.com01JUN2012:07:29:00
33046990en.wikipedia.org01JUN2012:07:29:00
55697254slickdeals.net01JUN2012:07:29:00
55697254slickdeals.net01JUN2012:07:30:00
55696747www.google.com01JUN2012:08:15:00
55696747www.google.com01JUN2012:08:15:00
55696748www.facebook.com01JUN2012:08:16:00
55696748www.facebook.com01JUN2012:08:16:00
55696747www.google.com01JUN2012:09:14:00
55696747www.google.com01JUN2012:09:14:00
55696747www.google.com01JUN2012:09:14:00

I want a data set that looks like this:

Member IDWebsite IDWebsite URLPage View Start TimeSequence
33046747www.google.com01JUN2012:05:32:001
33046747www.google.com01JUN2012:05:32:001
33046990en.wikipedia.org01JUN2012:05:32:002
33046747www.google.com01JUN2012:05:39:003
33046747www.google.com01JUN2012:07:28:003
33046747www.google.com01JUN2012:07:28:003
33047254slickdeals.net01JUN2012:07:28:004
330447814www.wpxi.com01JUN2012:07:28:005
33046747www.google.com01JUN2012:07:29:006
33046747www.google.com01JUN2012:07:29:006
33046990en.wikipedia.org01JUN2012:07:29:007
55697254slickdeals.net01JUN2012:07:29:001
55697254slickdeals.net01JUN2012:07:30:001
55696747www.google.com01JUN2012:08:15:002
55696747www.google.com01JUN2012:08:15:002
55696748www.facebook.com01JUN2012:08:16:003
55696748www.facebook.com01JUN2012:08:16:003
55696747www.google.com01JUN2012:09:14:004
55696747www.google.com01JUN2012:09:14:004
55696747www.google.com01JUN2012:09:14:004

Where sequence is a counter that resets for every member_id website_id page_view_start_time combination, but does not get reset if the start time changes within the same website.

In the future i would like to build in a a control that would allow me to say if you spent more than 20 minutes idle on google and then clicked on google again then that would reset the sequence counter back to 1.

This is a good one! I've been messing around with first. last. processing all morning but can't quite get the results i want.

Thank you all in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
chang_y_chung_hotmail_com
Obsidian | Level 7

I would like a DoW in a case like this.

  /* test data */
  data one;
    input member site start :anydtdtm19.;
    obs = _n_;
  cards;
  3304 6747  01JUN2012:05:32:00
  3304 6747  01JUN2012:05:32:00
  3304 6990  01JUN2012:05:32:00
  3304 6747  01JUN2012:05:39:00
  3304 6747  01JUN2012:07:28:00
  3304 6747  01JUN2012:07:28:00
  3304 7254  01JUN2012:07:28:00
  3304 47814 01JUN2012:07:28:00
  3304 6747  01JUN2012:07:29:00
  3304 6747  01JUN2012:07:29:00
  3304 6990  01JUN2012:07:29:00
  5569 7254  01JUN2012:07:29:00
  5569 7254  01JUN2012:07:30:00
  5569 6747  01JUN2012:08:15:00
  5569 6747  01JUN2012:08:15:00
  5569 6748  01JUN2012:08:16:00
  5569 6748  01JUN2012:08:16:00
  5569 6747  01JUN2012:09:14:00
  5569 6747  01JUN2012:09:14:00
  5569 6747  01JUN2012:09:14:00
  1111 1111  01JAN2013:01:00:00  <- new data
  1111 1111  01JAN2013:01:20:01
  1111 1111  01JAN2013:01:21:00
  1111 2222  01JAN2013:01:41:01
  ;
  run;
 
  /* create sequence */
  %let timeout = %sysevalf(20*60); /* 20 minutes */
  data two;
    seq = 0;
    /* DoW */
    do until (last.member);
      set one;
      by member site notsorted;
      drop lagSite difStart;
      lagSite = lag(site);
      difStart = dif(start);
      seq + (first.site | (lagSite=site & difStart>&timeout.));
      output;
    end;
  run;
 
  proc print data=two noobs;
    var member site start seq;
    format start datetime.;
  run;
  /* on lst
  member    site        start          seq
  3304      6747    01JUN12:05:32:00    1
  3304      6747    01JUN12:05:32:00    1
  3304      6990    01JUN12:05:32:00    2
  3304      6747    01JUN12:05:39:00    3
  3304      6747    01JUN12:07:28:00    4
  3304      6747    01JUN12:07:28:00    4
  3304      7254    01JUN12:07:28:00    5
  3304    47814    01JUN12:07:28:00    6
  3304      6747    01JUN12:07:29:00    7
  3304      6747    01JUN12:07:29:00    7
  3304      6990    01JUN12:07:29:00    8
  5569      7254    01JUN12:07:29:00    1
  5569      7254    01JUN12:07:30:00    1
  5569      6747    01JUN12:08:15:00    2
  5569      6747    01JUN12:08:15:00    2
  5569      6748    01JUN12:08:16:00    3
  5569      6748    01JUN12:08:16:00    3
  5569      6747    01JUN12:09:14:00    4
  5569      6747    01JUN12:09:14:00    4
  5569      6747    01JUN12:09:14:00    4
  1111      1111    01JAN13:01:00:00    1
  1111      1111    01JAN13:01:20:01    2
  1111      1111    01JAN13:01:21:00    2
  1111      2222    01JAN13:01:41:01    3
  */

View solution in original post

4 REPLIES 4
art297
Opal | Level 21

Jeffrey,

I didn't include the time check as you indicated that was something you wanted to include later but, for your initial problem, how about something like:

data want;

  set have;

  by member_ID website_url notsorted;

  if first.member_ID then Sequence=0;

  if first.member_ID or

    Website_URL ne lag(Website_URL)

    then Sequence+1;

run;

Astounding
PROC Star

Using similar tools to what Art suggested:

data want;

   set have;

   by member_ID website_ID notsorted;

   prior_time = lag(page_start_time);

   if first.website_ID=0 and (calculated-comparison) then sequence=1;

run;

I'm assuming  you know how to use the combination of page_start_time and prior_time to make the proper comparison to see if 20 minutes have passed.

Good luck.

chang_y_chung_hotmail_com
Obsidian | Level 7

I would like a DoW in a case like this.

  /* test data */
  data one;
    input member site start :anydtdtm19.;
    obs = _n_;
  cards;
  3304 6747  01JUN2012:05:32:00
  3304 6747  01JUN2012:05:32:00
  3304 6990  01JUN2012:05:32:00
  3304 6747  01JUN2012:05:39:00
  3304 6747  01JUN2012:07:28:00
  3304 6747  01JUN2012:07:28:00
  3304 7254  01JUN2012:07:28:00
  3304 47814 01JUN2012:07:28:00
  3304 6747  01JUN2012:07:29:00
  3304 6747  01JUN2012:07:29:00
  3304 6990  01JUN2012:07:29:00
  5569 7254  01JUN2012:07:29:00
  5569 7254  01JUN2012:07:30:00
  5569 6747  01JUN2012:08:15:00
  5569 6747  01JUN2012:08:15:00
  5569 6748  01JUN2012:08:16:00
  5569 6748  01JUN2012:08:16:00
  5569 6747  01JUN2012:09:14:00
  5569 6747  01JUN2012:09:14:00
  5569 6747  01JUN2012:09:14:00
  1111 1111  01JAN2013:01:00:00  <- new data
  1111 1111  01JAN2013:01:20:01
  1111 1111  01JAN2013:01:21:00
  1111 2222  01JAN2013:01:41:01
  ;
  run;
 
  /* create sequence */
  %let timeout = %sysevalf(20*60); /* 20 minutes */
  data two;
    seq = 0;
    /* DoW */
    do until (last.member);
      set one;
      by member site notsorted;
      drop lagSite difStart;
      lagSite = lag(site);
      difStart = dif(start);
      seq + (first.site | (lagSite=site & difStart>&timeout.));
      output;
    end;
  run;
 
  proc print data=two noobs;
    var member site start seq;
    format start datetime.;
  run;
  /* on lst
  member    site        start          seq
  3304      6747    01JUN12:05:32:00    1
  3304      6747    01JUN12:05:32:00    1
  3304      6990    01JUN12:05:32:00    2
  3304      6747    01JUN12:05:39:00    3
  3304      6747    01JUN12:07:28:00    4
  3304      6747    01JUN12:07:28:00    4
  3304      7254    01JUN12:07:28:00    5
  3304    47814    01JUN12:07:28:00    6
  3304      6747    01JUN12:07:29:00    7
  3304      6747    01JUN12:07:29:00    7
  3304      6990    01JUN12:07:29:00    8
  5569      7254    01JUN12:07:29:00    1
  5569      7254    01JUN12:07:30:00    1
  5569      6747    01JUN12:08:15:00    2
  5569      6747    01JUN12:08:15:00    2
  5569      6748    01JUN12:08:16:00    3
  5569      6748    01JUN12:08:16:00    3
  5569      6747    01JUN12:09:14:00    4
  5569      6747    01JUN12:09:14:00    4
  5569      6747    01JUN12:09:14:00    4
  1111      1111    01JAN13:01:00:00    1
  1111      1111    01JAN13:01:20:01    2
  1111      1111    01JAN13:01:21:00    2
  1111      2222    01JAN13:01:41:01    3
  */

joehinson
Calcite | Level 5

A hash variety:

data one;

    input member site start :anydtdtm19.;

    obs = _n_;

  cards;

  3304 6747  01JUN2012:05:32:00

  3304 6747  01JUN2012:05:32:00

  3304 6990  01JUN2012:05:32:00

  3304 6747  01JUN2012:05:39:00

  3304 6747  01JUN2012:07:28:00

  3304 6747  01JUN2012:07:28:00

  3304 7254  01JUN2012:07:28:00

  3304 47814 01JUN2012:07:28:00

  3304 6747  01JUN2012:07:29:00

  3304 6747  01JUN2012:07:29:00

  3304 6990  01JUN2012:07:29:00

  5569 7254  01JUN2012:07:29:00

  5569 7254  01JUN2012:07:30:00

  5569 6747  01JUN2012:08:15:00

  5569 6747  01JUN2012:08:15:00

  5569 6748  01JUN2012:08:16:00

  5569 6748  01JUN2012:08:16:00

  5569 6747  01JUN2012:09:14:00

  5569 6747  01JUN2012:09:14:00

  5569 6747  01JUN2012:09:14:00

  ;

  run;

  data _null_;

       if(1=2) then set one;

       length sequence 8;

       declare hash seq(ordered:"a");

           rc1=seq.defineKey("member", "site", "start");

                   rc2=seq.defineData("sequence");

           rc3=seq.defineDone();

       declare hash want(ordered:"a");

           rc1=want.defineKey("obs","member", "site", "start");

           rc2=want.defineData("obs","member", "site", "start","sequence");

           rc3=want.defineDone();

       sequence=0;

       do until(done1);

            set one end=done1;

            by member site notsorted;

       rc1=seq.add();

       x=first.site;

            if first.member then sequence=0;

       sequence=sequence+((rc1=0)*x);

       rc2=seq.replace();

       end;

       call missing(of _all_);

       do until(done2);

       set one end=done2;

       seq.find();

       want.add();

       end;

       want.output(dataset:"sequenced");

  stop;

  run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1607 views
  • 6 likes
  • 5 in conversation