BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Lucas
Calcite | Level 5

Problem:
the client has a subscription of various magazines.
I need to find a common periods of subscriptions.

Example:

data subscriptions;

   infile datalines;

   input @01 id 1.

         @03 magazine $4.

         @08 date_from date9.

         @18 date_to date9.;

   datalines;                     

1 mag1 01jan2014 01mar2014

1 mag2 01feb2014 01apr2014

1 mag3 01feb2014 01apr2014

1 mag4 01mar2014 01may2014

2 mag1 01jan2014 01jul2014

2 mag2 01feb2014 01apr2014

;

I need to get the result:

ID     magazine     date_from date_to

---------------------------------------

1     mag1         01jan2014 01feb2014

1     mag1         01feb2014 01mar2014

1     mag2         01feb2014 01mar2014

1     mag3         01feb2014 01mar2014

1     mag2         01mar2014 01apr2014

1     mag3         01mar2014 01apr2014

1     mag4         01mar2014 01apr2014

1     mag4         01apr2014 01may2014

2    mag1         01jan2014 01feb2014

2    mag1         01feb2014 01apr2014

2    mag2         01feb2014 01apr2014

2    mag1         01apr2014 01jul2014

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So first generate a record for every date that the individual started or stopped any mag.

You can then use those dates to break each subscription into intervals.

Then you can sort by the new intervals and assign a grouping variable when the individual has a new interval.

data have ;

  length id $3 mag $4 start end 8 ;

  informat start end date9.;

  format start end yymmdd10.;

  input id -- end;

cards;

1 mag1 01jan2014 01mar2014

1 mag2 01feb2014 01apr2014

1 mag3 01feb2014 01apr2014

1 mag4 01mar2014 01may2014

2 mag1 01jan2014 01jul2014

2 mag2 01feb2014 01apr2014

;;;;

proc sql noprint ;

  create table anydate as

    select a.id

         , a.mag

         , a.start as old_start

         , b.date format=yymmdd10.

   from have a

    left join

        (select distinct id,date from

          (select id,start as date from have

           union

           select id,end as date from have)

        ) b

    on a.id = b.id

    and b.date between a.start+1 and a.end

    order by 1,2,3,4

;

quit;

data want1 ;

   do until (last.old_start);

    set anydate ;

    by id mag old_start date;

    if first.old_start then start=old_start;

    end = date;

    output;

    start=end;

   end;

   format start end yymmdd10.;

run;

proc sort ;

  by id start end ;

run;

data want ;

   set want1;

   by id start end;

   if first.id then grp=0;

   grp+(first.start);

run;

proc print;

run;

View solution in original post

9 REPLIES 9
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Blimey that was an ask :smileyshocked:  Well, what I would suggest is getting the same range on all data, in this instance I have just assumed the 12 months, and I would also suggest to get rid of the full date setup, and just work on months/years as that is the real aim, the 01 doesn't help.  Once you have all the data across on a the same range, you can then loop across each element in this range and output the mags for that month.  Then a simple sort (note, its not the nicest coding in the world, you could for instance drop the last two statements if you can stick with months).

data subscriptions;
   infile datalines;
   input @01 id 1.
         @03 magazine $4.
         @08 date_from date9.
         @18 date_to date9.;
   datalines;                     
1 mag1 01jan2014 01mar2014
1 mag2 01feb2014 01apr2014
1 mag3 01feb2014 01apr2014
1 mag4 01mar2014 01may2014
2 mag1 01jan2014 01jul2014
2 mag2 01feb2014 01apr2014
;
run;
data have;
  set subscriptions;
  array mnths {12} $200.;
  do i=1 to 12;
    if date_from <= mdy(i,1,2014) < date_to then mnths{i}=magazine;
  end;
run;
proc sql;
  create table RESULTS
  (
    ID num,
    MAGAZINE char(200),
    MONTH char(200)
  );
quit;
data _null_;
  do i=1 to 12;
    call execute(%nrstr(
                 'proc sql;
                    insert into RESULTS
                    select ID,MAGAZINE,"'||strip(put(i,best.))||'" as MONTH
                    from WORK.HAVE
                     where MNTHS'||strip(put(i,best.))||' is not null;
                  quit;'));
  end;
run;

data results;
  set results;
  attrib date_from date_to format=date9.;
  select(month);
    when("1") do; date_from='01JAN2014'd; date_to='01FEB2014'd; end;
    when("2") do; date_from='01FEB2014'd; date_to='01MAR2014'd; end;
    when("3") do; date_from='01MAR2014'd; date_to='01APR2014'd; end;
    when("4") do; date_from='01APR2014'd; date_to='01MAY2014'd; end;
    when("5") do; date_from='01MAY2014'd; date_to='01JUN2014'd; end;
    when("6") do; date_from='01JUN2014'd; date_to='01JUL2014'd; end;
    when("7") do; date_from='01JUL2014'd; date_to='01AUG2014'd; end;
    when("8") do; date_from='01AUG2014'd; date_to='01SEP2014'd; end;
    when("9") do; date_from='01SEP2014'd; date_to='01OCT2014'd; end;
    when("10") do; date_from='01OCT2014'd; date_to='01NOV2014'd; end;
    when("11") do; date_from='01NOV2014'd; date_to='01DEC2014'd; end;
    when("12") do; date_from='01DEC2014'd; date_to='01JAN2015'd; end;
    otherwise;
  end;
run;
proc sort data=results;
  by id magazine;
run;

Lucas
Calcite | Level 5

Thanks for reply.

Unfortunately, the task is more complicated because the dates valid_from/valid_to do not have to be the first day of the month. It can be any day 😕

RW9
Diamond | Level 26 RW9
Diamond | Level 26

But you would still have subscription per monthly basis, so day is still irrelevant.  You would need to slot dates to monthly sections, e.g. you may consider one of these scenarios:

Jan if 01Jan2014 - 31Jan2014, Feb if 01Feb - 29Feb etc.

Or maybe you need different windows, e.g. 20Dec2014 - 20Jan is considered the month of Jan.

Either way break it down into sections, e.g. month, or if weekly then weekly, or if you do days then days, the process I pretty similar, just modify the array.

Lucas
Calcite | Level 5

I need outputed periods per day so DATA STEP with

select(month);

    when("1")

     ......

isn't good for this problem. Of course I can write as many 'select when' as days but i'm affraid that there will be huge problems with performance.  

For example:

data subscriptions;

   infile datalines;

   input @01 id 1.

         @03 magazine $4.

         @08 date_from date9.

         @18 date_to date9.;

   datalines;                     

1 mag1 01jan2014 15mar2014

1 mag2 20feb2014 12apr2014

1 mag3 20feb2014 12apr2014

1 mag4 25mar2014 16may2014

2 mag1 01jan2014 01jul2014

2 mag2 01feb2014 01apr2014

Result should be:

ID     magazine     date_from date_to

---------------------------------------

1     mag1         01jan2014 20feb2014

1     mag1         20feb2014 15mar2014

1     mag2         20feb2014 15mar2014

1     mag3         20feb2014 15mar2014

1     mag2         15mar2014 25mar2014

1     mag3         15mar2014 25mar2014

1     mag2         25mar2014 12apr2014

1     mag3         25mar2014 12apr2014

1     mag4         25mar2014 12apr2014

1     mag4         12apr2014 16may2014

2    mag1         01jan2014 01feb2014

2    mag1         01feb2014 01apr2014

2    mag2         01feb2014 01apr2014

2    mag1         01apr2014 01jul2014

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, its not pretty, but gives a result:

data subscriptions;
   infile datalines;
   format date_from date_to format=date9.;
   input @01 id 1.
         @03 magazine $4.
         @08 date_from date9.
         @18 date_to date9.;
   datalines;                     
1 mag1 01jan2014 15mar2014
1 mag2 20feb2014 12apr2014
1 mag3 20feb2014 12apr2014
1 mag4 25mar2014 16may2014
2 mag1 01jan2014 01jul2014
2 mag2 01feb2014 01apr2014
;
run;

proc sql noprint;
  select  min(date_from)
  into    :min_date
  from    WORK.SUBSCRIPTIONS;
  select  max(date_to)
  into    :max_date
  from    WORK.SUBSCRIPTIONS;
quit;

data template (keep=id days);
  attrib id format=best. days format=date9.;
  do j=1 to 2;
    do i=&min_date. to &max_date.;
      id=j; days=i; output;
    end;
  end;
run;

data have;
  set subscriptions;
  attrib days format=date9.;
  do i=date_from to date_to;
    days=i; output;
  end;
run;

proc sql;
  create table working as
  select  TEMP.ID,
          TEMP.DAYS,
          catx('/',A.MAGAZINE,B.MAGAZINE,C.MAGAZINE,D.MAGAZINE) as MAG
  from    TEMPLATE TEMP
  left join (select * from HAVE where MAGAZINE="mag1") A
  on      TEMP.ID=A.ID
  and     TEMP.DAYS=A.DAYS
  left join (select * from HAVE where MAGAZINE="mag2") B
  on      TEMP.ID=B.ID
  and     TEMP.DAYS=B.DAYS
  left join (select * from HAVE where MAGAZINE="mag3") C
  on      TEMP.ID=C.ID
  and     TEMP.DAYS=C.DAYS
  left join (select * from HAVE where MAGAZINE="mag4") D
  on      TEMP.ID=D.ID
  and     TEMP.DAYS=D.DAYS;
quit;

data working2 (drop=lstmag);
  set working;
  attrib start_date format=date9.;
  by id;
  retain start_date lstmag;
  if first.id then do;
    start_date=days;
    lstmag=mag;
  end;
  else do;
    if lstmag ne mag then do;
      start_date=days;
      lstmag=mag;
    end;
  end;
run;

proc sort data=working2 out=working3;
  by descending id descending days;
run;

data working3;
  set working3;
  attrib end_date format=date9.;
  retain end_date lstmag;
  if lstmag ne mag then do;
    flag="Y";
    end_date=days;
    lstmag=mag;
  end;
run;

proc sort data=working3 out=final;
  by id mag;
  where flag="Y" and mag ne "";
run;

data final (keep=id magazine start_date end_date);
  set final;
  i=1;
  do while (scan(mag,i,'/') ne "");
    magazine=scan(mag,i,'/'); output;
    i=i+1;
  end;
run;

Tom
Super User Tom
Super User

So first generate a record for every date that the individual started or stopped any mag.

You can then use those dates to break each subscription into intervals.

Then you can sort by the new intervals and assign a grouping variable when the individual has a new interval.

data have ;

  length id $3 mag $4 start end 8 ;

  informat start end date9.;

  format start end yymmdd10.;

  input id -- end;

cards;

1 mag1 01jan2014 01mar2014

1 mag2 01feb2014 01apr2014

1 mag3 01feb2014 01apr2014

1 mag4 01mar2014 01may2014

2 mag1 01jan2014 01jul2014

2 mag2 01feb2014 01apr2014

;;;;

proc sql noprint ;

  create table anydate as

    select a.id

         , a.mag

         , a.start as old_start

         , b.date format=yymmdd10.

   from have a

    left join

        (select distinct id,date from

          (select id,start as date from have

           union

           select id,end as date from have)

        ) b

    on a.id = b.id

    and b.date between a.start+1 and a.end

    order by 1,2,3,4

;

quit;

data want1 ;

   do until (last.old_start);

    set anydate ;

    by id mag old_start date;

    if first.old_start then start=old_start;

    end = date;

    output;

    start=end;

   end;

   format start end yymmdd10.;

run;

proc sort ;

  by id start end ;

run;

data want ;

   set want1;

   by id start end;

   if first.id then grp=0;

   grp+(first.start);

run;

proc print;

run;

Lucas
Calcite | Level 5

@Tom,

It was clever Smiley Happy

It seems that it is ok.

Thanks

Ksharp
Super User

How about this :

data subscriptions;
   infile datalines;
   input @01 id 1.
         @03 magazine $4.
         @08 date_from date9.
         @18 date_to date9.;
   datalines;                     
1 mag1 01jan2014 01mar2014
1 mag2 01feb2014 01apr2014
1 mag3 01feb2014 01apr2014
1 mag4 01mar2014 01may2014
2 mag1 01jan2014 01jul2014
2 mag2 01feb2014 01apr2014
;
run;
data temp(drop= date_from  date_to);
 set subscriptions ;
 do date=date_from to date_to;
  output;
 end;
 format date date9.;
run;
proc sql noprint;
 select distinct cat('temp(where=(id=',id,' and ',magazine,'="',magazine,'") rename=(magazine=',magazine ,'))') into : list separated by ' '
  from temp;
quit;
data temp1(drop=mag:);
 merge  &list ;
 by id date;
 group=catx(' ',of mag:);
run;
data want(drop=date);
 set temp1;
 by id group notsorted;
 retain date_from ;
 if first.group then date_from =date;
 if last.group then do;date_to=date;output;end;
 format  date_from      date_to  date9. ;
run;

Xia Keshan

Lucas
Calcite | Level 5

Hi,

I like this solution because it's clear but I see potential performance problems.

With 500k customers and an average of 3 magazines we have to merge 1.5M tables.

I've never merged so many tables but I am affraid it will be a problem even if the tables are small.

I will try to make some tests.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1222 views
  • 3 likes
  • 4 in conversation