Problem:
the client has a subscription of various magazines.
I need to find a common periods of subscriptions.
Example:
data subscriptions;
infile datalines;
input @01 id 1.
@03 magazine $4.
@08 date_from date9.
@18 date_to date9.;
datalines;
1 mag1 01jan2014 01mar2014
1 mag2 01feb2014 01apr2014
1 mag3 01feb2014 01apr2014
1 mag4 01mar2014 01may2014
2 mag1 01jan2014 01jul2014
2 mag2 01feb2014 01apr2014
;
I need to get the result:
ID magazine date_from date_to
---------------------------------------
1 mag1 01jan2014 01feb2014
1 mag1 01feb2014 01mar2014
1 mag2 01feb2014 01mar2014
1 mag3 01feb2014 01mar2014
1 mag2 01mar2014 01apr2014
1 mag3 01mar2014 01apr2014
1 mag4 01mar2014 01apr2014
1 mag4 01apr2014 01may2014
2 mag1 01jan2014 01feb2014
2 mag1 01feb2014 01apr2014
2 mag2 01feb2014 01apr2014
2 mag1 01apr2014 01jul2014
So first generate a record for every date that the individual started or stopped any mag.
You can then use those dates to break each subscription into intervals.
Then you can sort by the new intervals and assign a grouping variable when the individual has a new interval.
data have ;
length id $3 mag $4 start end 8 ;
informat start end date9.;
format start end yymmdd10.;
input id -- end;
cards;
1 mag1 01jan2014 01mar2014
1 mag2 01feb2014 01apr2014
1 mag3 01feb2014 01apr2014
1 mag4 01mar2014 01may2014
2 mag1 01jan2014 01jul2014
2 mag2 01feb2014 01apr2014
;;;;
proc sql noprint ;
create table anydate as
select a.id
, a.mag
, a.start as old_start
, b.date format=yymmdd10.
from have a
left join
(select distinct id,date from
(select id,start as date from have
union
select id,end as date from have)
) b
on a.id = b.id
and b.date between a.start+1 and a.end
order by 1,2,3,4
;
quit;
data want1 ;
do until (last.old_start);
set anydate ;
by id mag old_start date;
if first.old_start then start=old_start;
end = date;
output;
start=end;
end;
format start end yymmdd10.;
run;
proc sort ;
by id start end ;
run;
data want ;
set want1;
by id start end;
if first.id then grp=0;
grp+(first.start);
run;
proc print;
run;
Blimey that was an ask :smileyshocked: Well, what I would suggest is getting the same range on all data, in this instance I have just assumed the 12 months, and I would also suggest to get rid of the full date setup, and just work on months/years as that is the real aim, the 01 doesn't help. Once you have all the data across on a the same range, you can then loop across each element in this range and output the mags for that month. Then a simple sort (note, its not the nicest coding in the world, you could for instance drop the last two statements if you can stick with months).
data subscriptions;
infile datalines;
input @01 id 1.
@03 magazine $4.
@08 date_from date9.
@18 date_to date9.;
datalines;
1 mag1 01jan2014 01mar2014
1 mag2 01feb2014 01apr2014
1 mag3 01feb2014 01apr2014
1 mag4 01mar2014 01may2014
2 mag1 01jan2014 01jul2014
2 mag2 01feb2014 01apr2014
;
run;
data have;
set subscriptions;
array mnths {12} $200.;
do i=1 to 12;
if date_from <= mdy(i,1,2014) < date_to then mnths{i}=magazine;
end;
run;
proc sql;
create table RESULTS
(
ID num,
MAGAZINE char(200),
MONTH char(200)
);
quit;
data _null_;
do i=1 to 12;
call execute(%nrstr(
'proc sql;
insert into RESULTS
select ID,MAGAZINE,"'||strip(put(i,best.))||'" as MONTH
from WORK.HAVE
where MNTHS'||strip(put(i,best.))||' is not null;
quit;'));
end;
run;
data results;
set results;
attrib date_from date_to format=date9.;
select(month);
when("1") do; date_from='01JAN2014'd; date_to='01FEB2014'd; end;
when("2") do; date_from='01FEB2014'd; date_to='01MAR2014'd; end;
when("3") do; date_from='01MAR2014'd; date_to='01APR2014'd; end;
when("4") do; date_from='01APR2014'd; date_to='01MAY2014'd; end;
when("5") do; date_from='01MAY2014'd; date_to='01JUN2014'd; end;
when("6") do; date_from='01JUN2014'd; date_to='01JUL2014'd; end;
when("7") do; date_from='01JUL2014'd; date_to='01AUG2014'd; end;
when("8") do; date_from='01AUG2014'd; date_to='01SEP2014'd; end;
when("9") do; date_from='01SEP2014'd; date_to='01OCT2014'd; end;
when("10") do; date_from='01OCT2014'd; date_to='01NOV2014'd; end;
when("11") do; date_from='01NOV2014'd; date_to='01DEC2014'd; end;
when("12") do; date_from='01DEC2014'd; date_to='01JAN2015'd; end;
otherwise;
end;
run;
proc sort data=results;
by id magazine;
run;
Thanks for reply.
Unfortunately, the task is more complicated because the dates valid_from/valid_to do not have to be the first day of the month. It can be any day 😕
But you would still have subscription per monthly basis, so day is still irrelevant. You would need to slot dates to monthly sections, e.g. you may consider one of these scenarios:
Jan if 01Jan2014 - 31Jan2014, Feb if 01Feb - 29Feb etc.
Or maybe you need different windows, e.g. 20Dec2014 - 20Jan is considered the month of Jan.
Either way break it down into sections, e.g. month, or if weekly then weekly, or if you do days then days, the process I pretty similar, just modify the array.
I need outputed periods per day so DATA STEP with
select(month);
when("1")
......
isn't good for this problem. Of course I can write as many 'select when' as days but i'm affraid that there will be huge problems with performance.
For example:
data subscriptions;
infile datalines;
input @01 id 1.
@03 magazine $4.
@08 date_from date9.
@18 date_to date9.;
datalines;
1 mag1 01jan2014 15mar2014
1 mag2 20feb2014 12apr2014
1 mag3 20feb2014 12apr2014
1 mag4 25mar2014 16may2014
2 mag1 01jan2014 01jul2014
2 mag2 01feb2014 01apr2014
Result should be:
ID magazine date_from date_to
---------------------------------------
1 mag1 01jan2014 20feb2014
1 mag1 20feb2014 15mar2014
1 mag2 20feb2014 15mar2014
1 mag3 20feb2014 15mar2014
1 mag2 15mar2014 25mar2014
1 mag3 15mar2014 25mar2014
1 mag2 25mar2014 12apr2014
1 mag3 25mar2014 12apr2014
1 mag4 25mar2014 12apr2014
1 mag4 12apr2014 16may2014
2 mag1 01jan2014 01feb2014
2 mag1 01feb2014 01apr2014
2 mag2 01feb2014 01apr2014
2 mag1 01apr2014 01jul2014
Well, its not pretty, but gives a result:
data subscriptions;
infile datalines;
format date_from date_to format=date9.;
input @01 id 1.
@03 magazine $4.
@08 date_from date9.
@18 date_to date9.;
datalines;
1 mag1 01jan2014 15mar2014
1 mag2 20feb2014 12apr2014
1 mag3 20feb2014 12apr2014
1 mag4 25mar2014 16may2014
2 mag1 01jan2014 01jul2014
2 mag2 01feb2014 01apr2014
;
run;
proc sql noprint;
select min(date_from)
into :min_date
from WORK.SUBSCRIPTIONS;
select max(date_to)
into :max_date
from WORK.SUBSCRIPTIONS;
quit;
data template (keep=id days);
attrib id format=best. days format=date9.;
do j=1 to 2;
do i=&min_date. to &max_date.;
id=j; days=i; output;
end;
end;
run;
data have;
set subscriptions;
attrib days format=date9.;
do i=date_from to date_to;
days=i; output;
end;
run;
proc sql;
create table working as
select TEMP.ID,
TEMP.DAYS,
catx('/',A.MAGAZINE,B.MAGAZINE,C.MAGAZINE,D.MAGAZINE) as MAG
from TEMPLATE TEMP
left join (select * from HAVE where MAGAZINE="mag1") A
on TEMP.ID=A.ID
and TEMP.DAYS=A.DAYS
left join (select * from HAVE where MAGAZINE="mag2") B
on TEMP.ID=B.ID
and TEMP.DAYS=B.DAYS
left join (select * from HAVE where MAGAZINE="mag3") C
on TEMP.ID=C.ID
and TEMP.DAYS=C.DAYS
left join (select * from HAVE where MAGAZINE="mag4") D
on TEMP.ID=D.ID
and TEMP.DAYS=D.DAYS;
quit;
data working2 (drop=lstmag);
set working;
attrib start_date format=date9.;
by id;
retain start_date lstmag;
if first.id then do;
start_date=days;
lstmag=mag;
end;
else do;
if lstmag ne mag then do;
start_date=days;
lstmag=mag;
end;
end;
run;
proc sort data=working2 out=working3;
by descending id descending days;
run;
data working3;
set working3;
attrib end_date format=date9.;
retain end_date lstmag;
if lstmag ne mag then do;
flag="Y";
end_date=days;
lstmag=mag;
end;
run;
proc sort data=working3 out=final;
by id mag;
where flag="Y" and mag ne "";
run;
data final (keep=id magazine start_date end_date);
set final;
i=1;
do while (scan(mag,i,'/') ne "");
magazine=scan(mag,i,'/'); output;
i=i+1;
end;
run;
So first generate a record for every date that the individual started or stopped any mag.
You can then use those dates to break each subscription into intervals.
Then you can sort by the new intervals and assign a grouping variable when the individual has a new interval.
data have ;
length id $3 mag $4 start end 8 ;
informat start end date9.;
format start end yymmdd10.;
input id -- end;
cards;
1 mag1 01jan2014 01mar2014
1 mag2 01feb2014 01apr2014
1 mag3 01feb2014 01apr2014
1 mag4 01mar2014 01may2014
2 mag1 01jan2014 01jul2014
2 mag2 01feb2014 01apr2014
;;;;
proc sql noprint ;
create table anydate as
select a.id
, a.mag
, a.start as old_start
, b.date format=yymmdd10.
from have a
left join
(select distinct id,date from
(select id,start as date from have
union
select id,end as date from have)
) b
on a.id = b.id
and b.date between a.start+1 and a.end
order by 1,2,3,4
;
quit;
data want1 ;
do until (last.old_start);
set anydate ;
by id mag old_start date;
if first.old_start then start=old_start;
end = date;
output;
start=end;
end;
format start end yymmdd10.;
run;
proc sort ;
by id start end ;
run;
data want ;
set want1;
by id start end;
if first.id then grp=0;
grp+(first.start);
run;
proc print;
run;
@Tom,
It was clever
It seems that it is ok.
Thanks
How about this :
data subscriptions; infile datalines; input @01 id 1. @03 magazine $4. @08 date_from date9. @18 date_to date9.; datalines; 1 mag1 01jan2014 01mar2014 1 mag2 01feb2014 01apr2014 1 mag3 01feb2014 01apr2014 1 mag4 01mar2014 01may2014 2 mag1 01jan2014 01jul2014 2 mag2 01feb2014 01apr2014 ; run; data temp(drop= date_from date_to); set subscriptions ; do date=date_from to date_to; output; end; format date date9.; run; proc sql noprint; select distinct cat('temp(where=(id=',id,' and ',magazine,'="',magazine,'") rename=(magazine=',magazine ,'))') into : list separated by ' ' from temp; quit; data temp1(drop=mag:); merge &list ; by id date; group=catx(' ',of mag:); run; data want(drop=date); set temp1; by id group notsorted; retain date_from ; if first.group then date_from =date; if last.group then do;date_to=date;output;end; format date_from date_to date9. ; run;
Xia Keshan
Hi,
I like this solution because it's clear but I see potential performance problems.
With 500k customers and an average of 3 magazines we have to merge 1.5M tables.
I've never merged so many tables but I am affraid it will be a problem even if the tables are small.
I will try to make some tests.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.